Informática Educativa Carlosfmur@gmail.com: July 2015

Friday 31 July 2015

Machine learning

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
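The Python sketch below illustrates "building a model from example inputs" by fitting a straight line with ordinary least squares. The data points and the helper names `fit_line` and `predict` are hypothetical. The rule y = 2x + 1 never appears in the code. The learner recovers it from the examples.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned (slope, intercept) pair to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical example inputs; the underlying rule is never written in the code.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```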

Machine learning is closely related to and often overlaps with computational statistics, a discipline that also specializes in prediction-making. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning is sometimes conflated with data mining, although the latter focuses more on exploratory data analysis.[6] Machine learning and pattern recognition "can be viewed as two facets of the same field."

When employed in industrial contexts, machine learning methods may be referred to as predictive analytics or predictive modelling.

In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed".

Tom M. Mitchell provided a widely quoted, more formal definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". This definition is notable for defining machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in his paper "Computing Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines do what we (as thinking entities) can do?"
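The following setup makes Mitchell's definition concrete. All of it is hypothetical. Task T is labelling the integers 0 to 9 as 0 or 1. Performance measure P is accuracy on those ten points. Experience E is the number of labelled examples seen so far. The learner is a simple class-mean threshold, chosen for brevity. It is not a method named in the post, and the example stream is invented.

```python
def train(examples):
    """Learn a decision threshold halfway between the two class means.

    examples: (x, label) pairs with label 0 or 1; both labels must be present.
    """
    xs0 = [x for x, label in examples if label == 0]
    xs1 = [x for x, label in examples if label == 1]
    mean0 = sum(xs0) / len(xs0)
    mean1 = sum(xs1) / len(xs1)
    return (mean0 + mean1) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test points labelled correctly.

    Assumes class 1 lies to the right of class 0 on the number line.
    """
    correct = sum((1 if x >= threshold else 0) == label for x, label in test_set)
    return correct / len(test_set)


# Task T (hypothetical): label the integers 0..9 as 1 when x >= 5, otherwise 0.
test_set = [(x, 1 if x >= 5 else 0) for x in range(10)]

# Experience E (hypothetical): labelled examples arriving in this order.
stream = [(4, 0), (9, 1), (3, 0), (8, 1), (2, 0), (6, 1), (0, 0), (5, 1)]

results = [accuracy(train(stream[:n]), test_set) for n in (2, 4, 6, 8)]
print(results)  # [0.8, 0.9, 0.9, 1.0]
```

With this particular stream, P rises from 0.8 to 1.0 as E grows from 2 to 8 examples. That is the pattern the definition describes. A different stream or learner could behave differently.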

See more: https://en.wikipedia.org/wiki/Machine_learning

Data Analysis Examples

The pages below contain examples (often hypothetical) illustrating the application of different statistical analysis techniques using different statistical packages. Each page provides a handful of examples of when the analysis might be used along with sample data, an example analysis and an explanation of the output, followed by references for more information. These pages merely introduce the essence of the technique and do not provide a comprehensive description of how to use it.

The combination of topics and packages reflects questions that are often asked in our statistical consulting. As such, it heavily reflects the demand from our clients at walk-in consulting, not the demand of readers from around the world. Many worthy topics will not be covered because they are not reflected in questions by our clients. Also, not all analysis techniques will be covered in all packages, again largely determined by client demand. If an analysis is not shown in a particular package, this does not imply that the package cannot do the analysis; it may simply mean that the analysis is not commonly done in that package by our clients.

	Stata	SAS	SPSS	Mplus	R
Regression Models
Robust Regression	Stata	SAS			R
Models for Binary and Categorical Outcomes
Logistic Regression	Stata	SAS	SPSS	Mplus	R
Exact Logistic Regression	Stata	SAS			R
Multinomial Logistic Regression	Stata	SAS	SPSS	Mplus	R
Ordinal Logistic Regression	Stata	SAS	SPSS	Mplus	R
Probit Regression	Stata	SAS	SPSS	Mplus	R
Count Models
Poisson Regression	Stata	SAS	SPSS	Mplus	R
Negative Binomial Regression	Stata	SAS	SPSS	Mplus	R
Zero-inflated Poisson Regression	Stata	SAS		Mplus	R
Zero-inflated Negative Binomial Regression	Stata	SAS		Mplus	R
Zero-truncated Poisson	Stata	SAS			R
Zero-truncated Negative Binomial	Stata	SAS		Mplus	R
Censored and Truncated Regression
Tobit Regression	Stata	SAS		Mplus	R
Truncated Regression	Stata	SAS			R
Interval Regression	Stata	SAS			R
Multivariate Analysis
One-way MANOVA	Stata	SAS	SPSS
Discriminant Function Analysis	Stata	SAS	SPSS
Canonical Correlation Analysis	Stata	SAS	SPSS		R
Multivariate Multiple Regression	Stata	SAS		Mplus
Mixed Effects Models
Generalized Linear Mixed Models	Introduction to GLMMs
Mixed Effects Logistic Regression	Stata				R
Other
Latent Class Analysis				Mplus

See more: http://www.ats.ucla.edu/stat/dae/

Wednesday 29 July 2015

Clustering (Spanish)

Problems: Classification, Clustering, Regression, Anomaly detection, Association rules, Reinforcement learning, Structured prediction, Feature learning, Online learning, Semi-supervised learning, Unsupervised learning, Grammar induction
Supervised learning (classification, regression): Decision trees, Ensembles (Bagging, Boosting, Random forest), k-NN, Linear regression, Naive Bayes, Neural networks, Logistic regression, Perceptron, Support vector machine (SVM), Relevance vector machine (RVM)
Clustering: BIRCH, Hierarchical, k-means, Expectation-maximization (EM), DBSCAN, OPTICS, Mean-shift
Dimensionality reduction: Factor analysis, CCA, ICA, LDA, NMF, PCA, t-SNE
Structured prediction: Graphical models (Bayes net, CRF, HMM)
Anomaly detection: k-NN, Local outlier factor
Neural nets: Autoencoder, Deep learning, Multilayer perceptron, RNN, Restricted Boltzmann machine, SOM, Convolutional neural network
Theory: Bias-variance dilemma, Computational learning theory, Empirical risk minimization, PAC learning, Statistical learning, VC theory

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in one sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
Cluster analysis itself is not one specific algorithm but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings depend on the individual data set and the intended use of the results. These settings include values such as the distance function to use, a density threshold, or the number of expected clusters. Cluster analysis as such is not an automatic task. It is an iterative process of knowledge discovery, or interactive multi-objective optimization, that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
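Here is a compact Python sketch of DBSCAN, one of the clustering algorithms listed above. It illustrates the "dense areas of the data space" notion and its density-threshold parameters. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) are assumptions for illustration, as are the sample points. The neighbour search is a naive O(n²) scan, so the sketch is not meant for large data sets.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters and NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point of a cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded further
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The data do not fix the number of clusters in advance. The density threshold determines it, so shrinking `eps` or raising `min_pts` can turn whole groups into noise.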
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences often lie in how the results are used: in data mining, the resulting groups are the matter of interest, while in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers from the fields of data mining and machine learning, since they use the same terms and often the same algorithms but have different goals.
Cluster analysis originated in anthropology with Driver and Kroeber in 1932. Zubin introduced it to psychology in 1938, as did Robert Tryon in 1939. Beginning in 1943, Cattell famously used it for trait theory classification in personality psychology.
See more: https://en.wikipedia.org/wiki/Cluster_analysis

Cluster analysis


Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
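The defining property of a cluster can be checked numerically: its members should be more similar to each other than to members of other groups. This Python sketch takes Euclidean distance as the (assumed) dissimilarity measure. It compares the mean within-group and between-group distances for two hypothetical groupings of the same six points.

```python
import math
from itertools import combinations


def mean_distances(points, labels):
    """Return (mean within-group distance, mean between-group distance).

    Uses Euclidean distance as the assumed dissimilarity measure.
    """
    within, between = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        (within if a == b else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical data: two well-separated blobs of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

good = ["A", "A", "A", "B", "B", "B"]   # groups follow the blobs
mixed = ["A", "B", "A", "B", "A", "B"]  # groups cut across the blobs

w, b = mean_distances(points, good)
print(w < b)  # True: members are closer to each other than to outsiders
w, b = mean_distances(points, mixed)
print(w < b)  # False
```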

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
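The sketch below implements Lloyd's algorithm for k-means, a concrete instance of the "small distances among the cluster members" notion. Here k is the "number of expected clusters" parameter mentioned above. It is a minimal version under simplifying assumptions: Euclidean distance, the first k points as initial centroids, and hypothetical data. Practical implementations use better initialisation (for example k-means++) and several restarts.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Initial centroids are simply the first k points (a simplifying assumption).
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two groups, expected number of clusters k = 2.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

k-means produces exactly k groups, each around a centroid, so it suits the distance-based notion of a cluster but not the density-based one. This is one reason the choice of algorithm depends on the data and on the intended use of the results.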

Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining, the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms, but have different goals.

Cluster analysis originated in anthropology with Driver and Kroeber in 1932. Zubin introduced it to psychology in 1938, as did Robert Tryon in 1939. Beginning in 1943, Cattell famously used it for trait theory classification in personality psychology.

See more: https://en.wikipedia.org/wiki/Cluster_analysis
