## HIM 650 Topic 8 DQ 2

answer

Supervised and unsupervised learning are two fundamental categories of machine learning. Because they differ in whether labeled outputs are available, they have different target functions and analytic techniques. This paper gives a brief introduction to the basic concepts of supervised and unsupervised learning, discusses their target functions, and describes their analytic techniques and real-world applications.

Duda, Hart and Stork (2001) explain how to analyze unsupervised and supervised learning. Neural networks are among the supervised learning methods because they learn a relationship between inputs and outputs. Networks can also cope with noise, missing data and data shifts. For unsupervised learning, the authors use graph theory and cluster analysis. Duda, Hart and Stork (2001) provide an example based on research into marriage licenses in Vermont from 1849 to 1910. After a new company was formed to handle marriage licenses, the marriage records appeared in several different formats during its first ten years. That created problems for researchers trying to find marriages from before the new company was formed. This report provides an understanding of how neural networks are trained for supervised learning and of how unsupervised learning uses algorithms such as PCA and K-means clustering.

In this paper we introduce two proposed techniques for analyzing supervised and unsupervised learning. The first technique, based on a mathematical theorem, enables a more exact analysis of the most commonly used supervised learning algorithms. We provide an example to illustrate this technique, showing that logistic regression can be read as a Bernoulli model: each label is treated as a Bernoulli trial whose success probability depends on the inputs. The argument rests on the claim that the SVM representation of a given set of data is asymptotically bounded by a distribution (e.g., the chi-squared distribution) obtained by inverting each term in the sum defining the SVM weight vector.
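The Bernoulli connection mentioned above can be written out under the standard formulation of logistic regression. This formulation is an assumption here, since the source does not give the derivation, and the symbols $w$, $b$, $x_i$, $y_i$ and $p_i$ are introduced only for illustration:

```latex
% Each label y_i in {0, 1} is modeled as a Bernoulli trial with success probability p_i.
P(y_i \mid x_i) = p_i^{\,y_i} \, (1 - p_i)^{\,1 - y_i},
\qquad
p_i = \sigma(w^\top x_i + b) = \frac{1}{1 + e^{-(w^\top x_i + b)}}

% Fitting the model maximizes the Bernoulli log-likelihood over n independent examples.
\ell(w, b) = \sum_{i=1}^{n} \Bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Bigr]
```

Maximizing $\ell(w, b)$ is the same as minimizing the log loss that is commonly used to train logistic regression.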

The two most common supervised learning tasks are classification (where we assign discrete labels learned from data) and regression (where we predict continuous variables from data). Here, we consider a supervised classifier that analyzes email terms in order to construct a spam filter. The classifier relies heavily on binary features, that is, whether each term is present or absent, and it classifies an email based on which terms appear in it.
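A spam filter built on binary term features can be sketched with a Bernoulli naive Bayes classifier. This is a minimal sketch, not the method of any particular system: the choice of naive Bayes, the tiny training set, the vocabulary, and the function names `train_bernoulli_nb` and `classify` are all assumptions made for illustration.

```python
import math

def train_bernoulli_nb(emails, labels, vocabulary):
    """Fit a Bernoulli naive Bayes model on binary term-presence features."""
    model = {}
    for cls in set(labels):
        # Each email becomes the set of terms it contains (present/absent only).
        docs = [set(e.lower().split()) for e, y in zip(emails, labels) if y == cls]
        prior = len(docs) / len(emails)
        # Laplace-smoothed probability that each term is present in this class.
        term_prob = {
            t: (sum(t in d for d in docs) + 1) / (len(docs) + 2)
            for t in vocabulary
        }
        model[cls] = (prior, term_prob)
    return model

def classify(model, email):
    """Return the class with the highest log posterior score."""
    present = set(email.lower().split())
    scores = {}
    for cls, (prior, term_prob) in model.items():
        score = math.log(prior)
        for t, p in term_prob.items():
            # Absent terms also carry evidence in the Bernoulli model.
            score += math.log(p if t in present else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training set (the "supervision").
emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
]
labels = ["spam", "spam", "ham", "ham"]
vocabulary = ["free", "money", "prize", "meeting", "report", "now"]

model = train_bernoulli_nb(emails, labels, vocabulary)
prediction_spam = classify(model, "claim your free money")
prediction_ham = classify(model, "monday meeting report")
```

With this toy data, the first message is classified as spam and the second as ham. A real filter would need a far larger labeled corpus and a held-out test set to measure accuracy.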

Supervised and unsupervised learning are two broad categories that help the data scientist decide which type of algorithm to use to analyze the data. Supervised learning builds a model from an existing set of labeled data in order to predict outputs for new, unseen inputs, whereas unsupervised learning requires only a set of unlabeled data and helps in discovering patterns or grouping the data points. The most commonly used algorithms in supervised classification include decision trees, random forests, boosting, the naïve Bayes algorithm, logistic regression and kernel methods such as support vector machines. The most commonly used algorithms in unsupervised learning include K-means clustering, principal component analysis (PCA), singular value decomposition (SVD), and independent component analysis (ICA). Anomaly detection is most often treated as an unsupervised task, although supervised variants exist when labeled anomalies are available.
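Of the unsupervised methods listed, PCA is perhaps the easiest to illustrate without labels. The sketch below estimates the first principal component of a small 2-D data set by power iteration on the covariance matrix. The data points and the function name `first_principal_component` are assumptions for illustration, and power iteration is only one of several ways to compute PCA.

```python
import math

def first_principal_component(points, iterations=100):
    """Estimate the top principal direction of 2-D points by power iteration."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [[p[0] - mean[0], p[1] - mean[1]] for p in points]
    # 2x2 sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(2)]
           for i in range(2)]
    v = [1.0, 0.0]  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v, centered

# Hypothetical unlabeled data lying roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction, centered = first_principal_component(points)
# 1-D projection of each point onto the principal direction.
scores = [c[0] * direction[0] + c[1] * direction[1] for c in centered]
```

Because the points lie close to a diagonal line, the recovered direction has nearly equal x and y components, and the one-dimensional `scores` preserve the ordering of the points along that line.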

Supervised and unsupervised learning are two methods of statistical pattern recognition. Supervised learning uses known, labeled examples (called a “training set”) to find patterns, often in experimental datasets. The more complex the dataset becomes, the harder it is to find the correct rule manually, which is why the rule is learned from the data instead.

There are two types of machine learning mechanisms: unsupervised and supervised. Unsupervised learning, based on the statistical technique of cluster analysis, forms clusters and offers a means of quantifying the similarity or dissimilarity among the clusters. The clusters can be either homogeneous (all data points in one cluster have the same value) or heterogeneous (points with different values). In supervised mode, by contrast, the correct output data is used together with the input data set to train a classifier [4].
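Cluster formation and a simple measure of within-cluster similarity can be sketched with k-means (Lloyd's algorithm). This is a minimal illustration: the data points, the starting centroids, and the function names `kmeans` and `within_cluster_ss` are assumptions, and the within-cluster sum of squares is only one of several ways to quantify how tight the clusters are.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

def within_cluster_ss(centroids, clusters):
    """Sum of squared distances from each point to its centroid (lower means tighter)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, c))
        for c, cluster in zip(centroids, clusters)
        for p in cluster
    )

# Hypothetical unlabeled data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
wcss = within_cluster_ss(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between points. In practice the starting centroids are usually chosen at random or with a seeding scheme rather than fixed by hand as they are here.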

Supervised and unsupervised learning draw on overlapping statistical tools, but they differ in what they are trying to learn. In supervised learning, labeled data is used to find patterns that can be used to predict future events. In unsupervised learning, the data is used only for its informational content, such as its structure or groupings, not for predicting a labeled outcome. An example of an algorithm that uses supervised learning is logistic regression, which predicts the probability of a binary dependent variable given one or more independent variables. An example of an algorithm that uses unsupervised learning is k-means clustering, which groups inputs based on their similarity.
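The logistic regression example can be made concrete with a one-variable model fitted by gradient descent. This is a sketch under stated assumptions: the hours-studied data, the learning rate, the number of epochs, and the function names `sigmoid` and `fit_logistic` are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical labeled data: hours studied (input) and pass/fail (label).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # predicted pass probability after 1 hour
p_high = sigmoid(w * 4.5 + b)  # predicted pass probability after 4.5 hours
```

The labels in `passed` are what make this supervised: the model is corrected against known outcomes during training, and the fitted probability rises with the number of hours studied.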

Supervised learning is the process of learning a function from an existing set of inputs and their known outputs. These inputs are classified and labeled to provide more information about the relationships between input and output values. The key distinction between unsupervised and supervised learning is that the latter requires this additional label information for each input.

Supervised learning is a form of data mining, which is the process of analyzing and interpreting large data sets. Supervised learning relies on well-defined or assumed relationships between inputs (features) and outputs (targets).

question

Discuss the analytic techniques used to analyze supervised and unsupervised learning. Provide an example.