Supervised Learning
1. We already know what the correct output should look like.
2. There is a known relationship between the inputs and the outputs.
Regression Problem
We are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
Classification Problem
We are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
Examples
1. A function that predicts a house's price based on its area.
2. A model that predicts tomorrow's weather based on historical data.
Unsupervised Learning
1. We have little or no idea what the output should look like.
2. We cluster the data based on its own characteristics.
Example
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The "Cocktail Party Algorithm" lets you find structure in a chaotic environment (i.e., identifying individual voices and music from a mesh of sounds at a cocktail party).
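The clustering idea above can be sketched with a minimal k-means loop in plain NumPy. This is an illustrative toy (two hand-made 2-D blobs, not real gene data); the function name `kmeans` and the seed are my own choices, not from the notes.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Pick k data points at random as the initial centroids.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (squared distance).
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(axis=2), axis=1)
        # Move each centroid to the mean of its assigned points;
        # keep a centroid in place if it ends up with no points.
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

# Two well-separated toy blobs stand in for "similar or related" items.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(X, k=2)
```

Run on these points, k-means groups the first three and the last three together without ever being told which group is which, which is exactly what makes it unsupervised.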
Hypothesis Function
hθ(x) = θ0 + θ1·x
Cost Function J(θ0, θ1)
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from the x's and the actual outputs y. For squared-error cost:
J(θ0, θ1) = (1 / 2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
where m is the number of training examples, and the 1/2 is there to simplify the gradient later.
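The squared-error cost described above can be computed directly. A minimal sketch (the toy data here is my own, chosen to lie exactly on the line y = 1 + 2x so the true parameters give zero cost):

```python
import numpy as np

def h(x, theta0, theta1):
    # Hypothesis: a straight line in x.
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # Squared-error cost: half the mean of the squared residuals.
    m = len(x)
    return ((h(x, theta0, theta1) - y) ** 2).sum() / (2 * m)

# Toy data generated from theta0 = 1, theta1 = 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

cost(1.0, 2.0, x, y)  # zero: the hypothesis fits the data exactly
cost(0.0, 0.0, x, y)  # positive: a bad guess is penalized
```

The better the hypothesis fits the data, the smaller J gets, which is what lets us treat "find a good hypothesis" as "minimize J".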
Gradient Descent
So we have our hypothesis function and a way of measuring how well it fits the data. Now we need to estimate the parameters of the hypothesis function. That's where gradient descent comes in.
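The idea can be sketched in a few lines: start from an initial guess and repeatedly move both parameters a small step opposite the cost gradient. The learning rate, step count, and toy data below are my own illustrative choices.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_steps=1000):
    # Start from theta0 = theta1 = 0 and step downhill on J.
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(n_steps):
        err = (theta0 + theta1 * x) - y   # h(x) - y for every example
        grad0 = err.sum() / m             # dJ/dtheta0
        grad1 = (err * x).sum() / m       # dJ/dtheta1
        # Update both parameters simultaneously, scaled by the
        # learning rate alpha.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Same toy data as before: generated from theta0 = 1, theta1 = 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

theta0, theta1 = gradient_descent(x, y)
```

After enough steps the parameters converge close to the values that generated the data; if alpha is too large the steps overshoot and the cost diverges instead, which is why the learning rate has to be chosen with care.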