Each observation is represented by a set of numbers (features).
Formally, given a training set (xi, yi) for i = 1…n, we want to create a classification model f that can predict the label y for a new x.
The machine learning algorithm will create the function f for you.
The predicted y for a new x is the sign of f(x).
Loss Functions For Classification
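As a minimal sketch of the sign rule above, assuming labels are coded as +1 and -1 and that a score of exactly 0 is mapped to +1 (both are assumptions, as are the hypothetical weights and the helper name `predict_label`):

```python
def predict_label(f, x):
    """Return +1 or -1 according to the sign of the score f(x)."""
    # Assumption: a score of exactly zero is treated as the positive class.
    return 1 if f(x) >= 0 else -1


def f(x):
    """Hypothetical linear scoring function, for illustration only."""
    weights = [0.5, -1.0]
    bias = 0.25
    return sum(w * xi for w, xi in zip(weights, x)) + bias


new_points = [(2.0, 0.5), (0.0, 1.0)]
labels = [predict_label(f, x) for x in new_points]
print(labels)
```

In practice f would come from the learning algorithm rather than being written by hand.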
How do we measure classification error?
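One simple answer is the fraction of misclassified examples (the 0-1 loss averaged over the data). The sketch below assumes labels in {-1, +1} and predictions given by the sign of the score, with a score of 0 counted as +1; the function name and the toy numbers are illustrative only.

```python
def misclassification_rate(y_true, scores):
    """Fraction of examples whose predicted sign disagrees with the true label."""
    errors = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= 0 else -1  # assumption: ties go to +1
        if predicted != y:
            errors += 1
    return errors / len(y_true)


# Toy labels and scores f(x), made up for illustration.
y_true = [1, -1, 1, -1]
scores = [2.0, -0.5, -1.0, 0.3]
error_rate = misclassification_rate(y_true, scores)
print(error_rate)
```

Other loss functions replace the hard 0/1 count with a smooth penalty on the score, which is easier to minimize.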
Statistical Learning Theory For Supervised Learning
Statistical Learning Theory
- Ockham's Razor: The best models are simple models that fit the data well.
- William of Ockham, English friar and philosopher (1287-1347), said that among hypotheses that predict equally well, we should choose the one with the fewest assumptions.
We need a balance between accuracy and simplicity.
Most common machine learning methods choose f to minimize training error and complexity.
Penalizing complexity in this way aims to thwart the "curse" of dimensionality.
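One common way to write the trade-off between training error and complexity is as a single objective. The loss symbol, the constant C, and the complexity term below are assumed notation, not taken from these notes:

```latex
\hat{f} \;=\; \arg\min_{f} \;
  \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr)
  \;+\; C \cdot \mathrm{complexity}(f)
```

Here the first term is the average training loss, and C controls how strongly simpler models are preferred.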
Basic Outline for ML
- step 1: Split data randomly into training and test sets.
- step 2: Estimate coefficients / train the model.
- step 3: Score model: compute a score f(xi) for each xi in the test set.
- step 4: Evaluate model.
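The four steps above can be sketched end to end. This is a minimal illustration, assuming synthetic two-feature data, labels in {-1, +1}, a 70/30 split, and a small hand-written logistic regression trained by gradient descent; none of these choices come from the notes, and a library implementation would normally be used instead.

```python
import math
import random


def score(w, b, x):
    """Linear score f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss log(1 + exp(-y * f(x)))."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Derivative of the loss with respect to the score f(xi).
            g = -yi / (1.0 + math.exp(yi * score(w, b, xi)))
            grad_w = [gw + g * xj for gw, xj in zip(grad_w, xi)]
            grad_b += g
        w = [wj - lr * gw / n for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic data (an assumption): two Gaussian blobs, one per class.
rng = random.Random(0)
data = [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(100)]
data += [([rng.gauss(-2, 1), rng.gauss(-2, 1)], -1) for _ in range(100)]

# Step 1: split data randomly into training and test sets.
rng.shuffle(data)
cut = len(data) * 7 // 10
train, test = data[:cut], data[cut:]

# Step 2: estimate coefficients / train the model.
w, b = train_logistic([x for x, _ in train], [y for _, y in train])

# Step 3: compute a score for each xi in the test set.
test_scores = [score(w, b, x) for x, _ in test]

# Step 4: evaluate, here by accuracy of the sign of the score.
correct = sum(1 for s, (_, y) in zip(test_scores, test) if (1 if s >= 0 else -1) == y)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.3f}")
```

Evaluating only on the held-out test set is what keeps the accuracy estimate honest: the model never saw those points during training.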
Logistic Regression
Simple, fast, and often competitive with the best ML algorithms.
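A brief sketch of what makes the model "logistic": the score f(x) is passed through the logistic (sigmoid) function to give an estimated probability of the positive class. The function name below is illustrative, and f(x) is assumed to be a linear score.

```python
import math


def probability_positive(score):
    """Logistic function: maps a real-valued score f(x) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


for s in (-2.0, 0.0, 2.0):
    print(s, round(probability_positive(s), 3))
```

A score of 0 corresponds to probability 0.5, so thresholding the probability at 0.5 agrees with predicting the sign of f(x).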
Another perspective:
Evaluation Measures for Classifiers
ROC Curves
- Started during WWII for analyzing radar signals.
- For a particular False Positive Rate (FPR), what is the True Positive Rate (TPR)?
- FPR = number of negatives that were classified by the ML algorithm as positives / total number of negatives.
- TPR = number of positives that were classified by the ML algorithm as positives / total number of positives.
TPR=7/11
FPR=3/11
TPR=3/11
FPR=2/11
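The TPR and FPR definitions can be computed directly from labels and scores. The sketch below assumes labels in {-1, +1} and that an example is called positive when its score is at or above a threshold; sweeping the threshold over the observed scores gives the points of an ROC curve. The toy data and function names are made up for illustration and are unrelated to the 11-example figures above.

```python
def tpr_fpr(y_true, scores, threshold):
    """TPR and FPR when examples with score >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == -1 and s >= threshold)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every score."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(y_true, scores, t)
        points.append((fpr, tpr))
    return points


# Toy example: 3 positives and 3 negatives with made-up scores.
y_true = [1, 1, -1, 1, -1, -1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

tpr, fpr = tpr_fpr(y_true, scores, threshold=0.5)
points = roc_points(y_true, scores)
print(tpr, fpr)
print(points)
```

Each threshold gives one (FPR, TPR) pair; plotting all of them traces the curve from (0, 0) to (1, 1).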