Giving computers the ability to learn without being programmed
The three types of machine learning
Supervised Learning
Learning from labeled data so the model can make predictions on unseen or future data
Regression
- Predicted value: Predicting a continuous numeric value (an infinite number of outcomes possible)
- Example: Predicting house prices
Linear Regression
Classification
- Predicting the outcome (Y) based on the input (X)
- Predicted value: Discrete categorical values
- Limited number of outcomes possible
- Example: Breast cancer detection
Unsupervised Learning
- Finding interesting patterns in unlabeled data
- It’s not about finding the correct answer for every input, but rather about the algorithm discovering on its own what patterns or structure the data might contain
Clustering
Grouping similar data points together
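As a concrete illustration of clustering, the sketch below runs a minimal 1-D k-means (a common clustering algorithm) on hypothetical data; the function name, the data, and the naive initialization are all assumptions for illustration:

```python
# Minimal 1-D k-means sketch: alternate between assigning points to their
# nearest centroid and moving each centroid to the mean of its cluster.
# The data points below are hypothetical.

def kmeans_1d(points, k=2, iterations=10):
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Two obvious groups (near 1 and near 9); the centroids settle close to them
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5]))
```

Note that k-means never sees any labels: the grouping emerges from the data alone, which is what makes it unsupervised.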
The first key step in implementing linear regression:
- Define a Cost Function
\(\begin{align} J(w,b)=\frac{1}{2m}\displaystyle\sum_{i=1}^{m}(f_{w,b}(x^{(i)})-y^{(i)})^2 \end{align}\)
w and b are parameters of the model, adjusted as the model learns from the data. They’re also referred to as “coefficients” or “weights”
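The cost function above can be sketched in plain Python. Here `f_wb(x) = w*x + b` is the linear model, and the training data (sizes and prices) is hypothetical:

```python
# Squared-error cost for linear regression:
# J(w, b) = (1 / 2m) * sum_i (w*x_i + b - y_i)^2

def compute_cost(x, y, w, b):
    m = len(x)
    total = 0.0
    for x_i, y_i in zip(x, y):
        f_wb = w * x_i + b           # model prediction for this example
        total += (f_wb - y_i) ** 2   # squared error
    return total / (2 * m)

x_train = [1.0, 2.0, 3.0]        # e.g. house sizes (hypothetical)
y_train = [300.0, 500.0, 700.0]  # e.g. prices (hypothetical)

print(compute_cost(x_train, y_train, w=200.0, b=100.0))  # fits exactly → 0.0
```

A lower value of J(w, b) means the line fits the training data more closely; gradient descent (covered below) searches for the w and b that minimize it.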
How to measure the performance of a classification model
The performance of a classification model is typically measured using metrics such as accuracy, precision, recall, and F1 score
\(\begin{align} Accuracy = \frac{TP+TN}{Total\,number\,of\,predictions} \end{align}\)
The proportion of instances that are correctly predicted across all predictions
\(\begin{align} Precision = \frac{TP}{TP+FP} \end{align}\)
The proportion of true positives among the values predicted as True by the model
\(\begin{align} Recall = \frac{TP}{TP+FN} \end{align}\)
The proportion of actual positives that the model correctly identifies
\(\begin{align} F1\,Score = 2\cdot\frac{Precision \times Recall}{Precision + Recall} \end{align}\)
The harmonic mean of precision and recall
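All four metrics can be computed directly from the confusion-matrix counts (TP, TN, FP, FN). A minimal sketch with hypothetical counts, using the standard definition of F1 as the harmonic mean of precision and recall:

```python
# Classification metrics from raw confusion-matrix counts.

def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total              # correct over all predictions
    precision = tp / (tp + fp)                # of predicted positives, how many are real
    recall = tp / (tp + fn)                   # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(acc, prec, rec, f1)
```

Precision and recall often trade off against each other (e.g. in breast cancer detection, a high recall avoids missed cancers at the cost of more false alarms), which is why F1 combines the two into one number.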
Gradient descent
- A basic technique widely used in machine learning, including in advanced neural network models.
- An algorithm that finds the minimum of a cost function by repeatedly adjusting the weight and bias (parameters) until the cost reaches a minimum.
- If the cost function is not bowl-shaped or hammock-shaped, there may be more than one possible minimum.
- Finding the global or local minimum of the function.
- Depending on the starting point and the shape of the cost function, you may end up in different local minima, so the initial conditions matter
- For linear regression, parameters are often set to zero initially
Gradient descent algorithm
\(\begin{aligned} &w = w - \alpha \frac{\partial}{\partial w} J(w, b)\\\\ &b = b - \alpha \frac{\partial}{\partial b} J(w, b)\\\\ &\alpha: \text{Learning rate (step size; } \alpha \text{ always has a positive value})\\\\ &\frac{\partial}{\partial w} J(w, b): \text{Derivative (descent direction)}\\\\ \end{aligned}\)
Both parameters are updated simultaneously, and the update is repeated until convergence
The partial derivative of the cost function with respect to the weight gives the gradient, and the sign of this gradient determines how the weight is adjusted: if the gradient is positive, the weight is reduced to decrease the cost; if the gradient is negative, the weight is increased
Reference:
Coursera - Supervised Machine Learning: Regression and Classification
Raschka, S., & Mirjalili, V. (2017). Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. Packt Publishing. http://202.62.95.70:8080/jspui/handle/123456789/12650
https://sumniya.tistory.com/26