Derivatives and Optimization

Why is calculus important in machine learning?

Machine Learning

Machine learning models are essentially designed to make predictions or decisions based on input data, and to measure the accuracy of their predictions, they define something called a loss function (or cost function), which measures the difference between the predicted value and the actual value of the data. Ultimately, the goal of training a machine learning model is to find a set of parameters (weights and biases in the case of neural networks) that minimize this loss function, indicating that the model’s predictions are as close as possible to the actual outcome.

In this process, to minimize the value of the loss function, the concept of differentiation is used

Where derivatives are used

The derivative is a measure for determining how far you need to move to reach the minimum value of the loss function, and is used to calculate the slope of the loss function at a particular point(gradient descent)
When we want to find the model that best fits the data, we calculate the loss function and minimize it, and repeat the process

Model training

The model starts with an arbitrary line ( \rightarrow ) something that looks like ( Wx + b ) and then tweak the results to optimize the best possible prediction for the existing data points

Linear Regression
Classification
- Sentiment analysis (results are expressed as happy/sad/joyful/etc.)

Currently, machine learning problems are handled by many different mathematical techniques.

Derivative = Derivative

Derivative is the instantaneous rate of change of a function.
Derivative of a function at a point = slope of the tangent line at that point
To find the maximum or minimum value of a function, you must find one of the points where the derivative goes to zero.

How to represent derivatives

Lagrange's Notation
\( y = f(x) \rightarrow f'(x) \)
Leibniz's Notation
\( \frac{dy}{dx} = \frac{d}{dx}f(x) \)

Derivatives of line of equation

\(\begin{aligned} f(x) &= ax + b \rightarrow f'(x) = a \\\\ \frac{\Delta y}{\Delta x} &= \frac{a(x+\Delta x) + b - (ax + b)}{\Delta x} \\\\ \qquad a\frac{\Delta x}{\Delta x} &= a \end{aligned}\)

Derivatives - Quadratic Equations

\(\begin{aligned} \text{Quadratics: } y &= f(x) = x^2 \\\\ \;\text{Slope: } \frac{\Delta f}{\Delta x} &= \frac{f(x+\Delta x)-f(x)}{\Delta x} \\\\ &= \frac{(x+\Delta x)^2-x^2}{\Delta x} \\\\ &= \frac{x^2+2x\Delta x+(\Delta x)^2-x^2}{\Delta x} \\\\ &= 2x+\Delta x \\\\ & \text{as } \Delta x \rightarrow 0 \\\\ f(x) &= x^2 \rightarrow f'(x)=2x \end{aligned}\)

Derivatives - Higher degree polynomials

\(\begin{aligned} \text{Cubic: } y &= f(x) = x^3 \\ \text{Slope: } \frac{\Delta f}{\Delta x} &= \frac{(x+\Delta x)^3 - x^3}{\Delta x} \\ &= \frac{(x^3 + 3x^2\Delta x + 3x(\Delta x)^2 + (\Delta x)^3) - x^3}{\Delta x} \\ &= \frac{3x^2\Delta x + 3x(\Delta x)^2 + (\Delta x)^3}{\Delta x} \\ &= 3x^2 + 3x\Delta x + (\Delta x)^2 \\\\ &\, \text{as } \Delta x \rightarrow 0 \\ & \Rightarrow f'(x)=3x^2 \end{aligned}\)

Derivative of \( \frac{1}{x} \)

\[\begin{aligned} Inverse:y=f(x)=x^-1=\frac{1}{x}\\\\ Slope: \frac{\Delta x}=\frac{f(x+{\Delta x})-f(x)}\\\\ \frac{\Delta f}{\Delta x}=\frac{(x+{\Delta x})^-1-x^-1}\\\\ =\frac{\frac{1}{x+{\Delta x}}-\frac{1}{x}}\\\\ =\frac{\frac{x-(x+{\Delta x})}{(x+{\Delta x})x}}\\\\ =-\frac{1}{x^2+x{\Delta x}}\\\\\\\\ as {\Delta x} \rightarrow 0\\\\ f(X)=x^-1 \rightarrow f'(x)=-x^-2 \end{aligned}\]

Derivatives and Optimization

Why is calculus important in machine learning?

Machine Learning

Where derivatives are used

Model training

Related problem types

Derivative = Derivative

How to represent derivatives

Derivatives of line of equation

Derivatives - Quadratic Equations

Derivatives - Higher degree polynomials

Derivative of \( \frac{1}{x} \)

Further Reading

Machine Learning

Statistic

Modulo Operation