Quick Recap Before Learning Neural Networks

Yung Han Jeong
Nov 30, 2020

Artificial Intelligence, Deep Neural Networks, and Machine Learning all seem like the craze lately in data science. After tunnel-visioning into documentation and tutorials for a while, I realized that I needed to take a step back and regather my foundations. This is a summary of the quick math recaps I needed before I started building neural network models.

Linear Regression

y = mx + b — it’s a line!

A linear regression is a prediction method which finds a best-fit line. Here x represents an observation of a single feature, m is the weight applied to that observation, and b represents an offset value. Essentially, a prediction is made by weighting an observation and shifting it from a baseline.
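As a quick sketch in Python (the function name and numbers here are mine for illustration, not from any library), the prediction is just the line equation:

```python
# Simple linear regression prediction: y = m * x + b
def predict(x, m, b):
    """Predict y for a single feature value x with slope m and offset b."""
    return m * x + b

# Example: slope of 2, offset of 1
print(predict(3.0, m=2.0, b=1.0))  # 7.0
```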

y = m₁x₁ + m₂x₂ + … + mₙxₙ + b — multiple regression

Multiple linear regression is a combined prediction. In the context of the data, a multiple linear regression is a weighted sum of the individual features added to a baseline.

The best fit is determined by finding the line that has the smallest sum of squared errors between the predictions and the data. Since linear regression assumes that the features are independent, the tuning for the best fit can be done through the gradient descent method.
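Here is a rough NumPy sketch of that idea, assuming a single feature and some made-up data; it nudges m and b along the gradient of the sum of squared errors:

```python
import numpy as np

# Toy data (made up for illustration): y is roughly 2x + 1 with noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

m, b = 0.0, 0.0   # start with a flat line
lr = 0.01         # learning rate

for _ in range(5000):
    pred = m * x + b
    error = pred - y
    # Gradients of the sum of squared errors with respect to m and b
    grad_m = 2 * np.sum(error * x)
    grad_b = 2 * np.sum(error)
    m -= lr * grad_m
    b -= lr * grad_b

print(m, b)  # should land near 2 and 1
```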

Dot Product and Vectorization

Now let’s dive into the weighted sum further. Remember that we assumed each feature was independent? That allows us to treat the features as vectors and use matrix calculations. The weighted sum can be represented as a dot product of the weights with an observation. The same logic applies to higher dimensions.

Dot product representing all slopes of a multiple linear regression

The offset, b, can also be represented as a vector. In this case it would be a single constant, a scalar, but remember that its dimension needs to grow as the number of observations grows to fit the linear transformation.
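A small NumPy sketch of the same idea (the weights and observations are made up for illustration): the weighted sum of one observation is a dot product, and a batch of observations becomes a matrix-vector product, with b broadcast across the rows:

```python
import numpy as np

w = np.array([0.5, -1.2, 2.0])    # one weight per feature
b = 0.3                           # offset (a scalar)

x = np.array([1.0, 2.0, 3.0])     # a single observation with 3 features
print(np.dot(w, x) + b)           # weighted sum plus offset: 4.4

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # two observations, 3 features each
print(X @ w + b)                  # NumPy broadcasts b across observations
```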

Logistic Regression

If you were to take the linear regression and plug it into a sigmoid function, a probabilistic prediction could be made rather than a continuous one. The key to remember is that a logistic regression output is fit between 0 and 1, with the underlying linear part representing the log-odds of an observation falling into a binary category.

The sigmoid function (image source: Wikipedia)

Again, in the context of the data, logistic regression tries to predict the probability of a binary answer by passing a weighted sum of all the features in an observation through a sigmoid function.

The math, if you were curious: σ(z) = 1 / (1 + e^−z), where z is the weighted sum of the features plus the offset
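Put together, a minimal NumPy sketch of that calculation (again with made-up weights and inputs) looks like this:

```python
import numpy as np

def sigmoid(z):
    """Squash the weighted sum into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.5, 1.5])   # one weight per feature
b = -0.1                         # offset
x = np.array([1.0, 2.0, 0.5])    # a single observation

z = np.dot(w, x) + b             # linear part: the log-odds
p = sigmoid(z)                   # probability of the positive class
print(z, p)                      # roughly 0.45 and 0.61
```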

Linear Neural Network Layer

The linear layer is one of the foundations of neural networks, and performing what we have discussed so far at a far larger scale is essentially what’s under the hood.

A linear layer computes a weighted sum (linear transformation) of the features; passing that sum through an activation function like the sigmoid yields the likelihood of a binary outcome.

A linear neural network layer (image source: includehelp.com)

As mentioned before, the assumption of feature independence allows the use of gradient descent to change individual weights and reduce errors in prediction.
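As a minimal sketch of what such a layer might look like in code (the class, data, and hyperparameters are my own toy example, not a library implementation), here is a single linear layer with a sigmoid activation trained by gradient descent:

```python
import numpy as np

class LinearLayer:
    """A single linear layer followed by a sigmoid activation."""

    def __init__(self, n_features, rng=np.random.default_rng(0)):
        self.w = rng.normal(size=n_features)   # one weight per feature
        self.b = 0.0                           # offset

    def forward(self, X):
        z = X @ self.w + self.b                # weighted sum (linear transformation)
        return 1.0 / (1.0 + np.exp(-z))        # sigmoid activation

    def step(self, X, y, lr=0.1):
        """One gradient-descent update on the binary cross-entropy loss."""
        p = self.forward(X)
        error = p - y                          # gradient of the loss w.r.t. z
        self.w -= lr * X.T @ error / len(y)
        self.b -= lr * error.mean()

# Toy usage with made-up data: the label simply follows the second feature
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

layer = LinearLayer(n_features=2)
for _ in range(1000):
    layer.step(X, y)
print(layer.forward(X))  # probabilities should move toward y
```

Frameworks like PyTorch or TensorFlow do the same thing at a much larger scale, with many units per layer and many layers stacked together.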

I hope this clarified some of the mystery of the math behind neural networks. What part of the math stumped you while learning neural networks?
