Logistic regression is a classification algorithm based on probability: rather than predicting the class directly, it models the probability \(p(y \mid x)\) of an example belonging to a class; equivalently, the log-odds \(\log\frac{p}{1-p}\) is modelled as a linear function of \(x\).

Let’s look at a little theory. We use the same linear equation as in linear regression; the only difference is that \(y\) is binary, either 0 or 1.

\[h_{\theta}(x) = \theta_{0} + \theta_{1}x\]

However, we pass this value through the sigmoid function to return a probability between 0 and 1. \[P(class=1) = \frac{1} {1 + e^{-z}}\] This function approaches 0 or 1 for large negative or positive inputs, and \(z\) is the linear combination, so

\[h_{\theta}(x) = 1 / (1 + e^{-z}), z = \theta_{0} + \theta_{1}x\]
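As a quick sketch of this hypothesis (the parameter values below are made up for illustration):

```python
import math

def sigmoid(z):
    # squashes any real number into the (0, 1) interval
    return 1.0 / (1.0 + math.exp(-z))

def h(theta0, theta1, x):
    # hypothesis: sigmoid applied to the linear combination z = theta0 + theta1*x
    return sigmoid(theta0 + theta1 * x)

# hypothetical parameters and input: z = -1 + 0.5*4 = 1
p = h(-1.0, 0.5, 4.0)
print(round(p, 4))
```

The output is a probability, so classifying is just thresholding it at 0.5.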

**Cost function**

\[J(\theta) = -\sum_{i=1}^{n}\left[y_{i}\log(h_{\theta}(x_{i})) + (1-y_{i})\log(1-h_{\theta}(x_{i}))\right]\]
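A minimal sketch of computing this cost, with made-up labels and predicted probabilities (note the leading minus sign, since we minimize the negative log-likelihood):

```python
import math

def cost(y_true, y_pred):
    # negative log-likelihood (cross-entropy), summed over examples
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred))

# hypothetical labels and predicted probabilities
y_true = [1, 0, 1]
y_pred = [0.9, 0.2, 0.8]
print(cost(y_true, y_pred))
```

Confident, correct predictions (probabilities near the true label) drive the cost toward zero; confident wrong ones blow it up.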

A complete (and nice) derivation can be found in lecture notes on logistic regression.

**Update rule**

Same as linear regression, except the prediction function is the sigmoid:

\[\theta_0 := \theta_0 - \alpha\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})\]

\[\theta_{j} := \theta_{j} - \alpha\left(\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)} + \frac{\lambda}{n}\theta_{j}\right)\]

(The last term, with \(\lambda\), is an optional L2 regularization penalty that shrinks the weights.)

So, what does one actually have to do for logistic regression? Here is the pseudo code:

- initialize weights \(\theta_{j}\) with random numbers
- Get predictions \(h\) using the sigmoid function: \(z = {\theta^{T}X}\) and \(h = 1/(1+e^{-z})\)
- Use the update rule to get new weights: \(\theta_{j} := \theta_{j} - \alpha \frac{1}{m} X^{T}(h-y)\)
- Repeat until the cost is minimal (e.g. iterate 1000 times)
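The steps above can be sketched in NumPy. The toy data, learning rate, and iteration count below are all made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, iters=1000):
    # X: (m, n) feature matrix with a leading column of ones for the bias term
    m, n = X.shape
    theta = np.random.randn(n) * 0.01            # initialize weights with small random numbers
    for _ in range(iters):
        h = sigmoid(X @ theta)                   # predictions for all m examples
        theta -= alpha * (1.0 / m) * X.T @ (h - y)  # batch gradient descent step
    return theta

# toy data: class 1 when the feature is above 2
np.random.seed(0)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = train(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(preds)
```

The vectorized update \(X^{T}(h-y)\) computes the gradient for all weights at once, which is why no explicit loop over \(j\) is needed.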

**Linear vs Logistic**

Linear regression is used to predict *continuous* values like a score or a price, while logistic regression is used for binary or multi-class classification, where the outputs are *discrete* labels like 0, 1, 2.

Logistic regression uses a probability function to predict, whereas linear regression uses straight-line fitting, \(y = mX + C\).

**Why use Sigmoid?**

Sigmoid \(\frac{ 1 } {1 + e^{-z}}\) outputs a value between \(0\) and \(1\), which is a very convenient way to get a probability. It stays within \((0, 1)\) even for very large positive or negative inputs, approaching \(1\) and \(0\) respectively.
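A quick check of this saturation behaviour, as a sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid saturates: large negative z gives ~0, large positive z gives ~1
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```

So no matter how extreme \(z\) gets, the output can always be read as a valid probability.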