Logistic regression is a classification algorithm based on probability. Instead of predicting the class directly, it models the probability $p(y \mid x)$ of belonging to a class; equivalently, it models the log-odds of that probability: $$\log\frac{p}{1-p} = \theta^{T}x$$

Let’s look at a little bit of theory. We use the same linear equation as in linear regression; the only difference is that $y$ is binary, that is, either 0 or 1.

$h_{\theta}(x) = \theta_{0} + \theta_{1}x$

However, we calculate the prediction by passing this through the sigmoid function, which returns a probability between 0 and 1: $P(class=1) = \frac{1} {1 + e^{-z}}$. The sigmoid squashes any real-valued $z$ into $(0, 1)$, approaching 1 for large positive $z$ and 0 for large negative $z$, where

$h_{\theta}(x) = 1 / (1 + e^{-z}), z = \theta_{0} + \theta_{1}x$
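As a quick sketch, the hypothesis above can be written with NumPy (the helper names `sigmoid` and `predict_proba` are mine, not from the text; $X$ is assumed to carry a leading column of ones for $\theta_0$):

```python
import numpy as np

def sigmoid(z):
    """Squash any real value (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    """h_theta(x) = sigmoid(theta^T x), vectorized over the rows of X."""
    return sigmoid(X @ theta)
```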

Cost function

$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_{i}log(h_{\theta}(x_{i})) + (1-y_{i})log(1-h_{\theta}(x_{i}))\right]$
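A direct translation of this cost into NumPy might look like the following; the small clipping constant `eps` is my own guard against $\log(0)$, not part of the formula:

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Average negative log-likelihood (binary cross-entropy)."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid predictions
    h = np.clip(h, eps, 1 - eps)             # avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With all-zero weights every prediction is 0.5, so the cost is exactly $\log 2$, a handy sanity check.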

For a complete derivation, a nice one, see lecture notes on logistic regression.

Update rule

Same as linear regression, except the prediction function is the sigmoid:

$\theta_0 := \theta_0 - \alpha\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})$

$\theta_{j} := \theta_{j} - \alpha\left[\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)} + \frac{\lambda}{n}\theta_{j}\right]$

So, after all that, what does one have to do for logistic regression? Here is the pseudo code:

• Initialize weights $$\theta_{j}$$ with random numbers
• Get predictions $$h$$ using the sigmoid function: $$z = {\theta^{T}X}$$ and $$h = \frac{1}{1+e^{-z}}$$
• Use the update rule to get new weights: $$\theta_{j} := \theta_{j} - \alpha \frac{1}{m} X^{T}(h-y)$$
• Repeat until the cost is minimal (e.g. iterate 1000 times)
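The steps above can be sketched as one training loop; the function name `train_logistic` and the small random-init scale are my own choices, not from the text:

```python
import numpy as np

def train_logistic(X, y, alpha=0.1, iters=1000):
    """Gradient-descent logistic regression following the pseudo code above.

    X: (m, d) feature matrix; a bias column of ones is prepended here.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])              # add intercept column
    rng = np.random.default_rng(0)
    theta = rng.normal(scale=0.01, size=Xb.shape[1])  # random initialization
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))       # sigmoid predictions
        theta -= alpha * Xb.T @ (h - y) / m           # vectorized update rule
    return theta
```

On a small separable toy set this converges to weights that classify every point correctly.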
Linear vs Logistic

Linear regression is used to predict continuous values, like a score or a price, while logistic regression is used for binary or multi-class classification, where the outputs are discrete labels like 0, 1, 2.

Logistic regression uses a probability function to predict, whereas linear regression fits a straight line $$y=mX+C$$.

Why use Sigmoid?

Sigmoid $$\frac{ 1 } {1 + e^{-z}}$$ outputs a value between $$0$$ and $$1$$, which is a very convenient way to get a probability. It also saturates: for very large positive values of $$z$$ it approaches $$1$$, and for very large negative values it approaches $$0$$, so extreme inputs still map to valid probabilities.
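A quick numerical check of this saturation behaviour (plain NumPy, nothing beyond the formula above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Midpoint: sigmoid(0) is exactly 0.5
print(sigmoid(0.0))
# Saturation: large positive z gives a value close to 1,
# large negative z gives a value close to 0
print(sigmoid(10.0))
print(sigmoid(-10.0))
```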