derivative of cost function for Logistic Regression

I am going over the lectures on Machine Learning at Coursera.

I am struggling with the following. How can the partial derivative of

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right],$$

where $h_{\theta}(x)$ is defined as follows

$$h_{\theta}(x)=g(\theta^{T}x),$$
$$g(z)=\frac{1}{1+e^{-z}},$$

be

$$\frac{\partial}{\partial\theta_{j}}J(\theta) =\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i\,?$$

In other words, how would we go about calculating the partial derivative with respect to $\theta$ of the cost function (the logs are natural logarithms):

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$
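A quick numerical sanity check can complement the algebra. Here is a minimal sketch (my own addition, assuming NumPy and an arbitrary synthetic dataset) that compares a central finite-difference approximation of $\partial J/\partial\theta_j$ against $\frac{1}{m}\sum_{i}(h_\theta(x^{i})-y^i)x_j^i$; note the $\frac{1}{m}$ factor that appears when $J$ is differentiated as written, which one of the comments below also points out:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1-y_i)*log(1-h_i) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # claimed analytic gradient, with the 1/m factor: (1/m) * X^T (h - y)
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

# arbitrary small dataset: intercept column plus two features, random 0/1 labels
rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]
y = rng.integers(0, 2, 20).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, grad(theta, X, y)))  # expect True
```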










  • I think solving for $\theta$ via the gradient will be hard (or maybe impossible?), because unlike the linear case there is no closed form. So I suggest you use another method, for example Newton's method. BTW, did you find $\theta$ using the approach above?
    – John, Jul 22 '14 at 2:16






  • missing $\frac{1}{m}$ for the derivative of the cost
    – bourneli, Apr 20 '17 at 5:01
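The first comment above suggests Newton's method as an alternative to plain gradient descent. A minimal sketch of it for this cost (my own illustration, assuming NumPy, an intercept column in `X`, and data that are not perfectly separable) uses the gradient $\frac{1}{m}X^T(p-y)$ and the Hessian $\frac{1}{m}X^T\operatorname{diag}(p(1-p))X$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Newton's method: theta_new = theta - inv(H) @ grad."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / m              # (1/m) X^T (p - y)
        H = (X.T * (p * (1 - p))) @ X / m     # (1/m) X^T diag(p(1-p)) X
        theta -= np.linalg.solve(H, grad)
    return theta

# toy usage with overlapping classes (so the maximum likelihood estimate is finite)
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]
y = (X @ np.array([-0.5, 2.0, -1.0]) + rng.normal(size=50) > 0).astype(float)
print(newton_logistic(X, y))
```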
















Tags: statistics, regression, machine-learning, partial-derivative






asked Aug 27 '13 at 10:41 by dreamwalker; edited Aug 27 '13 at 12:26 by Avitus
5 Answers






The reason is the following. We use the notation

$$\theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p.$$

Then

$$\log h_\theta(x^i)=\log\frac{1}{1+e^{-\theta x^i}}=-\log\left(1+e^{-\theta x^i}\right),$$
$$\log(1-h_\theta(x^i))=\log\left(1-\frac{1}{1+e^{-\theta x^i}}\right)=\log\left(e^{-\theta x^i}\right)-\log\left(1+e^{-\theta x^i}\right)=-\theta x^i-\log\left(1+e^{-\theta x^i}\right).$$

[This used $1 = \frac{1+e^{-\theta x^i}}{1+e^{-\theta x^i}}$, so the $1$'s in the numerator cancel, and then $\log(x/y) = \log(x) - \log(y)$.]

Since our original cost function is of the form

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right],$$

plugging in the two simplified expressions above, we obtain

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[-y^i\log\left(1+e^{-\theta x^i}\right) + (1-y^i)\left(-\theta x^i-\log\left(1+e^{-\theta x^i}\right)\right)\right],$$

which can be simplified to

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y_i\theta x^i-\theta x^i-\log\left(1+e^{-\theta x^i}\right)\right]=-\frac{1}{m}\sum_{i=1}^m \left[y_i\theta x^i-\log\left(1+e^{\theta x^i}\right)\right],\qquad(*)$$

where the second equality follows from

$$-\theta x^i-\log\left(1+e^{-\theta x^i}\right)=-\left[\log e^{\theta x^i}+\log\left(1+e^{-\theta x^i}\right)\right]=-\log\left(1+e^{\theta x^i}\right)$$

[we used $\log(x)+\log(y)=\log(xy)$].

All you need now is to compute the partial derivatives of $(*)$ w.r.t. $\theta_j$. As

$$\frac{\partial}{\partial \theta_j}y_i\theta x^i=y_ix^i_j,$$
$$\frac{\partial}{\partial \theta_j}\log\left(1+e^{\theta x^i}\right)=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}=x^i_jh_\theta(x^i),$$

the thesis follows.

answered Aug 27 '13 at 12:25 by Avitus (edited Jan 13 at 14:45 by amWhy)
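As a quick numerical confirmation of the simplification $(*)$ (a sketch of my own, assuming NumPy, with the rows of `X` playing the role of the $x^i$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                  # arbitrary data, rows are x^i
y = rng.integers(0, 2, 8).astype(float)
theta = rng.normal(size=3)
z = X @ theta                                # theta x^i for every i

h = 1.0 / (1.0 + np.exp(-z))                 # h_theta(x^i)

# original form of J
J_original = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# simplified form (*): -(1/m) sum_i [ y_i * theta x^i - log(1 + e^{theta x^i}) ]
J_simplified = -np.mean(y * z - np.log(1.0 + np.exp(z)))

print(np.isclose(J_original, J_simplified))  # expect True
```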






  • Can't upvote as I don't have 15 reputation just yet! :) Will google the maximum entropy principle as I have no clue what that is! As a side note, I am not sure how you made the jump from log(1 - hypothesis(x)) to log(a) - log(b), but I will raise another question for this as I don't think I can type LaTeX here. Really impressed with your answer! Learning all this stuff on my own is proving to be quite a challenge, so the more kudos to you for providing such an elegant answer! :)
    – dreamwalker, Aug 27 '13 at 13:54

  • Yes!!! I couldn't see that you were using the property $\log(\frac{a}{b})=\log a-\log b$. Now everything makes sense :) Thank you so much! :)
    – dreamwalker, Aug 27 '13 at 14:26

  • Awesome explanation, thank you very much! The only thing I am still struggling with is the very last line: how was the derivative $$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}$$ obtained? Could you provide a hint for it? Thank you very much for the help!
    – Pedro Lopes, Dec 1 '15 at 21:40

  • @codewarrior hope this helps: $$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}} = \frac{x^i_j}{e^{-\theta x^i}(1+e^{\theta x^i})} =\frac{x^i_j}{e^{-\theta x^i}+e^{-\theta x^i+\theta x^i}} =\frac{x^i_j}{e^{-\theta x^i}+e^{0}} =\frac{x^i_j}{e^{-\theta x^i}+1} =\frac{x^i_j}{1+e^{-\theta x^i}} =x^i_j\,h_\theta(x^i),$$ since $$h_\theta(x^i) = \frac{1}{1+e^{-\theta x^i}}.$$
    – Rudresha Parameshappa, Jan 2 '17 at 13:06

  • @Israel, logarithm is usually base $e$ in math. Take a look at "When log is written without a base, is the equation normally referring to log base 10 or natural log?"
    – gdrt, Mar 11 '18 at 11:46
Pedro, the step is just rewriting $1-\frac{a}{b}$ as a single fraction before splitting the logarithm:

$$\log\left(1 - \frac{a}{b}\right)$$

$$1 - \frac{a}{b} = \frac{b}{b} - \frac{a}{b} = \frac{b-a}{b},$$
$$\log\left(1 - \frac{a}{b}\right) = \log\left(\frac{b-a}{b}\right) = \log(b-a) - \log(b).$$

answered Apr 13 '16 at 15:23 by Richard Wheatley (edited Apr 13 '16 at 15:39)
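A quick numeric spot-check of that identity in the case actually needed here, $a=1$ and $b=1+e^{-\theta x}$, i.e. $\log(1-h_\theta(x))=-\theta x-\log(1+e^{-\theta x})$ (my own sketch, assuming NumPy):

```python
import numpy as np

z = np.linspace(-5, 5, 11)            # a few values of theta x
h = 1.0 / (1.0 + np.exp(-z))          # h_theta(x)

lhs = np.log(1.0 - h)                 # log(1 - a/b) with a = 1, b = 1 + e^{-z}
rhs = -z - np.log(1.0 + np.exp(-z))   # log(b - a) - log(b)

print(np.allclose(lhs, rhs))          # expect True
```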






@pedro-lopes, it is the chain rule:
$$(u(v))' = u'(v)\,v'.$$
For example, with
$$y = \sin(3x - 5),\qquad u(v) = \sin(v),\qquad v = 3x - 5,$$
$$y' = \cos(3x - 5)\cdot(3 - 0) = 3\cos(3x-5).$$

Regarding
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}},$$
take
$$u(v) = \log(v),\qquad v = 1+e^{\theta x^i},$$
so that
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i}) = \frac{1}{1+e^{\theta x^i}}\cdot\frac{\partial}{\partial \theta_j}\left(1+e^{\theta x^i}\right) = \frac{1}{1+e^{\theta x^i}}\cdot x^i_j e^{\theta x^i} = \frac{x^i_j e^{\theta x^i}}{1+e^{\theta x^i}}.$$
Note that $(\log x)' = \frac{1}{x}$. Hope that answers your question!

answered Apr 17 '17 at 13:17 by RedEyed (edited Apr 17 '17 at 13:29 by The Count)
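A one-line finite-difference check of that derivative (my own sketch, treating $\theta$ and $x$ as scalars, NumPy assumed):

```python
import numpy as np

theta, x, eps = 0.7, 2.3, 1e-6                           # arbitrary scalar values
f = lambda t: np.log(1.0 + np.exp(t * x))

numeric = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # finite difference
analytic = x * np.exp(theta * x) / (1.0 + np.exp(theta * x))

print(np.isclose(numeric, analytic))                     # expect True
```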






We have
\begin{align*}
L(\theta) &= -\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\cdot \log P(y_i|x_i,\theta) + (1-y_i)\cdot \log(1 - P(y_i|x_i,\theta))\right], \\
h_\theta(x_i) &= P(y_i|x_i,\theta) = P(y_i=1|x_i,\theta) = \frac{1}{1+\exp\left(-\sum\limits_k \theta_k x_i^k \right)}.
\end{align*}

Then
\begin{align*}
\log(P(y_i|x_i,\theta))=\log(P(y_i=1|x_i,\theta)) &=-\log\left(1+\exp\left(-\sum\limits_k \theta_k x_i^k \right) \right) \\
\Rightarrow \frac{\partial }{\partial \theta_j} \log P(y_i|x_i,\theta) =\frac{x_i^j\cdot\exp\left(-\sum\limits_k \theta_k x_i^k\right)}{1+\exp\left(-\sum\limits_k \theta_k x_i^k\right)} &= x_i^j\cdot\left(1-P(y_i|x_i,\theta)\right)
\end{align*}
and
\begin{align*}
\log(1-P(y_i|x_i,\theta))=\log(1-P(y_i=1|x_i,\theta)) &=-\sum\limits_k \theta_k x_i^k -\log\left(1+\exp\left(-\sum\limits_k \theta_k x_i^k \right) \right) \\
\Rightarrow \frac{\partial }{\partial \theta_j} \log(1 - P(y_i|x_i,\theta)) &= -x_i^j + x_i^j\cdot\left(1-P(y_i|x_i,\theta)\right) = -x_i^j\cdot P(y_i|x_i,\theta).
\end{align*}

Hence,
\begin{align*}
\frac{\partial }{\partial \theta_j} L(\theta) &= -\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\cdot\frac{\partial }{\partial \theta_j} \log P(y_i|x_i,\theta) + (1-y_i)\cdot\frac{\partial }{\partial \theta_j} \log(1 - P(y_i|x_i,\theta))\right] \\
&=-\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\cdot x_i^j\cdot\left(1-P(y_i|x_i,\theta)\right) - (1-y_i)\cdot x_i^j\cdot P(y_i|x_i,\theta)\right] \\
&=-\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\cdot x_i^j - x_i^j\cdot P(y_i|x_i,\theta)\right] \\
&=\frac{1}{m}\sum\limits_{i=1}^{m}\left[\left(P(y_i|x_i,\theta)-y_i\right)\cdot x_i^j\right]
\end{align*}
(Proved)

answered Nov 27 '17 at 12:50 by Sandipan Dey (edited Dec 5 '17 at 11:42)
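The author links a full gradient-descent implementation in a comment below; as a minimal, hedged sketch of batch gradient descent driven by this derivative (my own illustration, assuming NumPy and an arbitrary learning rate and iteration count):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on L(theta) using grad = (1/m) X^T (p - y)."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)               # P(y_i = 1 | x_i, theta)
        theta -= lr * (X.T @ (p - y) / m)
    return theta

# toy usage: intercept column plus one noisy feature
rng = np.random.default_rng(42)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]
y = (X[:, 1] + rng.normal(size=100) > 0).astype(float)
print(fit_logistic(X, y))
```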






  • The logistic regression implementation with gradient-descent using this derivative can be found here: sandipanweb.wordpress.com/2017/11/25/…
    – Sandipan Dey, Nov 27 '17 at 12:53
$$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m} y^i\log(h_\theta(x^i))+(1-y^i)\log(1-h_\theta(x^i)),$$

where $h_\theta(x)$ is defined as follows:
$$h_\theta(x)=g(\theta^Tx),\qquad g(z)=\frac{1}{1+e^{-z}}.$$

Note that $g'(z)=g(z)(1-g(z))$, and we can simply write the right side of the summation as
$$y\log(g)+(1-y)\log(1-g)$$

and its derivative as
$$y \frac{1}{g}g'+(1-y) \left( \frac{1}{1-g}\right) (-g') \\
=\left( \frac{y}{g}- \frac{1-y}{1-g}\right) g' \\
= \frac{y(1-g)-g(1-y)}{g(1-g)}g' \\
= \frac{y-yg-g+gy}{g(1-g)}g' \\
= \frac{y-yg-g+gy}{g(1-g)}\,g(1-g)\,x \\
=(y-g)\,x.$$

Then we can rewrite the above as
$$\frac{\partial}{\partial\theta_{j}}J(\theta) =\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i.$$

answered Dec 5 '18 at 11:59 by Junghak Ahn
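The key fact used above, $g'(z)=g(z)(1-g(z))$, is easy to verify numerically (my own sketch, NumPy assumed):

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
eps = 1e-6
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)   # finite-difference g'(z)
analytic = g(z) * (1.0 - g(z))                    # claimed identity

print(np.allclose(numeric, analytic))             # expect True
```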





