Derivative of the cost function for logistic regression
I am going over the lectures on Machine Learning at Coursera.
I am struggling with the following. How can the partial derivative of
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i})),$$
where $h_{\theta}(x)$ is defined as follows
$$h_{\theta}(x)=g(\theta^{T}x),$$
$$g(z)=\frac{1}{1+e^{-z}},$$
be
$$\frac{\partial}{\partial\theta_{j}}J(\theta)=\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i\,?$$
In other words, how would we go about calculating the partial derivative with respect to $\theta$ of the cost function (the logs are natural logarithms):
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))$$
Tags: statistics, regression, machine-learning, partial-derivative
– John (Jul 22 '14): I think solving for $\theta$ via the gradient will be hard (or impossible?): unlike the linear case, there is no closed form. So I suggest you use another method, for example Newton's method. BTW, did you find $\theta$ using the above approach?

– bourneli (Apr 20 '17): The $\frac{1}{m}$ is missing in the derivative of the cost.
5 Answers
The reason is the following. We use the notation
$$\theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p.$$
Then
$$\log h_\theta(x^i)=\log\frac{1}{1+e^{-\theta x^i}}=-\log(1+e^{-\theta x^i}),$$
$$\log(1-h_\theta(x^i))=\log\left(1-\frac{1}{1+e^{-\theta x^i}}\right)=\log(e^{-\theta x^i})-\log(1+e^{-\theta x^i})=-\theta x^i-\log(1+e^{-\theta x^i}).$$
[This used $1=\frac{1+e^{-\theta x^i}}{1+e^{-\theta x^i}}$, so the 1's in the numerator cancel, and then $\log(x/y)=\log(x)-\log(y)$.]
Since our original cost function is of the form
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i})),$$
plugging in the two simplified expressions above, we obtain
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[-y^i\log(1+e^{-\theta x^i})+(1-y^i)\left(-\theta x^i-\log(1+e^{-\theta x^i})\right)\right],$$
which can be simplified to
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\theta x^i-\log(1+e^{-\theta x^i})\right]=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\log(1+e^{\theta x^i})\right],~~(*)$$
where the second equality follows from
$$-\theta x^i-\log(1+e^{-\theta x^i})=-\left[\log e^{\theta x^i}+\log(1+e^{-\theta x^i})\right]=-\log(1+e^{\theta x^i}).$$
[We used $\log(x)+\log(y)=\log(xy)$.]
All you need now is to compute the partial derivatives of $(*)$ w.r.t. $\theta_j$. As
$$\frac{\partial}{\partial \theta_j}y^i\theta x^i=y^ix^i_j,$$
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}=x^i_jh_\theta(x^i),$$
the result follows.
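The gradient derived above is easy to sanity-check numerically: compare $\frac{1}{m}\sum_i(h_\theta(x^i)-y^i)x^i_j$ (with the $\frac{1}{m}$ factor kept, as noted in the comments under the question) against a finite-difference approximation of $J(\theta)$. Below is a minimal NumPy sketch; the data and parameter values are random and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y^i log h(x^i) + (1 - y^i) log(1 - h(x^i)) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # analytic gradient: (1/m) * X^T (h - y)
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]  # intercept column plus two features
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad(theta, X, y), atol=1e-6))  # expect True
```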
– dreamwalker (Aug 27 '13): Can't upvote as I don't have 15 reputation just yet! :) Will google the maximum entropy principle as I have no clue what that is! As a side note, I am not sure how you made the jump from $\log(1-h_\theta(x))$ to $\log(a)-\log(b)$, but I will raise another question for this as I don't think I can type LaTeX here. Really impressed with your answer! Learning all this on my own is proving to be quite a challenge, so the more kudos to you for providing such an elegant answer! :)

– dreamwalker (Aug 27 '13): Yes!!! I couldn't see that you were using the property $\log(\frac{a}{b})=\log a-\log b$. Now everything makes sense :) Thank you so much! :)

– Pedro Lopes (Dec 1 '15): Awesome explanation, thank you very much! The only thing I am still struggling with is the very last line, how the derivative was obtained in $$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}.$$ Could you provide a hint for it? Thank you very much for the help!

– Rudresha Parameshappa (Jan 2 '17): @codewarrior hope this helps.
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}
=\frac{x^i_j}{e^{-\theta x^i}(1+e^{\theta x^i})}
=\frac{x^i_j}{e^{-\theta x^i}+e^{-\theta x^i+\theta x^i}}
=\frac{x^i_j}{e^{-\theta x^i}+e^{0}}
=\frac{x^i_j}{e^{-\theta x^i}+1}
=\frac{x^i_j}{1+e^{-\theta x^i}}
=x^i_j\,h_\theta(x^i),$$
since $h_\theta(x^i)=\frac{1}{1+e^{-\theta x^i}}$.

– gdrt (Mar 11 '18): @Israel, logarithm is usually base $e$ in math. Take a look at "When log is written without a base, is the equation normally referring to log base 10 or natural log?"
Pedro, combine the terms over a common denominator. Starting from
$$\log\left(1-\frac{a}{b}\right),$$
note that
$$1-\frac{a}{b}=\frac{b}{b}-\frac{a}{b}=\frac{b-a}{b},$$
so
$$\log\left(1-\frac{a}{b}\right)=\log\left(\frac{b-a}{b}\right)=\log(b-a)-\log(b).$$
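A quick numeric spot-check of this identity (the values of $a$ and $b$ below are arbitrary, chosen so that $0 < a < b$ and all logarithms are defined):

```python
import math

a, b = 2.0, 5.0  # arbitrary, with 0 < a < b
print(math.isclose(math.log(1 - a / b), math.log(b - a) - math.log(b)))  # expect True
```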
@pedro-lopes, this is the chain rule:
$$(u(v))' = u'(v)\cdot v'.$$
For example, with
$$y = \sin(3x - 5),$$
take
$$u(v) = \sin(v), \qquad v = 3x - 5,$$
so
$$y' = \cos(3x - 5)\cdot(3 - 0) = 3\cos(3x-5).$$
Regarding
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}},$$
take
$$u(v) = \log(v), \qquad v = 1+e^{\theta x^i}.$$
Then
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i}) = \frac{1}{1+e^{\theta x^i}}\cdot\frac{\partial}{\partial \theta_j}(1+e^{\theta x^i}) = \frac{1}{1+e^{\theta x^i}}\cdot\left(0 + x^i_je^{\theta x^i}\right) = \frac{x^i_je^{\theta x^i}}{1+e^{\theta x^i}}.$$
Note that $(\log x)' = \frac{1}{x}$.
Hope that answers your question!
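As a quick sanity check of that chain-rule step, one can compare the closed-form derivative of $f(\theta)=\log(1+e^{\theta x})$ (scalar case, a single feature) against a central finite difference. A small NumPy sketch with arbitrary illustrative values for $x$ and $\theta$:

```python
import numpy as np

def f(theta, x):
    return np.log(1.0 + np.exp(theta * x))

def df(theta, x):
    # chain rule: (1 / (1 + e^{theta x})) * x * e^{theta x}
    return x * np.exp(theta * x) / (1.0 + np.exp(theta * x))

theta, x, eps = 0.7, -1.3, 1e-6  # arbitrary test values
numeric = (f(theta + eps, x) - f(theta - eps, x)) / (2 * eps)
print(np.isclose(numeric, df(theta, x)))  # expect True
```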
We have
\begin{align*}
L(\theta) &= -\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i \log P(y_i|x_i,\theta) + (1-y_i) \log{(1 - P(y_i|x_i,\theta))}\right], \\
h_\theta(x_i) &= P(y_i|x_i,\theta) = P(y_i=1|x_i,\theta) = \frac{1}{1+\exp\left(-\sum\limits_k \theta_k x_i^k \right)}.
\end{align*}
Then
\begin{align*}
\log{(P(y_i|x_i,\theta))}=\log{(P(y_i=1|x_i,\theta))} &=-\log\left(1+\exp\left(-\sum\limits_k \theta_k x_i^k \right) \right) \\
\Rightarrow \frac{\partial }{\partial \theta_j} \log P(y_i|x_i,\theta) =\frac{x_i^j\,\exp\left(-\sum\limits_k \theta_k x_i^k\right)}{1+\exp\left(-\sum\limits_k \theta_k x_i^k\right)} &= x_i^j\left(1-P(y_i|x_i,\theta)\right)
\end{align*}
and
\begin{align*}
\log{(1-P(y_i|x_i,\theta))}=\log{(1-P(y_i=1|x_i,\theta))} &=-\sum\limits_k \theta_k x_i^k -\log\left(1+\exp\left(-\sum\limits_k \theta_k x_i^k \right) \right) \\
\Rightarrow \frac{\partial }{\partial \theta_j} \log{(1 - P(y_i|x_i,\theta))} &= -x_i^j + x_i^j\left(1-P(y_i|x_i,\theta)\right) = -x_i^j\,P(y_i|x_i,\theta). \\
\end{align*}
Hence,
\begin{align*}
\frac{\partial }{\partial \theta_j} L(\theta) &= -\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\frac{\partial }{\partial \theta_j} \log P(y_i|x_i,\theta) + (1-y_i)\frac{\partial }{\partial \theta_j} \log{(1 - P(y_i|x_i,\theta))}\right] \\
&=-\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\,x_i^j\left(1-P(y_i|x_i,\theta)\right) - (1-y_i)\,x_i^j\,P(y_i|x_i,\theta)\right] \\
&=-\frac{1}{m}\sum\limits_{i=1}^{m}\left[y_i\,x_i^j - x_i^j\,P(y_i|x_i,\theta)\right] \\
&=\frac{1}{m}\sum\limits_{i=1}^{m}\left(P(y_i|x_i,\theta)-y_i\right)x_i^j.
\end{align*}
(Proved)
– Sandipan Dey (Nov 27 '17): The logistic regression implementation with gradient descent using this derivative can be found here: sandipanweb.wordpress.com/2017/11/25/…
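For reference, the gradient just derived plugs directly into a batch gradient-descent loop. The following is a minimal NumPy sketch (the learning rate, iteration count, and function name are illustrative assumptions, not the implementation behind the link above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on J(theta); X is assumed to include an intercept column."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)             # h_theta(x^i) for every example i
        theta -= lr * (X.T @ (h - y)) / m  # theta_j -= lr * (1/m) * sum_i (h - y^i) x^i_j
    return theta

# usage sketch: X = np.c_[np.ones(len(y)), features]; theta_hat = fit_logistic(X, y)
```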
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m} y^i\log(h_\theta(x^i))+(1-y^i)\log(1-h_\theta(x^i)),$$
where $h_\theta(x)$ is defined as follows:
$$h_\theta(x)=g(\theta^Tx),$$
$$g(z)=\frac{1}{1+e^{-z}}.$$
Note that $g'(z)=g(z)(1-g(z))$, and we can simply write the right-hand side of the summation as
$$y\log(g)+(1-y)\log(1-g)$$
and its derivative with respect to $\theta_j$ as
$$
\begin{aligned}
&y\,\frac{1}{g}\,g'+(1-y)\left(\frac{1}{1-g}\right)(-g') \\
&=\left(\frac{y}{g}-\frac{1-y}{1-g}\right)g' \\
&=\frac{y(1-g)-g(1-y)}{g(1-g)}\,g' \\
&=\frac{y-yg-g+gy}{g(1-g)}\,g' \\
&=\frac{y-yg-g+gy}{g(1-g)}\,g(1-g)\,x_j \\
&=(y-g)\,x_j.
\end{aligned}
$$
We can then rewrite the above as
$$\frac{\partial}{\partial\theta_{j}}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i.$$
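The identity $g'(z)=g(z)(1-g(z))$ that this answer relies on can be verified numerically in a couple of lines (the test point below is arbitrary):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.42, 1e-6  # arbitrary test point
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)
print(np.isclose(numeric, g(z) * (1 - g(z))))  # expect True
```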