Subtraction of slope in gradient descent
In the gradient descent algorithm, say $f(x)$ (a quadratic function) is the objective function. The algorithm is defined as
$$x_i = x_i - \alpha\frac{\partial f(x)}{\partial x_i}$$
I just don't quite understand the meaning of doing a subtraction. I'm intuitively able to follow that we are going in the direction of steepest descent, but I have some questions. The derivative of $f(x)$ is going to give us the equation of a line. So when we substitute the value of $x_i$ into $f'(x)$, what we get is a $y$-coordinate $y_i$. So I don't understand: how can we subtract a $y$-coordinate from an $x$-coordinate?
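For concreteness, here is a minimal Python sketch of the update in question (the quadratic, the starting point, and the step size $\alpha$ are arbitrary choices for illustration):

```python
# Minimal sketch: gradient descent on f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2*(x - 3).

def f_prime(x):
    return 2 * (x - 3)   # the slope of f at x -- a single number

alpha = 0.1   # step size (the "alpha" in the update rule)
x = 0.0       # starting point

for _ in range(25):
    x = x - alpha * f_prime(x)   # the subtraction being asked about

print(x)   # approaches 3, the minimizer of f
```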
calculus optimization machine-learning
asked Sep 7 '12 at 5:23 by karthik A · edited Sep 7 '12 at 15:40 by Michael Hardy
3 Answers
The direction of $\nabla f$ is the direction of greatest increase of $f$. (This can be shown by writing out the directional derivative of $f$ using the chain rule, and comparing the result with the dot product of the direction vector and the gradient vector.) You want to go in the direction of greatest decrease, so move along $-\nabla f$.

answered Sep 7 '12 at 5:28 by Tunococ
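As a quick numerical check of the dot-product claim (the function $f(x, y) = x^2 + 2y^2$ and the sampled directions below are illustrative, not part of the original answer):

```python
import math

# The directional derivative of f at p in a unit direction u equals
# grad f(p) . u, so it is largest when u points along the gradient.
# Illustrative function: f(x, y) = x^2 + 2*y^2.

def grad_f(x, y):
    return (2 * x, 4 * y)

p = (1.0, 1.0)
g = grad_f(*p)

for angle_deg in range(0, 360, 45):
    t = math.radians(angle_deg)
    u = (math.cos(t), math.sin(t))            # unit direction vector
    directional = g[0] * u[0] + g[1] * u[1]   # grad f(p) . u
    print(angle_deg, round(directional, 3))

# The values are largest for directions closest to g and most
# negative for directions closest to -g.
```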
Hi, yes, I was able to get the general idea. So $\nabla f$ gives us the equation of a straight line, and when we substitute the value of $x_i$ into that straight-line equation we get a $y$-coordinate $y_i$. So are we subtracting this $y$-coordinate from $x_i$, which is an $x$-coordinate? – karthik A, Sep 7 '12 at 5:34
Okay, I figured it out. Thanks!! – karthik A, Sep 7 '12 at 5:41
But still, why is it MINUS? Because your goal is to MINIMIZE $J(\theta)$. So, in a maximization problem, you need to ADD $\alpha \cdot \text{slope}$.

answered Oct 20 '17 at 1:10 by Aaron
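A minimal sketch of that sign flip, on a made-up objective $J(t) = (t - 2)^2$: subtracting the scaled slope minimizes $J$, while adding the scaled slope of $-J$ maximizes $-J$, and both converge to the same point.

```python
# Illustrative objective J(t) = (t - 2)^2; dJ is its derivative.
def dJ(t):
    return 2 * (t - 2)

alpha = 0.1
t_min, t_max = 5.0, 5.0
for _ in range(50):
    t_min = t_min - alpha * dJ(t_min)      # descent: minimize J
    t_max = t_max + alpha * (-dJ(t_max))   # ascent: maximize -J

print(round(t_min, 4), round(t_max, 4))    # both approach 2
```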
My understanding of this minus sign is that it comes from the assumption behind SGD. The assumption is that the objective function $J$ is convex, with an optimal solution (global or local) at $\theta_*$, where the partial derivatives are $0$. That is why the parameters are updated by moving in the direction opposite to the one in which the function changes fastest: SGD wants $J$ to change more and more slowly as it gradually approaches the minimum.

edited Dec 27 '18 at 5:56 by Avraham · answered Dec 27 '18 at 4:53 by Charles Chow
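A small sketch of that slowing-down behavior on the illustrative convex objective $J(\theta) = \theta^2$ (the names and constants here are made up): since the derivative shrinks toward $0$ near $\theta_*$, the update steps shrink with it.

```python
# Illustrative convex objective J(theta) = theta^2, minimized at theta_* = 0.
def dJ(theta):
    return 2 * theta

alpha, theta = 0.2, 4.0
for step in range(8):
    grad = dJ(theta)
    theta = theta - alpha * grad
    print(step, round(theta, 4), "step:", round(abs(alpha * grad), 4))

# The printed step sizes decrease monotonically toward 0 as theta
# approaches the minimizer.
```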
Welcome to the Mathematics Stack Exchange community! The quick tour (math.stackexchange.com/tour) will help you get the most benefit from your time here. Also, please use MathJax for your equations. My favorite reference is math.meta.stackexchange.com/questions/5020/…. – dantopa, Dec 27 '18 at 5:56