Subtraction of slope in gradient descent












3 votes
In the gradient descent algorithm, say $f(x)$ (a quadratic function) is the objective function. So the algorithm is defined as

$$x_i = x_i - a\frac{\partial f(x)}{\partial x_i}$$

I just don't quite understand the meaning of doing a subtraction. I'm intuitively able to follow that we are going in the direction of steepest descent, but I have some questions. The derivative of $f(x)$ is going to give us the equation of a line. So when we substitute the value of $x_i$ into $f'(x)$, what we get is a $y$ coordinate, $y_i$. So I don't understand how we can subtract a $y$ coordinate from an $x$ coordinate.










Tags: calculus optimization machine-learning






edited Sep 7 '12 at 15:40 by Michael Hardy
asked Sep 7 '12 at 5:23 by karthik A
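To make the update rule concrete: $\frac{\partial f}{\partial x_i}$ evaluated at the current point is just a number (the slope there), not a point on a line, so the subtraction is between two ordinary numbers on the $x$-axis. Below is a minimal numerical sketch of the update in one dimension; the function, starting point, and step-size value are illustrative choices, not taken from the question.

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# Its derivative f'(x) = 2*(x - 3), evaluated at the current point, is just a
# number (the slope there), so "x - alpha * slope" subtracts number from number.

def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    return 2.0 * (x - 3.0)

x = 10.0      # starting point (illustrative)
alpha = 0.1   # step size, the question's "a" (value is illustrative)

for step in range(25):
    slope = f_prime(x)       # e.g. 14.0 at x = 10: a plain number, not a point
    x = x - alpha * slope    # move against the slope, i.e. downhill

print(x, f(x))  # x ends up close to 3 and f(x) close to 0
```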






















3 Answers


















4 votes

The direction of $\nabla f$ is the direction of greatest increase of $f$. (This can be shown by writing out the directional derivative of $f$ using the chain rule, and comparing the result with a dot product of the direction vector with the gradient vector.) You want to go toward the direction of greatest decrease, so move along $-\nabla f$.

answered Sep 7 '12 at 5:28 by Tunococ
• Hi, yes, I was able to get the general idea. So $\nabla f$ gives us the equation of a straight line. And when we substitute the value of $x_i$ into that straight-line equation we get a $y_i$ coordinate. So are we subtracting this $y_i$ coordinate from $x_i$, which is an $x$ coordinate? – karthik A, Sep 7 '12 at 5:34

• Okay, I figured it out. Thanks!! – karthik A, Sep 7 '12 at 5:41
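The claim in this answer, that $-\nabla f$ points in the direction of greatest decrease, can also be checked numerically. The sketch below is an editorial illustration; the quadratic function, the base point, and the per-degree sampling of directions are all invented for the example. It compares how $f$ changes along many unit directions and confirms the largest drop occurs along the negative normalized gradient.

```python
import math

# f(x, y) = x^2 + 3*y^2, a simple convex function (illustrative choice).
def f(x, y):
    return x**2 + 3.0 * y**2

def grad_f(x, y):
    return (2.0 * x, 6.0 * y)

x0, y0 = 2.0, 1.0
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)
steepest = (-gx / norm, -gy / norm)   # unit vector along -grad f

eps = 1e-3
best_dir, best_drop = None, float("inf")
for k in range(360):                   # sample a unit direction every degree
    theta = math.radians(k)
    ux, uy = math.cos(theta), math.sin(theta)
    change = f(x0 + eps * ux, y0 + eps * uy) - f(x0, y0)
    if change < best_drop:
        best_drop, best_dir = change, (ux, uy)

print("numerically best direction: ", best_dir)
print("negative gradient direction:", steepest)  # the two agree (up to sampling)
```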



















2 votes

[image]

But still, why is it MINUS?

Because your goal is to MINIMIZE J(θ).

[image]

So, in the maximization problem, you need to ADD alpha * slope.

answered Oct 20 '17 at 1:10 by Aaron
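To make the sign flip described above concrete, here is a small sketch (the functions, starting point, and step size are illustrative, not from the answer): with a minus sign the same update walks downhill toward a minimum, and switching to a plus sign walks uphill toward a maximum.

```python
# Both example functions and the step size below are illustrative choices.
alpha = 0.1

# Minimization of f(x) = x^2: SUBTRACT alpha * slope.
x = 5.0
for _ in range(50):
    slope = 2.0 * x            # f'(x) for f(x) = x^2
    x = x - alpha * slope      # minus: move toward the minimizer x = 0
print("descent ends near", x)  # ~0

# Maximization of g(x) = -(x - 2)^2: ADD alpha * slope.
x = 5.0
for _ in range(50):
    slope = -2.0 * (x - 2.0)   # g'(x) for g(x) = -(x - 2)^2
    x = x + alpha * slope      # plus: move toward the maximizer x = 2
print("ascent ends near", x)   # ~2
```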





















0 votes

My understanding of the minus sign comes from the assumption behind SGD: the objective function $J$ is a convex function that has its optimal solution (global or local) at $\theta_{*}$, where the partial derivatives are $0$. That is why the parameters are updated by moving in the direction opposite to the one in which the function changes fastest: SGD wants $J$ to change more and more slowly until it gradually settles at that point.

edited Dec 27 '18 at 5:56 by Avraham, answered Dec 27 '18 at 4:53 by Charles Chow

• Welcome to Mathematics Stack Exchange community! The quick tour (math.stackexchange.com/tour) will help you get the most benefit from your time here. Also, please use MathJax for your equations. My favorite reference is math.meta.stackexchange.com/questions/5020/…. – dantopa, Dec 27 '18 at 5:56
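A small stochastic gradient descent sketch can illustrate the point of the answer above: near the minimizer the partial derivative shrinks toward $0$, so the minus-sign updates naturally get smaller and the parameter settles. The synthetic data, model, learning rate, and iteration count below are invented for illustration.

```python
import random

# SGD for least squares on y = theta * x, minimizing J(theta) = mean of (theta*x - y)^2 / 2.
random.seed(0)
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]  # true theta is 2

theta = -1.0
alpha = 0.05

for step in range(200):
    x, y = random.choice(data)      # "stochastic": one sample per update
    grad = (theta * x - y) * x      # d/dtheta of (theta*x - y)^2 / 2 for that sample
    theta = theta - alpha * grad    # minus sign: move J downhill
    if step % 50 == 0:
        print(step, theta, grad)    # |grad| shrinks as theta approaches 2

print("final theta:", theta)        # close to 2, where the derivative is 0
```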










