Is my back propagation math correct?

I have been programming a feed-forward neural network that uses stochastic gradient descent, and I am still a little confused about the calculus. To make sure I have the math correct, I am using the following network as an example: [Neural network image]

I believe the output of the network should be the following, where $s(x)$ is the sigmoid:

$$
s\left(x\right) = \frac{1}{1+e^{-x}}
$$

$$
O=s\left(s\left(s\left(Iw_1+1\right)w_2+1\right)w_4+s\left(s\left(Iw_1+1\right)w_3+1\right)w_5+1\right)
$$

When I worked out the derivative of $O$ with respect to $w_1$, I got the following, where $d(x)$ is the derivative of the sigmoid:

$$
d\left(x\right)=s\left(x\right)\left(1-s\left(x\right)\right)
$$

$$
d\left(s\left(s\left(Iw_1+1\right)w_2+1\right)w_4+s\left(s\left(Iw_1+1\right)w_3+1\right)w_5+1\right)\cdot\left(\left(d\left(s\left(Iw_1+1\right)w_2+1\right)\cdot d\left(Iw_1+1\right)\cdot I\right)+\left(d\left(s\left(Iw_1+1\right)w_3+1\right)\cdot d\left(Iw_1+1\right)\cdot I\right)\right)
$$

I used https://www.desmos.com/calculator to check whether it gives the same derivative of $O$ with respect to $w_1$. With all the weights set to 0.5, all the biases set to 1, and the input set to 1, Desmos reports a derivative of 0.0014291881022, but my equation gives 0.00571675240882. Is there a mistake somewhere in my math, or is Desmos doing something unexpected? Sorry if I got something simple wrong or messed up the notation; I am still very new to calculus.
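
The comparison described in the question can also be reproduced without Desmos. The following is a minimal sketch, not the asker's code: the function names (`output`, `posted_formula`, `central_difference`) and the step size `eps` are assumptions introduced here for illustration. It evaluates the derivative expression exactly as posted and compares it with a central finite-difference estimate of the derivative of $O$ with respect to $w_1$.

```python
import math


def s(x):
    """Sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def d(x):
    """Derivative of the sigmoid, s(x) * (1 - s(x))."""
    return s(x) * (1.0 - s(x))


def output(I, w1, w2, w3, w4, w5):
    """Network output O from the question, with every bias fixed at 1."""
    h = s(I * w1 + 1)
    return s(s(h * w2 + 1) * w4 + s(h * w3 + 1) * w5 + 1)


def posted_formula(I, w1, w2, w3, w4, w5):
    """The derivative expression exactly as written in the question."""
    h = s(I * w1 + 1)
    outer = d(s(h * w2 + 1) * w4 + s(h * w3 + 1) * w5 + 1)
    inner = (d(h * w2 + 1) * d(I * w1 + 1) * I
             + d(h * w3 + 1) * d(I * w1 + 1) * I)
    return outer * inner


def central_difference(I, w1, w2, w3, w4, w5, eps=1e-6):
    """Numerical estimate of dO/dw1; eps is an arbitrary small step."""
    up = output(I, w1 + eps, w2, w3, w4, w5)
    down = output(I, w1 - eps, w2, w3, w4, w5)
    return (up - down) / (2 * eps)


# Input 1 and all five weights 0.5, as in the question.
args = (1.0, 0.5, 0.5, 0.5, 0.5, 0.5)
numeric = central_difference(*args)
posted = posted_formula(*args)

print(f"finite difference: {numeric:.10f}")
print(f"posted formula:    {posted:.10f}")
print(f"ratio:             {numeric / posted:.4f}")
```

With these inputs the finite-difference estimate should land close to the Desmos value quoted above and the posted formula close to the hand-computed one, so the discrepancy comes from the formula rather than from Desmos. The printed ratio shows how far apart the two are.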

      calculus partial-derivative neural-networks






asked Dec 21 '18 at 5:10 by That_one_guy

1 Answer

Looks like you're missing some factors in the second factor of your expression (the bracketed sum). The full expression should be

$$
\frac{\partial O}{\partial w_1} = d(s(s(Iw_1+1)w_2+1)w_4+s(s(Iw_1+1)w_3+1)w_5+1)\cdot[(\mathbf{w_4}\,d(s(Iw_1+1)w_2+1)\cdot\mathbf{w_2}\,d(Iw_1+1)\cdot I)+(\mathbf{w_5}\,d(s(Iw_1+1)w_3+1)\cdot\mathbf{w_3}\,d(Iw_1+1)\cdot I)],
$$

with the missing factors in bold.

Since you set all the weights to 0.5 in your check, the missing factors $w_4w_2$ and $w_5w_3$ each multiply to 0.25. And indeed, your answer is off by exactly that factor: $0.00571675240882 \times 0.25 \approx 0.0014291881022$, which is the value Desmos reported.
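
To see where the weight factors come from, it may help to name the intermediate quantities and apply the chain rule one layer at a time. The names $h$, $a_1$, $a_2$ and $z$ are introduced here only for illustration and do not appear in the original post:

$$
h = s(Iw_1+1), \qquad a_1 = s(hw_2+1), \qquad a_2 = s(hw_3+1), \qquad z = a_1w_4 + a_2w_5 + 1, \qquad O = s(z)
$$

$$
\frac{\partial O}{\partial w_1} = d(z)\left(w_4\,\frac{\partial a_1}{\partial w_1} + w_5\,\frac{\partial a_2}{\partial w_1}\right)
$$

$$
\frac{\partial a_1}{\partial w_1} = d(hw_2+1)\,w_2\,\frac{\partial h}{\partial w_1}, \qquad
\frac{\partial a_2}{\partial w_1} = d(hw_3+1)\,w_3\,\frac{\partial h}{\partial w_1}, \qquad
\frac{\partial h}{\partial w_1} = d(Iw_1+1)\,I
$$

Every time the chain rule passes through a product such as $a_1w_4$ or $hw_2$, the weight survives as a constant factor. Substituting the last line into the second gives one term carrying $w_4w_2$ and another carrying $w_5w_3$.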

answered Dec 21 '18 at 5:42 by Ben Lansdell