Chain rule for matrix - I'm confused











I googled around and searched inside the forum but I'm still confused about a problem.



I have two matrix functions $f,g : \mathbb{R}^{n \times n} \times \mathbb{R}^{a \times b} \rightarrow \mathbb{R}^{n \times n}$. Starting from this, I have the following expression:

$$ t(Q, X, Y) = \text{tr}(f(g(Q, X),Y))$$

where $\text{tr}$ is the trace operator and $X, Y \in \mathbb{R}^{a \times b}$ and $Q \in \mathbb{R}^{n \times n}$.

How do I evaluate $\frac{\partial t(Q,X, Y)}{\partial X}$ and $\frac{\partial t(Q,X, Y)}{\partial Y}$?



I mean, I would like to know how to correctly apply the chain rule.



* Addition *



I will try to give more information about my problem.
Suppose that $a = b = n$ and that $f(A,B) = AB$ and $g(A,B) = BA + AB$ (actually this is only an example of possible functions $f$ and $g$).
Then I have that:



$$f(g(Q,X),Y) = f(XQ + QX, Y) = XQY + QXY$$



Then, using matrix calculus (hoping there are no errors!), I have that:

$$ \frac{\partial t(Q,X,Y)}{\partial X} = QY + YQ\\
\frac{\partial t(Q,X,Y)}{\partial Y} = XQ + QX$$
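
The example above is small enough to check numerically. The sketch below is an editorial addition, not part of the original question: it uses plain Python lists, hypothetical helper names (`matmul`, `numeric_grad`, and so on), and arbitrary 2×2 test matrices. It compares central finite differences of $t$ against the formulas above. One thing it makes visible is the layout convention: if entry $(i,j)$ of the gradient is taken to be $\partial t/\partial X_{ij}$, the finite differences match the transposes $(QY+YQ)^T$ and $(XQ+QX)^T$, so the formulas above correspond to the transposed (numerator) layout.

```python
# Finite-difference check of the example f(A, B) = AB, g(A, B) = BA + AB.
# Plain Python lists, no third-party dependencies. All helper names and
# the test matrices are illustrative assumptions.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def t(Q, X, Y):
    # t = tr(f(g(Q, X), Y)) with g(Q, X) = XQ + QX and f(G, Y) = GY
    G = add(matmul(X, Q), matmul(Q, X))
    return trace(matmul(G, Y))

def numeric_grad(fun, M, h=1e-5):
    """Central differences; entry (i, j) approximates d fun / d M[i][j]."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            plus = [row[:] for row in M]
            minus = [row[:] for row in M]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (fun(plus) - fun(minus)) / (2 * h)
    return grad

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

Q = [[1.0, 2.0], [3.0, 4.0]]
X = [[0.5, -1.0], [2.0, 1.5]]
Y = [[2.0, 0.0], [1.0, -3.0]]

grad_X = numeric_grad(lambda M: t(Q, M, Y), X)
grad_Y = numeric_grad(lambda M: t(Q, X, M), Y)

claimed_X = add(matmul(Q, Y), matmul(Y, Q))  # QY + YQ
claimed_Y = add(matmul(X, Q), matmul(Q, X))  # XQ + QX

# Compare against the transposes (entry (i, j) = d t / d X[i][j]).
err_X = max_abs_diff(grad_X, transpose(claimed_X))
err_Y = max_abs_diff(grad_Y, transpose(claimed_Y))
print(err_X, err_Y)
```

Both printed errors should be at the level of floating-point rounding, since $t$ is linear in $X$ and in $Y$ separately.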



I can easily compute the result if I know the form of $f$ and $g$. Notice that the derivatives I obtained are in matrix form.
But I actually need to deal with generic functions, and for this reason I need to use the chain rule.
The problem is that the chain rule formulas I know give the derivative with respect to a single element of the matrix $X$ (or $Y$). With them, I'm not able to obtain the derivatives in matrix form.



So, my question is: is there a chain rule formula I'm missing that lets me express these derivatives in matrix form?



* Addition 2 *



The chain rule formulas that I know are reported here http://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities (see the 7th row of the table)










  • It might help to consider the function $h(Q,X,Y) = (g(Q,X),Y)$, whose differential is easily calculated. Then $t(Q,X,Y) = l \circ f \circ h (Q,X,Y)$, where $l$ is the trace.
    – wspin
    Dec 17 '12 at 20:49












  • I know this formula (en.wikipedia.org/wiki/… - it is the 7th formula in the table). It is applied to each $X_{i,j}$ separately! I would like to know the formula with respect to the whole of $X$.
    – the_candyman
    Dec 18 '12 at 14:34












  • I'm going to try to give more details in my question
    – the_candyman
    Dec 19 '12 at 20:11

















matrices derivatives






edited Nov 23 at 12:25

asked Dec 17 '12 at 14:03

the_candyman

1 Answer
Let's use uppercase letters for the matrix variables, so they're easy to distinguish from the lowercase scalars
$$\eqalign{
G &= G(Q,X) \cr
F &= F(G,Y) \cr
t &= {\rm tr}(F) = I:F \cr
dt &= I:dF \cr
}$$
First, let's calculate the differential and gradient wrt $Y$
$$\eqalign{
dt &= I:\Big(\frac{\partial F}{\partial Y}:dY\Big) \cr
\frac{\partial t}{\partial Y} &= I:\frac{\partial F}{\partial Y} \cr
}$$
And now wrt $X$
$$\eqalign{
dt &= I:\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}:dX\Big) \cr
\frac{\partial t}{\partial X} &= I:\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X} \cr
}$$
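
As a sanity check (an editorial sketch, not part of the original answer), the question's example $F = GY$, $G = XQ + QX$ can be pushed through the same steps. Assuming the colon between two matrices means the Frobenius product $A:B = {\rm tr}(A^TB)$, so that $I:F = {\rm tr}(F)$, the contractions collapse to ordinary matrix products:
$$\eqalign{
dt &= I:dF = I:(dG\,Y) = Y^T:dG \cr
&= Y^T:(dX\,Q + Q\,dX) = (Y^TQ^T + Q^TY^T):dX \cr
\frac{\partial t}{\partial X} &= (QY + YQ)^T \cr
}$$
Here entry $(k,l)$ of the result is $\partial t/\partial X_{kl}$; the expression $QY + YQ$ in the question is the same gradient written in the transposed (numerator) layout. The same steps with $dF = G\,dY$ give $\partial t/\partial Y = G^T = (XQ + QX)^T$.
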
Note that the matrix-by-matrix gradients are 4th order tensors. For example, here is one of the gradients in component form
$$\eqalign{
\Big(\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}} \cr
}$$



Also note that colons are used to denote the double-contraction product (with summation implied over the repeated indices $m,n$), e.g.
$$\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}\Big)_{ijkl}
= \frac{\partial F_{ij}}{\partial G_{mn}}\,\frac{\partial G_{mn}}{\partial X_{kl}}$$
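
The double contraction can also be carried out literally, index by index. The following sketch is an editorial illustration with hypothetical variable names, assuming the question's example $F = GY$, $G = XQ + QX$ with arbitrary 2×2 matrices. It builds the two 4th-order tensors as dictionaries keyed by index tuples, contracts them over the shared index pair, and then contracts with the identity.

```python
from itertools import product

# Illustrative 2x2 data; the names and values are assumptions for this sketch.
n = 2
Q = [[1.0, 2.0], [3.0, 4.0]]
Y = [[2.0, 0.0], [1.0, -3.0]]
idx = range(n)

def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0

# dF_ij/dG_mp for F = G Y: F_ij = sum_m G_im Y_mj, so it equals delta_im * Y_pj
dF_dG = {(i, j, m, p): delta(i, m) * Y[p][j]
         for i, j, m, p in product(idx, repeat=4)}

# dG_mp/dX_rs for G = X Q + Q X: delta_mr * Q_sp + Q_mr * delta_ps
dG_dX = {(m, p, r, s): delta(m, r) * Q[s][p] + Q[m][r] * delta(p, s)
         for m, p, r, s in product(idx, repeat=4)}

# Double contraction over (m, p): (dF/dG : dG/dX)_ijrs
dF_dX = {(i, j, r, s): sum(dF_dG[i, j, m, p] * dG_dX[m, p, r, s]
                           for m, p in product(idx, repeat=2))
         for i, j, r, s in product(idx, repeat=4)}

# Contract with the identity: dt/dX_rs = sum_i dF_ii/dX_rs
grad_X = [[sum(dF_dX[i, i, r, s] for i in idx) for s in idx] for r in idx]
print(grad_X)
```

For these matrices the result agrees entrywise with $(QY + YQ)^T$. Storing full 4th-order tensors costs $O(n^4)$ memory, so for larger problems the differential approach, which never forms them, is usually preferable.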






edited Aug 9 '17 at 12:49

answered Aug 7 '17 at 14:39

greg
