Chain rule for matrix - I'm confused
I googled around and searched inside the forum but I'm still confused about a problem.
I have 2 matrix functions $f,g : \mathbb{R}^{n \times n} \times \mathbb{R}^{a \times b} \rightarrow \mathbb{R}^{n \times n}$. Starting from this, I have the following expression:
$$ t(Q, X, Y) = \text{tr}(f(g(Q, X),Y))$$
where $\text{tr}$ is the trace operator, $X, Y \in \mathbb{R}^{a \times b}$, and $Q \in \mathbb{R}^{n \times n}$.
How do I evaluate $\frac{\partial t(Q,X, Y)}{\partial X}$ and $\frac{\partial t(Q,X, Y)}{\partial Y}$?
I mean, I would like to know how to correctly apply the chain rule.
* Addition *
I will try to give more information about my problem.
Suppose that $a = b = n$ and that $f(A,B) = AB$ and $g(A,B) = BA + AB$ (actually this is only an example of possible functions $f$ and $g$).
Then I have that:
$$f(g(Q,X),Y) = f(XQ + QX, Y) = XQY + QXY$$
Then, using matrix calculus (hoping there are no errors!), I have that:
$$ \frac{\partial t(Q,X,Y)}{\partial X} = QY + YQ\\
\frac{\partial t(Q,X,Y)}{\partial Y} = XQ + QX$$
I can easily compute the result if I know the form of $f$ and $g$. Notice that the derivatives I obtained are in a matrix form.
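As a sanity check on this example, here is a minimal NumPy sketch that compares the two formulas against central finite differences. The helper names `t` and `numeric_grad` are made up for this illustration. One assumption to note: the finite-difference gradient stores $\partial t / \partial X_{kl}$ at position $(k,l)$ (denominator layout), so it should match the transpose of the matrices written above, which follow the numerator-layout convention.

```python
import numpy as np

def t(Q, X, Y):
    # t = tr(f(g(Q, X), Y)) with g(Q, X) = XQ + QX and f(A, B) = AB
    return np.trace((X @ Q + Q @ X) @ Y)

def numeric_grad(fun, M, eps=1e-6):
    # Central differences; entry (k, l) approximates d fun / d M[k, l]
    grad = np.zeros_like(M)
    for k in range(M.shape[0]):
        for l in range(M.shape[1]):
            E = np.zeros_like(M)
            E[k, l] = eps
            grad[k, l] = (fun(M + E) - fun(M - E)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4
Q, X, Y = (rng.standard_normal((n, n)) for _ in range(3))

grad_X = numeric_grad(lambda M: t(Q, M, Y), X)
grad_Y = numeric_grad(lambda M: t(Q, X, M), Y)

# Denominator-layout gradients are the transposes of QY + YQ and XQ + QX
print(np.allclose(grad_X, (Q @ Y + Y @ Q).T, atol=1e-6))
print(np.allclose(grad_Y, (X @ Q + Q @ X).T, atol=1e-6))
```

Since $t$ is linear in $X$ and in $Y$ separately, the central differences are exact up to rounding, so both comparisons should agree to well within the tolerance.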
But I actually need to deal with generic functions, and for this reason I need to use the chain rule.
The problem is that the chain rule formulas I know only give the derivative with respect to a single element of the matrix $X$ (or $Y$). With them, I'm not able to obtain the derivatives in matrix form.
So, my question is: is there a chain rule formula I'm missing that lets me describe these derivatives in matrix form?
* Addition 2 *
The chain rule formulas that I know are reported here http://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix_identities (see the 7th row of the table)
matrices derivatives
It might help to consider the function $h(Q,X,Y) = (g(Q,X),Y)$, whose differential is easily calculated. Then $t(Q,X,Y) = l \circ f \circ h (Q,X,Y)$, where $l$ is the trace.
– wspin
Dec 17 '12 at 20:49
I know this formula (en.wikipedia.org/wiki/… - it is the 7th formula in the table). It is applied to each $X_{i,j}$ separately! I would like to know the formula with respect to the whole matrix $X$.
– the_candyman
Dec 18 '12 at 14:34
I'm going to try to give more details in my question
– the_candyman
Dec 19 '12 at 20:11
edited Nov 23 at 12:25
asked Dec 17 '12 at 14:03
the_candyman
1 Answer
Let's use uppercase letters for the matrix variables, so they're easy to distinguish from the lowercase scalars.
$$\eqalign{
G &= G(Q,X) \cr
F &= F(G,Y) \cr
t &= {\rm tr}(F) = I:F \cr
dt &= I:dF \cr
}$$
First, let's calculate the differential and gradient with respect to $Y$:
$$\eqalign{
dt &= I:\Big(\frac{\partial F}{\partial Y}:dY\Big) \cr
\frac{\partial t}{\partial Y} &= I:\frac{\partial F}{\partial Y} \cr
}$$
And now with respect to $X$:
$$\eqalign{
dt &= I:\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}:dX\Big) \cr
\frac{\partial t}{\partial X} &= I:\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X} \cr
}$$
Note that the matrix-by-matrix gradients are 4th-order tensors. For example, here is one of the gradients in component form:
$$\eqalign{
\Big(\frac{\partial G}{\partial X}\Big)_{ijkl} = \frac{\partial G_{ij}}{\partial X_{kl}} \cr
}$$
Also note that colons are used to denote the double-contraction product (with summation over the repeated indices $m,n$), e.g.
$$\Big(\frac{\partial F}{\partial G}:\frac{\partial G}{\partial X}\Big)_{ijkl}
= \frac{\partial F_{ij}}{\partial G_{mn}}\,\frac{\partial G_{mn}}{\partial X_{kl}}$$
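To make the double contractions concrete, here is a small NumPy sketch for the example in the question, $F = GY$ with $G = XQ + QX$. It is only an illustration under those assumptions: the 4th-order tensors are written out by hand from the component formulas, and variable names such as `dG_dX` and `dt_dG` are chosen here for readability. Each colon becomes an `einsum` over a pair of indices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
Q, X, Y = (rng.standard_normal((n, n)) for _ in range(3))
I = np.eye(n)

G = X @ Q + Q @ X  # G(Q, X); F(G, Y) = G @ Y

# dG_dX[m, n, k, l] = dG_mn / dX_kl = delta_mk Q_ln + Q_mk delta_nl
dG_dX = np.einsum('mk,ln->mnkl', I, Q) + np.einsum('mk,nl->mnkl', Q, I)
# dF_dG[i, j, m, n] = dF_ij / dG_mn = delta_im Y_nj
dF_dG = np.einsum('im,nj->ijmn', I, Y)
# dF_dY[i, j, k, l] = dF_ij / dY_kl = G_ik delta_jl
dF_dY = np.einsum('ik,jl->ijkl', G, I)

# dt/dX = I : dF/dG : dG/dX, contracting one index pair per colon
dt_dG = np.einsum('ij,ijmn->mn', I, dF_dG)
grad_X = np.einsum('mn,mnkl->kl', dt_dG, dG_dX)
# dt/dY = I : dF/dY
grad_Y = np.einsum('ij,ijkl->kl', I, dF_dY)

# With this index order, entry (k, l) is dt/dX_kl, i.e. the transpose
# of the matrices QY + YQ and XQ + QX given in the question
print(np.allclose(grad_X, (Q @ Y + Y @ Q).T))
print(np.allclose(grad_Y, (X @ Q + Q @ X).T))
```

Materializing the full $n \times n \times n \times n$ tensors is fine for checking a formula on small matrices, but it costs $O(n^4)$ memory, so for large $n$ it is usually better to contract with $I$ first, as done in `dt_dG`.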
edited Aug 9 '17 at 12:49
answered Aug 7 '17 at 14:39
greg