What is the derivative function used in backpropagation?
I'm learning AI, but this confuses me: is the derivative used in backpropagation the derivative of the activation function or the derivative of the loss function?
These terms are confusing: the derivative of the activation function versus the partial derivative with respect to the loss function?
I'm still not getting it right.
backpropagation activation-function loss-functions
asked Dec 18 '18 at 7:59
datdinhquoc
1154
2 Answers
In backpropagation, both the derivative of the loss function and the derivative of the activation function are used for error minimization.
The derivative of the loss function is used to compute the gradients between the last hidden layer and the output layer.
The derivative of the activation function is used to compute the gradients of all layers except the output layer.
The weights from a layer get activated in the next layer, so in that scenario the derivative of the activation function is used.
The weights from the last hidden layer get activated in the output layer, so here the derivative of the loss function is used, since the output layer feeds the loss function.
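A small numeric sketch may make this concrete. The following is illustrative only (the network shape, sigmoid activation, and squared-error loss are assumptions, not taken from the question): the loss derivative appears only in the output-layer delta, while the activation derivative appears at every layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):            # derivative of the activation function
    s = sigmoid(z)
    return s * (1.0 - s)

def d_mse(output, label):    # derivative of the loss 0.5*(output-label)**2
    return output - label

rng = np.random.default_rng(0)
x  = rng.normal(size=3)          # input
t  = np.array([1.0])             # label
W1 = rng.normal(size=(4, 3))     # hidden-layer weights
W2 = rng.normal(size=(1, 4))     # output-layer weights

# Forward pass; pre-activations z1, z2 are kept for the backward pass
z1 = W1 @ x
h  = sigmoid(z1)
z2 = W2 @ h
y  = sigmoid(z2)

# Output layer: BOTH the loss derivative and the activation derivative appear
delta2  = d_mse(y, t) * d_sigmoid(z2)
grad_W2 = np.outer(delta2, h)

# Hidden layer: only the activation derivative (the error arrives through W2)
delta1  = (W2.T @ delta2) * d_sigmoid(z1)
grad_W1 = np.outer(delta1, x)
```

The resulting gradients can be confirmed against a finite-difference estimate, which is a common sanity check for hand-written backpropagation.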
edited Dec 18 '18 at 18:26
jazib jamil
183
answered Dec 18 '18 at 8:36
Shubham Panchal
35519
2
Please learn the math and correct the answer. It is too important a question to leave incorrect.
– FauChristian
Dec 18 '18 at 10:53
2
@FauChristian I can't even understand the question, and the answer is also incomprehensible... What exactly is the OP trying to ask?
– DuttaA
Dec 18 '18 at 11:14
1
@datdinhquoc wants to know the difference between the partial derivative of the loss function and that of the activation function, which are used in backpropagation.
– Shubham Panchal
Dec 18 '18 at 13:45
Overview
The derivatives of functions are used to determine what changes to input parameters correspond to a desired change in output at any given point in forward propagation and in the cost, loss, or error evaluation (whatever quantity the learning process is attempting to minimize). This is the conceptual and algebraic inverse of maximizing valuation, yield, or accuracy.
Back-propagation estimates the next best step toward the objective quantified in the cost function in a search. The result of the search is a set of parameter matrices, each element of which represents what is sometimes called a connection weight. The improvement of the values of the elements in the pursuit of minimal cost is artificial networking's basic approach to learning.
Each step is an estimation because the cost function is a finite difference, whereas the partial derivatives express the slope of a hyper-plane normal to surfaces that represent the functions comprising forward propagation. The goal is to set up circumstances so that successive approximations approach the ideal represented by minimization of the cost function.
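The contrast between the finite difference and the analytic derivative can be checked numerically. As an illustration only (the toy function f(w) = w**2 and the step size are assumptions, not part of this answer), a central difference approximates the slope that the analytic derivative gives exactly; this comparison is the basis of "gradient checking":

```python
def f(w):
    return w ** 2

def analytic_derivative(w):
    return 2.0 * w

def central_difference(fn, w, eps=1e-6):
    # finite-difference estimate of the slope at w
    return (fn(w + eps) - fn(w - eps)) / (2.0 * eps)

w = 1.5
approx = central_difference(f, w)   # finite difference, as in cost evaluation
exact  = analytic_derivative(w)     # partial derivative, as in back-propagation
```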
Back-propagation Theory
Back-propagation is a scheme for distributing a correction signal arising from cost evaluation after each sample or mini-batch of samples. With a form of Einsteinian notation, the current convention for distributive, incremental parameter improvement can be expressed concisely.
$$ \Delta P = \dfrac {c(\vec{o}, \vec{\ell}) \; \alpha} {\big[ \prod^+ \! P \big] \; \big[ \prod^+ \! a'(\vec{s} \, P + \vec{z}) \big] \; \big[ c'(\vec{o}, \vec{\ell}) \big]} $$
The plus sign in $\prod^+\!$ designates that the factors multiplied must be downstream in the forward signal flow from the parameter matrix being updated.
In sentence form, $\Delta P$ at any layer shall be the quotient of the cost function $c$ (given label vector $\vec{\ell}$ and network output signal $\vec{o}$), attenuated by learning rate $\alpha$, over the product of all the derivatives leading up to the cost evaluation. The multiplication of these derivatives arises through recursive application of the chain rule.
It is because the chain rule is a core method for feedback signal evaluation that partial derivatives must be used. All variables must be bound except for one dependent and one independent variable for the chain rule to apply.
The derivatives include three types.
- All layer input factors, the weights in the parameter matrix used to attenuate the signal during forward propagation, which are equal to the derivatives of those signal paths
- All the derivatives of activation functions $a$, evaluated at the sum of the matrix-vector product of the parameters and the signal at that layer plus the bias vector
- The derivative of the cost function $c$, evaluated at the current output value $\vec{o}$ with the label $\vec{\ell}$
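The three factor types can be written out for a scalar two-layer network. This is a minimal sketch under assumed values (tanh activation, squared-error cost, invented weights); each factor in the product corresponds to one bullet above:

```python
from math import tanh

# L = c(a(w2 * a(w1 * x))), with c = 0.5 * (out - label)**2 and a = tanh

def a(z):
    return tanh(z)

def da(z):                       # derivative of the activation function
    return 1.0 - tanh(z) ** 2

x, w1, w2, label = 0.7, 0.3, -1.1, 0.5

z1 = w1 * x                      # layer 1 pre-activation
z2 = w2 * a(z1)                  # layer 2 pre-activation
out = a(z2)

dc = out - label                 # derivative of the cost at the output
# dL/dw1: cost derivative, activation derivatives, and the downstream weight w2
grad_w1 = dc * da(z2) * w2 * da(z1) * x
```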
Answer to the Question
Note that, as a consequence of the above, the derivatives of both the cost (or loss or error) function and any activation functions are necessary.
Redundant Operation Removal for an Efficient Algorithm Design
Actual back-propagation algorithms save computing resources and time using three techniques.
- Temporary storage of the value used for evaluation of the derivative (since it was already calculated during forward propagation)
- Temporary storage of products to avoid redundant multiplication operations (a form of reverse mode automatic differentiation)
- Use of reciprocals because division is more costly than multiplication at a hardware level
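A minimal sketch of the first two techniques, with illustrative names and a scalar signal (an assumption for brevity, not a production design): the forward pass caches each pre-activation, and the backward pass maintains a running product so no chain of factors is recomputed.

```python
from math import tanh

def forward(x, weights):
    cache = []                       # one (input, pre-activation) per layer
    signal = x
    for w in weights:
        z = w * signal
        cache.append((signal, z))    # stored for the backward pass
        signal = tanh(z)
    return signal, cache

def backward(delta, cache, weights):
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        inp, z = cache[i]
        delta *= 1.0 - tanh(z) ** 2  # activation derivative at the cached z
        grads[i] = delta * inp
        delta *= weights[i]          # extend the running chain-rule product
    return grads

weights = [0.8, -0.4]
out, cache = forward(0.5, weights)
grads = backward(out - 1.0, cache, weights)  # cost derivative for label 1.0
```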
In addition to these practical principles of algorithm design, other algorithm features arise from extensions of basic back-propagation. Mini-batch SGD (stochastic gradient descent) applies averaging to improve convergence reliability and accuracy in most cases, provided hyper-parameters and initial parameter states are well chosen. Gradual reduction of learning rates, momentum, and various other techniques are often used to further improve outcomes in deeper artificial networks.
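As a sketch of the mini-batch averaging idea (the noisy per-sample gradients and the decay factor are invented for illustration): per-sample gradients are averaged before a single parameter update, and the learning rate is gradually reduced.

```python
def sgd_step(w, per_sample_grads, lr):
    avg = sum(per_sample_grads) / len(per_sample_grads)   # mini-batch average
    return w - lr * avg

w, lr = 2.0, 0.5
for step in range(3):
    noise = (-0.1, 0.0, 0.1)                 # per-sample gradient noise
    grads = [2.0 * w + n for n in noise]     # noisy gradients of f(w) = w**2
    w = sgd_step(w, grads, lr)
    lr *= 0.9                                # gradual learning-rate reduction
```

Averaging cancels the symmetric noise here, so the update follows the true gradient of the toy objective.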
edited Dec 22 '18 at 14:45
answered Dec 22 '18 at 13:00
Douglas Daseeco
4,416837
Thanks for contributing an answer to Artificial Intelligence Stack Exchange!