Best estimate for random values
$begingroup$
Due to work-related issues I can't discuss the exact question I want to ask, but I thought of a silly little example that conveys the same idea.
Let's say the number of candies that come in a package is a random variable with mean $\mu$ and standard deviation $s$. After about 2 months of data gathering we've got about 100000 measurements and a pretty good estimate of $\mu$ and $s$.
Let's say that the candy comes in 5 flavours that are NOT identically distributed (we know the mean and standard deviation for each flavour; let's call them $\mu_1$ through $\mu_5$ and $s_1$ through $s_5$).
Let's say that next month we will get a new batch (several packages) of candy from our supplier and we would like to estimate the amount of candy we will get for each flavour. Is there a better way than simply assuming that we'll get "around" the mean for each flavour, taking into account that the amount of candy we'll get is around $\mu$?
I have access to all the measurements made, so if anything is needed (higher-order moments, other relevant data, etc.) I can compute it and update the question as needed.
Cheers and thanks!
statistics random estimation
$endgroup$
$begingroup$
Is "batch" synonymous with "package"? If so, the question would be considerably clearer if you used the same word again. If not, please clarify the difference.
$endgroup$
– joriki
Apr 2 '13 at 21:34
$begingroup$
I'm sorry, I forgot to add that a batch is several packages. I don't think it matters, as we'll evaluate per "package".
$endgroup$
– Zegpi
Apr 2 '13 at 21:52
edited Apr 2 '13 at 21:51
Zegpi
asked Apr 2 '13 at 21:08
Zegpi
3 Answers
$begingroup$
I find the question a bit unclear, but since you have an estimate of the mean and standard deviation for a single package, and assuming the candy counts in different packages are independent, why not use $E(X_1+X_2)=E(X_1)+E(X_2)$ and $\operatorname{Var}(X_1+X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2\operatorname{Cov}(X_1,X_2)$, where the covariance term is zero under independence?
$endgroup$
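As a rough illustration of the two identities above, here is a minimal Python sketch for a batch of $k$ packages. It assumes the packages are independent and identically distributed, so the covariance terms drop out; the function name `batch_total_stats` and the numbers used are hypothetical, not taken from the question.

```python
import math


def batch_total_stats(mu, s, k):
    """Mean and standard deviation of the total over k packages.

    Assumes the k packages are independent and share the same mean `mu`
    and standard deviation `s`, so variances add and covariances are zero.
    """
    mean_total = k * mu            # E(X_1 + ... + X_k) = k * mu
    sd_total = math.sqrt(k) * s    # Var(X_1 + ... + X_k) = k * s^2
    return mean_total, sd_total


# Hypothetical numbers: 25 packages, 50 candies per package on average.
mean_total, sd_total = batch_total_stats(mu=50.0, s=4.0, k=25)
print(mean_total, sd_total)
```

The same function could be applied per flavour with $\mu_i$ and $s_i$ in place of $\mu$ and $s$, under the same independence assumption.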
$begingroup$
Since we want to get an estimate for each "flavour" we are now using $\mu_1$ through $\mu_5$. I would like to know if there is a better method to estimate how many of each "flavour" we'll get.
$endgroup$
– Zegpi
Apr 2 '13 at 22:00
$begingroup$
I do not think that you can do better than using the distribution of each flavour. If, for example, you knew that $\sigma_1=\sigma_2$, you could use a pooled variance estimator to get a better estimate $s_{\text{pooled}}$ for $\sigma_1$ and $\sigma_2$. The same reasoning holds for $\mu$.
$endgroup$
– MrOperator
Apr 3 '13 at 10:14
$begingroup$
That's what I thought, but I hoped someone could have a better idea.
$endgroup$
– Zegpi
Apr 3 '13 at 11:46
$begingroup$
If the batches are independent, there is nothing you can do other than take the mean, which is (probably) an unbiased estimator. It is a common misconception that in random phenomena the past influences the future and allows predictions (for example, that one million tails in a row are more likely to be followed by a head). This is false.
$endgroup$
$begingroup$
It depends on your definition of "better."
You need to define your risk function. If your risk function is MSE, you can do better than simply using the sample means. The idea is to use shrinkage, which, as the name suggests, means shrinking all your $\mu_i$ estimates slightly towards 0. The amount of shrinkage should be proportional to the sample variance $s^2$ of your data (noisier data calls for more shrinkage) and inversely proportional to the number of data points $n$ that you collect. Note that the James-Stein estimator, the classic shrinkage estimator, is only better for $m \ge 3$ flavors. In general, some form of regularization is always wise in empirical problems.
$endgroup$
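The shrinkage idea can be sketched in Python with the positive-part James-Stein estimator. This is only an illustration under simplifying assumptions that the answer does not state: every flavour mean is based on the same number of observations `n`, all flavours share a common known variance, and the shrinkage target is 0. The function name `james_stein` and the input numbers are hypothetical.

```python
def james_stein(sample_means, variance, n):
    """Positive-part James-Stein shrinkage of sample means towards 0.

    Assumes each sample mean is an average of n observations with common
    variance `variance`, so each mean has variance `variance / n`.
    """
    m = len(sample_means)
    if m < 3:
        # Below three dimensions there is no guaranteed MSE improvement.
        return list(sample_means)
    norm_sq = sum(x * x for x in sample_means)
    if norm_sq == 0.0:
        return list(sample_means)
    # Shrink more when the means are noisy (large variance / n)
    # and less when n is large or the means are far from 0.
    factor = max(0.0, 1.0 - (m - 2) * (variance / n) / norm_sq)
    return [factor * x for x in sample_means]


# Hypothetical sample means for five flavours.
means = [1.0, 2.0, 2.0, 0.0, 0.0]
shrunk = james_stein(means, variance=4.5, n=3)
print(shrunk)
```

With a sample size on the order of the 100000 measurements mentioned in the question, `variance / n` would typically be tiny, so the shrinkage factor would be very close to 1 and the result would barely differ from the plain sample means.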
answered Apr 2 '13 at 21:47
MrOperator
answered Dec 13 '18 at 20:18
Yves Daoust
edited Dec 13 '18 at 20:28
answered Dec 13 '18 at 20:22
zoidberg