Infer the variance of a distribution
$begingroup$
Assume that the distribution of firms' revenue is log-normal, and that we know the sum of all the revenues and the number of firms in the market (so the mean of the revenue distribution is known).
Unfortunately, we are unable to observe any individual firm's revenue, but we know how many firms fall within each range of revenue. The bar plot is this: [Figure: Firm's Revenue - Bar Plot]
The following table shows how many firms fall in each revenue bin:
Revenue | Number of Firms
$0 - 999 | 480
$1,000 - 1,999 | 2000
$2,000 - 3,999 | 1600
$4,000 - 5,999 | 1200
$6,000 - 7,999 | 800
$8,000 - 9,999 | 680
$10,000 - 14,999 | 300
$15,000 + | 120
Is there any way that we can infer the variance of the firm revenue distribution based on this information? Thank you very much!
statistics estimation
$endgroup$
$begingroup$
You can always fit the distribution to the data then read off the variance. Maybe there is a smarter way.
$endgroup$
– user121049
Dec 21 '18 at 8:35
$begingroup$
Can you explain in a little bit more detail? Thank you.
$endgroup$
– Andy
Dec 23 '18 at 3:20
asked Dec 21 '18 at 5:26
Andy
1 Answer
$begingroup$
You know the distribution is log-normal and you know the mean, so you just need the parameter $\sigma$. Once you have it, you can calculate the variance. See the formula on the Wikipedia page https://en.wikipedia.org/wiki/Log-normal_distribution.
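For reference, these are the standard log-normal moments, with $\mu$ and $\sigma$ the mean and standard deviation of the logarithm of revenue $X$. The symbol $m$ for the known mean is notation introduced here, not taken from the answer:

$$m = E[X] = e^{\mu+\sigma^2/2}, \qquad \operatorname{Var}(X) = \left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2} = m^2\left(e^{\sigma^2}-1\right).$$

With the mean held fixed, the variance is therefore a function of $\sigma$ alone.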
The method below is a quick and easy way of estimating $\sigma$ using a spreadsheet.
A histogram is just an approximation to the probability density function. If you knew $\sigma$ you could reproduce the histogram, since each bar (as a fraction of all firms) is given by the integral of the PDF over the width of the bar; multiply by the total number of firms to get counts. This integral can be calculated using the cumulative distribution function (see the Wikipedia page again). Essentially, $\text{Bar}(a \text{ to } b)=\mathrm{CDF}(b)-\mathrm{CDF}(a)$. Excel can compute $\operatorname{erf}(x)$. See https://support.office.com/en-us/article/ERF-function-C53C7E7B-5482-4B6C-883E-56DF3C9AF349.
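For reference, the log-normal CDF written in terms of the error function is the standard textbook form below, which is why an erf function is all the spreadsheet needs. Here $\mu$ and $\sigma$ are the mean and standard deviation of the logarithm of revenue:

$$\mathrm{CDF}(x) = \frac{1}{2}\left[1+\operatorname{erf}\!\left(\frac{\ln x-\mu}{\sigma\sqrt{2}}\right)\right], \qquad x>0.$$

For the open-ended top bin, the upper limit contributes $\mathrm{CDF}(\infty)=1$, and the lowest bin starts from $\mathrm{CDF}(0)=0$.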
[Edit: I had forgotten that the mean of a log-normal depends on $\sigma$. That is OK, you just need to set the parameter $\mu=\log(\text{ObservedMean})-\frac{\sigma^2}{2}$.]
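The same calculation can be sketched outside a spreadsheet. The Python below is only an illustration of the idea: the bin edges and counts come from the question's table, but the question never states the total revenue, so `OBSERVED_MEAN` is a made-up placeholder, and the function names are my own. It also treats "$0 - 999" as the interval from 0 to 1,000, and so on for the other bins.

```python
import math

# Bin edges from the question's table; the last bin ("$15,000 +") is open-ended.
EDGES = [0.0, 1000.0, 2000.0, 4000.0, 6000.0, 8000.0, 10000.0, 15000.0, math.inf]
OBSERVED = [480, 2000, 1600, 1200, 800, 680, 300, 120]

# ASSUMPTION: hypothetical placeholder, not a value from the question.
# Replace it with (total revenue) / (number of firms).
OBSERVED_MEAN = 4500.0


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal whose logarithm has mean mu and std dev sigma."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def expected_counts(sigma, mean=OBSERVED_MEAN, edges=EDGES, total=sum(OBSERVED)):
    """Expected number of firms per bin for a given sigma, with the mean held fixed."""
    mu = math.log(mean) - sigma ** 2 / 2.0  # mean constraint from the edit note
    cdf = [lognormal_cdf(e, mu, sigma) for e in edges]
    # Each bar is total * (CDF(b) - CDF(a)).
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Compare the observed bars with the bars implied by a first guess, sigma = 1.
for lo, hi, obs, exp in zip(EDGES, EDGES[1:], OBSERVED, expected_counts(1.0)):
    print(f"{lo:>8.0f} - {hi:>8.0f}: observed {obs:5d}, expected {exp:8.1f}")
```

Changing the argument of `expected_counts` plays the role of changing the guess for $\sigma$ in the spreadsheet.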
A way of estimating $\sigma$ is to do a sort of regression fit. First take a guess for $\sigma$ and calculate the histogram it gives you as above. Now take the difference between each calculated bar and the actual bar and square it. Add these squared differences up over all the bars.
This is a measure of the error between your estimated histogram and the actual one. You want to minimise it, which you can do by changing the value of $\sigma$. You can use either Excel's Goal Seek or Excel's Solver to do this for you.
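As a rough sketch of the fitting step in code, the Python below replaces Excel's Solver with a plain grid search over $\sigma$. That is my substitution, chosen only because it needs no extra libraries. As before, `OBSERVED_MEAN` is a hypothetical placeholder because the question does not give the total revenue, and the search range of 0.05 to 3 is an arbitrary assumption.

```python
import math

# Bin edges and counts from the question's table; the last bin is open-ended.
EDGES = [0.0, 1000.0, 2000.0, 4000.0, 6000.0, 8000.0, 10000.0, 15000.0, math.inf]
OBSERVED = [480, 2000, 1600, 1200, 800, 680, 300, 120]

# ASSUMPTION: hypothetical placeholder, not a value from the question.
OBSERVED_MEAN = 4500.0


def expected_counts(sigma, mean, edges, total):
    """Bars implied by a log-normal with the given sigma and a fixed mean."""
    mu = math.log(mean) - sigma ** 2 / 2.0

    def cdf(x):
        if x <= 0.0:
            return 0.0
        if math.isinf(x):
            return 1.0
        return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

    values = [cdf(e) for e in edges]
    return [total * (hi - lo) for lo, hi in zip(values, values[1:])]


def sum_squared_error(sigma, mean=OBSERVED_MEAN, edges=EDGES, observed=OBSERVED):
    """Sum over bars of (calculated bar - actual bar) squared."""
    expected = expected_counts(sigma, mean, edges, sum(observed))
    return sum((e - o) ** 2 for e, o in zip(expected, observed))


def fit_sigma(lo=0.05, hi=3.0, steps=2951):
    """Grid search (step 0.001 by default) standing in for Excel's Solver."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=sum_squared_error)


sigma_hat = fit_sigma()
# Log-normal variance with the mean fixed: mean^2 * (exp(sigma^2) - 1).
variance_hat = OBSERVED_MEAN ** 2 * (math.exp(sigma_hat ** 2) - 1.0)
print(f"sigma = {sigma_hat:.3f}, SSE = {sum_squared_error(sigma_hat):.1f}")
print(f"implied variance = {variance_hat:.4g}")
```

The printed numbers are only meaningful once the placeholder mean is replaced by the real one. A grid search is crude but cannot get stuck in a local minimum inside the range it scans.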
You will now have a value of $\sigma$ for which the two histograms should look more or less the same. If they don't, then either your mean is wrong or the distribution is not log-normal.
$endgroup$
edited Dec 24 '18 at 8:06
answered Dec 23 '18 at 9:51
user121049