Infer the variance of a distribution












Assume that the distribution of firms' revenue is log-normal, and that we know the sum of all the revenues and the number of firms in the market (so the mean of the revenue distribution is known).

Unfortunately, we cannot observe any individual firm's revenue, but we know how many firms fall within each range of revenue. [Figure: Firm's Revenue - Bar Plot]

The following table shows how many firms are in each revenue bin:

     Revenue              Number of Firms
     $0 - 999           | 480
     $1,000 - 1,999     | 2000
     $2,000 - 3,999     | 1600
     $4,000 - 5,999     | 1200
     $6,000 - 7,999     | 800
     $8,000 - 9,999     | 680
     $10,000 - 14,999   | 300
     $15,000 +          | 120

Is there any way to infer the variance of the firm revenue distribution from this information? Thank you very much!












  • You can always fit the distribution to the data then read off the variance. Maybe there is a smarter way.
    – user121049, Dec 21 '18 at 8:35










  • Can you explain in a little bit more detail? Thank you.
    – Andy, Dec 23 '18 at 3:20
















statistics estimation
















asked Dec 21 '18 at 5:26 by Andy






















1 Answer
You know the distribution is log-normal and you know the mean, so you only need the parameter $\sigma$. Once you know it, you can calculate the variance. See the formula on the Wikipedia page https://en.wikipedia.org/wiki/Log-normal_distribution.



The method below is a quick and easy way of estimating $\sigma$ using a spreadsheet.



A histogram is just an approximation to the probability density function (PDF). If you knew $\sigma$ you could reproduce the histogram, because each bar is given by the integral of the PDF over the range of that bar. This integral can be calculated using the cumulative distribution function (CDF; see the Wikipedia page again). Essentially, $\text{Bar}(a \text{ to } b)=\text{CDF}(b)-\text{CDF}(a)$, multiplied by the total number of firms if the bars are counts. The log-normal CDF is written in terms of the error function, and Excel can compute $\operatorname{erf}(x)$. See https://support.office.com/en-us/article/ERF-function-C53C7E7B-5482-4B6C-883E-56DF3C9AF349.
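As a rough illustration of the bar calculation outside a spreadsheet, here is a minimal Python sketch using only the standard library. The function names and the values of `mu` and `sigma` are assumptions chosen for illustration; they are not estimates from the data in the question.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def bar_probability(a, b, mu, sigma):
    """Probability mass between a and b: CDF(b) - CDF(a)."""
    return lognormal_cdf(b, mu, sigma) - lognormal_cdf(a, mu, sigma)


# Illustrative parameter values only (assumed, not fitted to the question's data).
mu, sigma = 8.0, 1.0

# Bin edges from the question's table; the last bin is open-ended.
edges = [0, 1_000, 2_000, 4_000, 6_000, 8_000, 10_000, 15_000, math.inf]
probs = [bar_probability(a, b, mu, sigma) for a, b in zip(edges, edges[1:])]

for (a, b), p in zip(zip(edges, edges[1:]), probs):
    print(f"{a:>8} to {b:>8}: {p:.4f}")
```

Multiplying each probability by the total number of firms turns it into an expected bar height that can be compared with the observed counts.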



[Edit: I had forgotten that the mean of a log-normal depends on $\sigma$. That is OK, you just need to set the parameter $\mu=\log(\text{ObservedMean})-\frac{\sigma^2}{2}$.]
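To make the link between the mean constraint and the variance explicit, here is the short derivation, using the standard log-normal mean and variance formulas. The symbol $m$ for the observed mean is introduced here only for brevity.

$$
\begin{aligned}
m &= \mathbb{E}[X] = e^{\mu + \sigma^2/2}
  \quad\Longrightarrow\quad \mu = \log m - \frac{\sigma^2}{2},\\
\operatorname{Var}(X) &= \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}
  = \left(e^{\sigma^2} - 1\right) m^2 .
\end{aligned}
$$

So with the mean fixed, the variance depends on $\sigma$ alone.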



One way of estimating $\sigma$ is a sort of regression (least-squares) fit. First take a guess for $\sigma$ and calculate the histogram it gives you as above. Now take the difference between each calculated bar and the actual bar and square it. Add these squared differences up over all the bars.



This sum is a measure of the error between your estimated histogram and the actual one. You want to minimise it, which you can do by changing the value of $\sigma$. Excel's Solver can do this for you; Goal Seek drives a cell to a specified target value rather than to a minimum, so it is less suited to this step.
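The same least-squares procedure can be sketched in Python instead of a spreadsheet. This is only a sketch under stated assumptions: `OBSERVED_MEAN` is a hypothetical placeholder (the question says the mean is known but never states its value), a plain grid search stands in for Excel's Solver, and all function names are illustrative.

```python
import math

# Bin edges and firm counts from the question's table; the last bin is open-ended.
EDGES = [0, 1_000, 2_000, 4_000, 6_000, 8_000, 10_000, 15_000, math.inf]
COUNTS = [480, 2000, 1600, 1200, 800, 680, 300, 120]

# HYPOTHETICAL placeholder: replace with the actual known mean revenue.
OBSERVED_MEAN = 4_500.0


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def expected_counts(sigma, mean=OBSERVED_MEAN):
    """Bar heights implied by sigma, with mu pinned down by the known mean."""
    mu = math.log(mean) - sigma ** 2 / 2.0
    total = sum(COUNTS)
    cdf = [lognormal_cdf(edge, mu, sigma) for edge in EDGES]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def squared_error(sigma, mean=OBSERVED_MEAN):
    """Sum of squared differences between calculated and actual bars."""
    return sum((e - o) ** 2 for e, o in zip(expected_counts(sigma, mean), COUNTS))


def fit_sigma(mean=OBSERVED_MEAN, lo=0.05, hi=3.0, steps=2951):
    """Grid search for the sigma that minimises the squared error."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda s: squared_error(s, mean))


sigma_hat = fit_sigma()
# Variance of a log-normal with mean m: m^2 * (exp(sigma^2) - 1).
variance_hat = OBSERVED_MEAN ** 2 * (math.exp(sigma_hat ** 2) - 1.0)
print(f"sigma ~ {sigma_hat:.3f}, variance ~ {variance_hat:.3e}")
```

A grid search is used here because it is simple and does not assume the error curve has a single minimum; with the real mean plugged in, comparing `expected_counts(sigma_hat)` against `COUNTS` gives the side-by-side histogram check.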



You will now have a value of $\sigma$ for which the two histograms should look more or less the same. If they don't, then either your mean is wrong or the distribution is not log-normal.













answered Dec 23 '18 at 9:51 by user121049 (edited Dec 24 '18 at 8:06)





























