Empirical distribution of sorted Gaussian numbers

I wrote a small program that does the following:

  1. Pick $N$ independent standard Gaussian numbers (expected value: 0, standard deviation: 1). Call that list $L=\{y_1, \ldots, y_N\}$.

  2. Sort that list in increasing order: $\tilde{L}=\mathrm{sort}(L)$.

  3. Plot that list on the $[-1,1]$ interval using a regularly distributed grid $x_i=-1+\frac{2i}{N-1}$, with $i=0,\ldots,N-1$.

I found that the plot was similar to that of the inverse error function, only differing by a multiplicative factor $a>0$.

[Figure: Inverse error function and sorted Gaussian numbers plot, with $N=10^5$]

I made a linear regression to find an approximate value of $1.42104$ for $a$. The two functions are very close for $N=10^5$:

[Figure: Fitted inverse error function and sorted Gaussian numbers for $N=10^5$]

I have two questions:

  1. What is the exact value of $a$?

  2. How can one prove that the limit function is indeed $a\cdot\mathrm{inverf}$ as $N\to\infty$?
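
For readers who want to reproduce the experiment, the three steps and the fit can be sketched roughly as follows. This is only an illustration and not the original program: the helper names `erfinv` and `fit_scale`, the bisection-based inverse of `math.erf`, the least-squares fit through the origin, and the choice to skip the two grid endpoints (where the inverse error function is infinite) are all assumptions.

```python
import math
import random


def erfinv(x, tol=1e-12):
    """Invert math.erf on (-1, 1) by bisection (stand-in for a library inverf)."""
    lo, hi = -6.0, 6.0  # erf(-6) and erf(6) are -1 and 1 to double precision
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_scale(n, seed=0):
    """Estimate a in  sorted_sample[i] ~ a * inverf(x_i)  by least squares."""
    rng = random.Random(seed)
    # Steps 1 and 2: draw n standard Gaussian numbers and sort them.
    ys = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    num = den = 0.0
    # Step 3: regular grid on [-1, 1]; the endpoints i = 0 and i = n - 1
    # are skipped because inverf(-1) and inverf(1) are infinite.
    for i in range(1, n - 1):
        x = -1.0 + 2.0 * i / (n - 1)
        u = erfinv(x)
        num += u * ys[i]
        den += u * u
    # Least-squares slope of a line through the origin (an assumed fit model).
    return num / den


a = fit_scale(20_000)
print(a)
```

The printed value depends on the seed and on `n`, so it will fluctuate slightly from run to run; with larger `n` it should settle near the value reported in the question.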








  • What about $\sqrt 2$? – Claude Leibovici, Jan 4 at 9:07

Tags: probability normal-distribution probability-limit-theorems sorting

asked Jan 3 at 10:32 by Florian Omnès (edited Jan 3 at 13:22)

1 Answer

What you observe is the convergence of the empirical distribution function to the cumulative distribution function (cdf) of the underlying distribution, or more accurately, of the empirical quantiles to the theoretical quantiles (the values of the inverse cdf).

Specifically, for a continuous increasing cdf $F$, theoretical quantiles are given by
$$
x_q = F^{-1}(q) = \sup\{x\in\mathbb R: F(x)<q\}, \quad q\in(0,1).
$$

The definition of empirical quantiles varies. For a sample $X_1,\dots,X_n$ of iid variables they can e.g. be defined by
$$
\hat x_q = X_{(\lfloor nq\rfloor +1)}, \quad q\in (0,1),
$$

where $X_{(1)}\le \dots\le X_{(n)}$ is the sorted sample. It is known that whenever the cdf $F$ is strictly increasing, $\hat x_q \to x_q$ for all $q\in (0,1)$ with probability $1$ as the sample size $n\to\infty$.

In your case, $F(x) = \Phi(x)$ is the standard normal cdf, which is related to the error function by
$$
\Phi(x) = \frac{1+\operatorname{Erf}(x/\sqrt{2})}{2},
$$
so
$$
\Phi^{-1}(y) = \sqrt{2}\operatorname{Erf}^{-1}(2y-1), \quad y\in (0,1).
$$

Your grid point $x_i = -1+\frac{2i}{N-1}$ corresponds to the quantile level $q = (x_i+1)/2 \approx i/N$, so $2q-1 = x_i$ and the $i$-th sorted value is close to $\Phi^{-1}(q) = \sqrt{2}\operatorname{Erf}^{-1}(x_i)$. Hence the convergence to $\sqrt{2}\operatorname{Erf}^{-1} \approx 1.4142 \operatorname{Erf}^{-1}$, i.e. $a=\sqrt 2$, is not surprising.
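
As a rough numerical sanity check of the two ingredients above, the sketch below verifies the identity $\Phi^{-1}(y) = \sqrt{2}\operatorname{Erf}^{-1}(2y-1)$ in the equivalent form $\operatorname{Erf}(\Phi^{-1}(y)/\sqrt 2) = 2y-1$, and compares empirical quantiles $X_{(\lfloor nq\rfloor+1)}$ with $\Phi^{-1}(q)$ for growing $n$. The helper names `identity_gap` and `empirical_quantile`, the seed, and the chosen quantile levels are assumptions made for illustration; only standard-library calls (`math.erf`, `statistics.NormalDist.inv_cdf`, `random.Random.gauss`) are used.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: mean 0, standard deviation 1


def identity_gap(y):
    """|erf(Phi^{-1}(y) / sqrt(2)) - (2y - 1)|; zero if the identity holds exactly."""
    return abs(math.erf(PHI.inv_cdf(y) / math.sqrt(2)) - (2 * y - 1))


def empirical_quantile(sorted_sample, q):
    """hat x_q = X_(floor(n q) + 1), written with 0-based list indexing."""
    n = len(sorted_sample)
    return sorted_sample[math.floor(n * q)]


# The identity should hold up to floating-point rounding.
print(max(identity_gap(y) for y in (0.01, 0.3, 0.5, 0.9, 0.999)))

# Largest gap between empirical and theoretical quantiles at a few levels.
rng = random.Random(1)
levels = (0.1, 0.25, 0.5, 0.75, 0.9)
for n in (10**3, 10**4, 10**5):
    sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    worst = max(abs(empirical_quantile(sample, q) - PHI.inv_cdf(q)) for q in levels)
    print(n, worst)
```

The gaps printed in the loop are random, but they should tend to shrink as `n` grows, in line with the almost sure convergence $\hat x_q \to x_q$ stated above.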














answered Jan 4 at 9:22 by zhoraster