Apply a doubly stochastic matrix M to a probability vector: does entropy increase?























Consider a vector $p = (p_1,\ldots,p_n)$, $p_i > 0$, $\sum_i p_i = 1$,
and a matrix $M_{ij}$ which is doubly stochastic: $\sum_i M_{ij} = 1$, $\sum_j M_{ij} = 1$, $M_{ij} > 0$.



Question 1. Apply the matrix $M$ to the vector $p$, i.e. $q = Mp$ (so $q_i = \sum_j M_{ij} p_j$). Is it true that the entropy of the new vector $q$ is greater than that of the original vector $p$? I.e.



$$ H(q) = -\sum_i q_i \ln(q_i) > -\sum_i p_i \ln(p_i) = H(p) $$



Question 2. Is there a simple proof of this? (It might follow from Gibbs' inequality, but it does not seem obvious to me.)



Question 3. What are the generalizations? In view of the meta-principle "entropy always grows", this might be an example of a more general phenomenon.





Motivation: There is a meta-principle that entropy grows in some natural systems, and a matrix applied to a vector is probably the simplest system one can consider (a Markov chain), so the question above arises. After some thought one can see why to restrict from all stochastic matrices to doubly stochastic ones: $M^n v$ tends to the uniform distribution (which has maximal entropy among all distributions) only for doubly stochastic $M$, so the statement cannot be true for general $M$.
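As a quick numerical sanity check of the conjectured inequality (a sketch assuming NumPy; the Sinkhorn-scaling generator and the helper names are my own, not from the question):

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum_i p_i ln p_i (natural log)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

def random_doubly_stochastic(n, steps=500, seed=0):
    """Sinkhorn scaling: alternately normalize rows and columns of a
    strictly positive matrix; the iteration converges to a doubly
    stochastic matrix."""
    rng = np.random.default_rng(seed)
    M = rng.random((n, n)) + 0.1  # strictly positive, as in the question
    for _ in range(steps):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

n = 5
M = random_doubly_stochastic(n)
p = np.random.default_rng(1).dirichlet(np.ones(n))  # random probability vector
q = M @ p
print(entropy(p), entropy(q))  # entropy of q should be >= entropy of p
```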




































      co.combinatorics pr.probability it.information-theory entropy stochastic-matrices






      asked Dec 1 at 17:44









      Alexander Chervov

4 Answers






























This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that, for every non-negative vector (say, a probability vector), $Mp \succ p$. On the other hand, the order $\succ$ can be defined by: $x \succ y$ iff $\sum_i f(x_i) \le \sum_i f(y_i)$ for every convex function $f$.



          Now, use the fact that $f=-H$ is convex.



For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition), Sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



Edit (at Alexander's request.) Let $\alpha \ne 1$ be a positive parameter. The Rényi entropy is
$$\frac{1}{1-\alpha}\log\sum_j p_j^\alpha.$$
The map $p \mapsto Mp$ does increase the Rényi entropy. Proof: if $\alpha > 1$, the map $t \mapsto t^\alpha$ is convex, so $\sum_j q_j^\alpha \le \sum_j p_j^\alpha$, and the map $s \mapsto \frac{1}{1-\alpha}\log s$ is decreasing. If instead $\alpha < 1$, the map $t \mapsto t^\alpha$ is concave and $s \mapsto \frac{1}{1-\alpha}\log s$ is increasing. In either case the entropy goes up.
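The Rényi claim can be checked on a concrete doubly stochastic matrix (a sketch assuming NumPy; the mixture of the identity with a cyclic shift, and the vector $p$, are my own choices):

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy: log(sum_j p_j^alpha) / (1 - alpha), natural log."""
    p = np.asarray(p, dtype=float)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

# Convex combination of the identity and a cyclic-shift permutation matrix:
# doubly stochastic by construction.
M = 0.7 * np.eye(3) + 0.3 * np.roll(np.eye(3), 1, axis=0)
p = np.array([0.6, 0.3, 0.1])
q = M @ p
for alpha in (0.5, 2.0, 5.0):
    assert renyi(q, alpha) >= renyi(p, alpha)  # entropy increases for each alpha
```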





























• Thank you. That seems to cover the Rényi entropy also, doesn't it?
            – Alexander Chervov
            Dec 1 at 19:32










          • @AlexanderChervov. Yes, see my edit.
            – Denis Serre
            Dec 2 at 7:23































          This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



Whenever $p_i \geq 1/n$, it is not so hard to show that $q_i \leq p_i$, and when $p_i \leq 1/n$, we have $q_i \geq p_i$. From there, it follows that $-\sum_i p_i \ln(q_i) \leq -\sum_i q_i \ln(q_i)$, and this gives the result when combined with Gibbs' inequality.
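The two steps of this argument can be checked on a concrete instance (a sketch assuming NumPy; the strictly positive doubly stochastic matrix $M$ below, a mixture of the identity and cyclic shifts, is my own example, not from the answer):

```python
import numpy as np

H = lambda v: float(-np.sum(v * np.log(v)))  # Shannon entropy, natural log

I3 = np.eye(3)
P1 = np.roll(I3, 1, axis=0)  # cyclic-shift permutation matrix
M = 0.6 * I3 + 0.25 * P1 + 0.15 * (P1 @ P1)  # doubly stochastic, all entries > 0
p = np.array([0.6, 0.3, 0.1])
q = M @ p

third = 1.0 / 3.0
# the crossing property claimed above, on this instance
assert all(q[i] <= p[i] for i in range(3) if p[i] >= third)
assert all(q[i] >= p[i] for i in range(3) if p[i] <= third)
# Gibbs: H(p) <= -sum_i p_i ln q_i; the cross-entropy bound then gives H(q)
cross = float(-np.sum(p * np.log(q)))
assert H(p) <= cross <= H(q)
```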















































This is a particular case of a well-known monotonicity property of the Kullback–Leibler divergence $D(\cdot\,\|\,\cdot)$. Namely, if $\alpha$ and $\beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(\alpha\,\|\,\beta) \ge D(\alpha P\,\|\,\beta P) \;.
$$

In your case, double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(\alpha\,\|\,m_X) = \log \text{card}\, X - H(\alpha) \;.
$$
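Both facts, the monotonicity under a Markov operator and the identity relating KL divergence to entropy, are easy to verify numerically (a sketch assuming NumPy; the random kernel and distributions are my own choices):

```python
import numpy as np

def kl(a, b):
    """Kullback-Leibler divergence D(a || b), natural log."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sum(a * np.log(a / b)))

rng = np.random.default_rng(0)
n = 4
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic Markov operator
alpha = rng.dirichlet(np.ones(n))
beta = rng.dirichlet(np.ones(n))

# distributions act on the right: (alpha P)_j = sum_i alpha_i P_ij
assert kl(alpha @ P, beta @ P) <= kl(alpha, beta) + 1e-12

# the identity D(alpha || m_X) = log card(X) - H(alpha) for uniform m_X
m = np.full(n, 1.0 / n)
Hval = float(-np.sum(alpha * np.log(alpha)))
assert np.isclose(kl(alpha, m), np.log(n) - Hval)
```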















































The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(\frac{1}{n},\frac{1}{n},\ldots,\frac{1}{n})$.



              The proof (Question 2) is quite simple:



Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $\theta$ on the set of all permutations of $A:=\{1,2,\ldots,n\}$ such that whenever $\mathbf{x}$ is a random variable from $A$ and $\pmb{\pi}$ is a random permutation of $A$ with distribution $\theta$ independent of $\mathbf{x}$, and we set $\mathbf{y}:=\pmb{\pi}(\mathbf{x})$, then we have
\begin{align}
\mathbb{P}(\mathbf{y}=i\,|\,\mathbf{x}=j) &= M_{i,j} \;.
\end{align}

Note that if $\mathbf{x}$ is distributed as $p$, then $\mathbf{y}$ is distributed as $q:=Mp$, so $H(\mathbf{x})=H(p)$ and $H(\mathbf{y})=H(q)$.

Now,
\begin{align}
H(\mathbf{y},\pmb{\pi}) &= H(\mathbf{y}) + H(\pmb{\pi}\,|\,\mathbf{y}) \;, \\
H(\mathbf{y},\pmb{\pi}) &= H(\pmb{\pi}) + \underbrace{H(\mathbf{y}\,|\,\pmb{\pi})}_{H(\mathbf{x})} \;,
\end{align}
which implies
\begin{align}
H(\mathbf{y}) &= H(\mathbf{x}) + H(\pmb{\pi}) - H(\pmb{\pi}\,|\,\mathbf{y}) \\
&= H(\mathbf{x}) + I(\mathbf{y};\pmb{\pi}) \;,
\end{align}
where $I(\mathbf{y};\pmb{\pi})$ is the mutual information between $\mathbf{y}$ and $\pmb{\pi}$. We know that $I(\mathbf{y};\pmb{\pi})\geq 0$, with equality if and only if $\mathbf{y}$ and $\pmb{\pi}$ are independent, which is the case if and only if $\mathbf{x}$ has the uniform distribution. Q.E.D.



For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar "entropy increase" principle if we replace entropy with (minus) the Kullback–Leibler divergence relative to $r$. There is a nice discussion of this in the book of Cover and Thomas.
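The identity $H(q) = H(p) + I(\mathbf{y};\pmb{\pi})$ can be checked numerically by building $M$ explicitly as a $\theta$-mixture of permutation matrices and computing the mutual information from the joint law of $(\pmb{\pi},\mathbf{y})$ (a sketch assuming NumPy; the distribution $\theta$ and the vector $p$ are arbitrary choices of mine):

```python
import numpy as np
from itertools import permutations

def H(v):
    """Shannon entropy of a (possibly multi-dimensional) probability array."""
    v = np.asarray(v, dtype=float).ravel()
    v = v[v > 0]
    return float(-np.sum(v * np.log(v)))

n = 3
perms = list(permutations(range(n)))
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(len(perms)))  # distribution over all 3! permutations
p = np.array([0.6, 0.3, 0.1])

# M = sum_k theta_k * (matrix of permutation k); column j is sent to row perm[j]
M = np.zeros((n, n))
for t, perm in zip(theta, perms):
    for j in range(n):
        M[perm[j], j] += t
q = M @ p

# joint law of (pi, y), with x ~ p independent of pi ~ theta and y = pi(x)
joint = np.zeros((len(perms), n))
for k, perm in enumerate(perms):
    for j in range(n):
        joint[k, perm[j]] = theta[k] * p[j]

mi = H(joint.sum(axis=0)) + H(joint.sum(axis=1)) - H(joint)  # I(y; pi)
assert np.isclose(H(q), H(p) + mi)  # the chain-rule identity from the proof
```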




























                edited Dec 2 at 7:29

























                answered Dec 1 at 19:10









                Denis Serre

                    answered Dec 1 at 18:39









                    puck

                            answered Dec 1 at 18:55









                            R W

                            10.1k21946




                            10.1k21946






















                                up vote
                                1
                                down vote













                                The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                The proof (Question 2) is quite simple:



                                Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
                                begin{align}
                                mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
                                end{align}

                                Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



                                Now,
                                begin{align}
                                H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
                                H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
                                end{align}

                                which implies
                                begin{align}
                                H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
                                &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
                                end{align}

                                where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



                                For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.






                                share|cite|improve this answer

























                                  up vote
                                  1
                                  down vote













                                  The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                  The proof (Question 2) is quite simple:



                                    up vote
                                    1
                                    down vote










The answer to Question 1 is yes, as long as $p$ is not the uniform distribution $(\frac{1}{n},\frac{1}{n},\ldots,\frac{1}{n})$.



The proof (Question 2) is quite simple:



Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $\theta$ on the set of all permutations of $A:=\{1,2,\ldots,n\}$ such that whenever $\mathbf{x}$ is a random variable taking values in $A$ and $\pmb{\pi}$ is a random permutation of $A$ with distribution $\theta$, independent of $\mathbf{x}$, and we set $\mathbf{y}:=\pmb{\pi}(\mathbf{x})$, then we have
\begin{align}
\mathbb{P}(\mathbf{y}=i\,|\,\mathbf{x}=j) &= M_{i,j} \;.
\end{align}

Note that if $\mathbf{x}$ is distributed as $p$, then $\mathbf{y}$ is distributed as $q:=Mp$, so $H(\mathbf{x})=H(p)$ and $H(\mathbf{y})=H(q)$.
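As a numerical sanity check of this construction (a sketch using NumPy; the matrix size and random seed are arbitrary), one can build $M$ explicitly as a $\theta$-weighted mixture of permutation matrices, confirm that it is doubly stochastic, and confirm the claimed entropy increase for $q = Mp$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A distribution theta over all permutations of {0, ..., n-1}
perms = list(itertools.permutations(range(n)))
theta = rng.dirichlet(np.ones(len(perms)))

# Permutation matrix P_sigma with P_sigma[i, j] = 1 iff i = sigma(j),
# so M = sum_sigma theta_sigma * P_sigma satisfies M[i, j] = P(y = i | x = j).
def perm_matrix(sigma):
    P = np.zeros((n, n))
    P[list(sigma), range(n)] = 1.0
    return P

M = sum(t * perm_matrix(s) for t, s in zip(theta, perms))

# M is doubly stochastic: all rows and all columns sum to 1
assert np.allclose(M.sum(axis=0), 1) and np.allclose(M.sum(axis=1), 1)

# Entropy increases: H(q) >= H(p) for q = M p
H = lambda v: -np.sum(v * np.log(v))
p = rng.dirichlet(np.ones(n))   # a random (almost surely nonuniform) p
q = M @ p
assert H(q) >= H(p)
print(H(p), H(q))
```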



Now, by the chain rule for entropy,
\begin{align}
H(\mathbf{y},\pmb{\pi}) &= H(\mathbf{y}) + H(\pmb{\pi}\,|\,\mathbf{y}) \;, \\
H(\mathbf{y},\pmb{\pi}) &= H(\pmb{\pi}) + \underbrace{H(\mathbf{y}\,|\,\pmb{\pi})}_{H(\mathbf{x})} \;,
\end{align}

where $H(\mathbf{y}\,|\,\pmb{\pi})=H(\mathbf{x})$ because, conditioned on $\pmb{\pi}=\pi$, the variable $\mathbf{y}=\pi(\mathbf{x})$ is a bijective function of $\mathbf{x}$. Comparing the two identities gives
\begin{align}
H(\mathbf{y}) &= H(\mathbf{x}) + H(\pmb{\pi}) - H(\pmb{\pi}\,|\,\mathbf{y}) \\
&= H(\mathbf{x}) + I(\mathbf{y};\pmb{\pi}) \;,
\end{align}

where $I(\mathbf{y};\pmb{\pi})$ is the mutual information between $\mathbf{y}$ and $\pmb{\pi}$. We know that $I(\mathbf{y};\pmb{\pi})\geq 0$, with equality if and only if $\mathbf{y}$ and $\pmb{\pi}$ are independent, which is the case if and only if $\mathbf{x}$ has the uniform distribution. Q.E.D.
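The identity $H(\mathbf{y}) = H(\mathbf{x}) + I(\mathbf{y};\pmb{\pi})$ can also be checked numerically from the explicit joint law $\mathbb{P}(\pmb{\pi}=\sigma,\mathbf{y}=i)=\theta_\sigma\,p_{\sigma^{-1}(i)}$ (a sketch, with an arbitrary small $n$ and seed):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3
perms = list(itertools.permutations(range(n)))
theta = rng.dirichlet(np.ones(len(perms)))   # law of the random permutation pi
p = rng.dirichlet(np.ones(n))                # law of x

# Joint law: P(pi = sigma, y = i) = theta_sigma * p_{sigma^{-1}(i)},
# where s.index(i) computes sigma^{-1}(i) for the tuple s
joint = np.array([[t * p[s.index(i)] for i in range(n)]
                  for t, s in zip(theta, perms)])

q = joint.sum(axis=0)                        # marginal law of y (equals M p)
H = lambda v: -np.sum(v * np.log(v))

# Mutual information I(y; pi) = KL(joint || product of marginals)
I = np.sum(joint * np.log(joint / np.outer(theta, q)))

assert I >= 0
assert np.isclose(H(q), H(p) + I)            # H(y) = H(x) + I(y; pi)
```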



For Question 3, suppose that $M$ is merely a positive stochastic matrix (not necessarily doubly stochastic), and let $r$ denote its unique stationary distribution. Then a similar "entropy increase" principle holds if entropy is replaced with minus the Kullback–Leibler divergence relative to $r$: the map $p\mapsto Mp$ never increases $D(p\,\|\,r)$. There is a nice discussion of this in Cover and Thomas's book, Elements of Information Theory.
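This monotonicity of the divergence can be checked numerically as well (a sketch; the matrix below is an arbitrary random positive matrix with unit column sums, matching the convention $q_i = \sum_j M_{ij} p_j$ of the question, and the stationary $r$ is recovered as the Perron eigenvector):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Random positive stochastic matrix with columns summing to 1,
# so p -> M p maps probability vectors to probability vectors
M = rng.random((n, n)) + 0.1
M /= M.sum(axis=0, keepdims=True)

# Stationary distribution r: the (normalized) eigenvector with M r = r
w, V = np.linalg.eig(M)
r = np.real(V[:, np.argmin(np.abs(w - 1))])
r /= r.sum()

D = lambda a, b: np.sum(a * np.log(a / b))   # Kullback-Leibler divergence

p = rng.dirichlet(np.ones(n))
for _ in range(5):                           # D(M^k p || r) is nonincreasing in k
    q = M @ p
    assert D(q, r) <= D(p, r) + 1e-12
    p = q
```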






                                    answered Dec 1 at 19:25









                                    Algernon
