Apply doubly stochastic matrix M to a probability vector, then entropy increases?
Consider a vector $p =(p_1,\ldots,p_n)$, $p_i>0$, $\sum_i p_i = 1$,
and a matrix $M_{ij}$ which is doubly stochastic: $\sum_i M_{ij} = 1$, $\sum_j M_{ij} = 1$, $M_{ij} > 0$.
Question 1 Apply the matrix $M$ to the vector $p$, i.e. $q = Mp$ (so $q_i = \sum_j M_{ij} p_j$). Is it true that the entropy of the new vector $q$ is greater than that of the original vector $p$? I.e.
$$ H(q) = -\sum_i q_i \ln(q_i) > -\sum_i p_i \ln(p_i) = H(p) $$
Question 2 Is there a simple proof of this? (It might follow from Gibbs' inequality, but it does not seem obvious to me.)
Question 3 What are the generalizations? In view of the meta-principle "entropy always grows", this might be an example of some more general phenomenon.
Motivation: There is a meta-principle that entropy grows in some natural systems; a matrix applied to a vector is probably the simplest system one can consider (a Markov chain), so the question above arises. After some thought one can restrict from all matrices to doubly stochastic ones, because $M^n v$ tends to the uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$, so the statement cannot be true for general $M$.
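A minimal numerical check of Question 1 (a sketch in plain Python; the matrix and vector are arbitrary illustrative choices, not part of the question):

```python
import math

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i ln(p_i)
    return -sum(x * math.log(x) for x in p if x > 0)

# A doubly stochastic matrix: every row and every column sums to 1
M = [[0.40, 0.35, 0.25],
     [0.35, 0.30, 0.35],
     [0.25, 0.35, 0.40]]
p = [0.6, 0.3, 0.1]
q = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]  # q_i = sum_j M_ij p_j

assert abs(sum(q) - 1) < 1e-12   # q is again a probability vector
assert entropy(q) > entropy(p)   # entropy increased (p is not uniform)
```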
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
asked Dec 1 at 17:44 by Alexander Chervov
4 Answers
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mp\succ p$. On the other hand, the order $\succ$ can be defined by $x\succ y$ iff $\sum_i f(x_i)\le\sum_i f(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition), sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request). Let $\alpha\ne1$ be a positive parameter. The Rényi entropy is
$$\frac1{1-\alpha}\log\sum_j p_j^\alpha.$$
The map $p\mapsto Mp$ does increase Rényi entropy. Proof: if $\alpha>1$, the map $t\mapsto t^\alpha$ is convex, and $s\mapsto\frac1{1-\alpha}\log s$ is increasing. If instead $\alpha<1$, the map $t\mapsto t^\alpha$ is concave, and $s\mapsto\frac1{1-\alpha}\log s$ is decreasing.
answered Dec 1 at 19:10, edited Dec 2 at 7:29 by Denis Serre
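The Rényi case is also easy to probe numerically; here is a sketch (the doubly stochastic matrix, the starting vector, and the tested values of $\alpha$ are arbitrary choices for illustration):

```python
import math

def renyi(p, alpha):
    # Rényi entropy: (1/(1-alpha)) * log(sum_j p_j^alpha), for alpha != 1
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# A doubly stochastic matrix with positive entries
M = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
p = [0.7, 0.2, 0.1]
q = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]  # q = Mp

assert shannon(q) > shannon(p)
for alpha in (0.5, 2.0, 5.0):       # both alpha < 1 and alpha > 1 branches
    assert renyi(q, alpha) > renyi(p, alpha)
```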
Thank you. That seems to cover Rényi entropy also, doesn't it? – Alexander Chervov, Dec 1 at 19:32
@AlexanderChervov Yes, see my edit. – Denis Serre, Dec 2 at 7:23
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (For a proof see, e.g., Section 15.5 here.)
Whenever $p_i\geq 1/n$, it is not so hard to show that $q_i\leq p_i$, and when $p_i\leq 1/n$, we have $q_i\geq p_i$. From there, it follows that $-\sum_i p_i \ln (q_i) \leq -\sum_i q_i \ln (q_i)$, and combined with Gibbs' inequality this gives the result.
answered Dec 1 at 18:39 by puck
This is a particular case of a well-known monotonicity property of the Kullback–Leibler divergence $D(\cdot\,|\,\cdot)$. Namely, if $\alpha$ and $\beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(\alpha\,|\,\beta) \ge D(\alpha P\,|\,\beta P) \;.
$$
In your case, double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(\alpha\,|\,m_X) = \log \operatorname{card} X - H(\alpha) \;.
$$
answered Dec 1 at 18:55 by R W
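This monotonicity (the data-processing inequality for KL divergence) is easy to check numerically. A sketch, with an arbitrary row-stochastic $P$ and arbitrary distributions $\alpha$, $\beta$ (all values are illustrative choices, not from the answer):

```python
import math

def kl(a, b):
    # Kullback-Leibler divergence D(a | b), natural logarithm
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def push(dist, P):
    # Row vector times matrix: (dist P)_j = sum_i dist_i P_ij
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# An arbitrary Markov operator (row-stochastic matrix: rows sum to 1)
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
alpha = [0.80, 0.15, 0.05]
beta  = [0.20, 0.50, 0.30]

# D(alpha P | beta P) <= D(alpha | beta)
assert kl(push(alpha, P), push(beta, P)) <= kl(alpha, beta)
```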
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(\frac{1}{n},\frac{1}{n},\ldots,\frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $\theta$ on the set of all permutations of $A:=\{1,2,\ldots,n\}$ such that whenever $\mathbf{x}$ is a random variable taking values in $A$ and $\pmb{\pi}$ is a random permutation of $A$ with distribution $\theta$, independent of $\mathbf{x}$, and we set $\mathbf{y}:=\pmb{\pi}(\mathbf{x})$, then we have
\begin{align}
\mathbb{P}(\mathbf{y}=i\,|\,\mathbf{x}=j) &= M_{i,j} \;.
\end{align}
Note that if $\mathbf{x}$ is distributed as $p$, then $\mathbf{y}$ is distributed as $q:=Mp$, so $H(\mathbf{x})=H(p)$ and $H(\mathbf{y})=H(q)$.
Now,
\begin{align}
H(\mathbf{y},\pmb{\pi}) &= H(\mathbf{y}) + H(\pmb{\pi}\,|\,\mathbf{y}) \;, \\
H(\mathbf{y},\pmb{\pi}) &= H(\pmb{\pi}) + \underbrace{H(\mathbf{y}\,|\,\pmb{\pi})}_{H(\mathbf{x})} \;,
\end{align}
which implies
\begin{align}
H(\mathbf{y}) &= H(\mathbf{x}) + H(\pmb{\pi}) - H(\pmb{\pi}\,|\,\mathbf{y}) \\
&= H(\mathbf{x}) + I(\mathbf{y};\pmb{\pi}) \;,
\end{align}
where $I(\mathbf{y};\pmb{\pi})$ is the mutual information between $\mathbf{y}$ and $\pmb{\pi}$. We know that $I(\mathbf{y};\pmb{\pi})\geq 0$, with equality if and only if $\mathbf{y}$ and $\pmb{\pi}$ are independent, which is the case if and only if $\mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar "entropy increase" principle if we replace entropy with (minus) the Kullback–Leibler divergence relative to $r$. There is a nice discussion of this in the book of Cover and Thomas.
answered Dec 1 at 19:25 by Algernon
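The Question 3 generalization can also be observed numerically: iterate a positive column-stochastic matrix (a sketch; this particular matrix and starting vector are illustrative choices) and watch $D(M^k p\,|\,r)$ decrease toward $0$.

```python
import math

def kl(a, b):
    # Kullback-Leibler divergence D(a | b)
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def step(M, p):
    # q = M p with q_i = sum_j M_ij p_j (columns of M sum to 1)
    n = len(M)
    return [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]

# A positive column-stochastic matrix that is NOT doubly stochastic
M = [[0.5, 0.2, 0.1],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.7]]

# Stationary distribution r (M r = r), found by power iteration
r = [1 / 3, 1 / 3, 1 / 3]
for _ in range(2000):
    r = step(M, r)

p = [0.90, 0.05, 0.05]
divs = []
for _ in range(10):
    divs.append(kl(p, r))
    p = step(M, p)

# D(M^k p | r) is non-increasing in k
assert all(d1 >= d2 - 1e-12 for d1, d2 in zip(divs, divs[1:]))
```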