What does conditioning on a random variable mean?


























What does conditioning on a random variable mean?



For example: in $p(X \mid Y)$, $X$ and $Y$ are random variables, so does conditioning on $Y$ mean that $Y$ is fixed (i.e., non-random)?

































  • I'm wondering what a conditional random variable means. Normally, conditioning on a random variable means its value is known, as in $P(X \mid Y=1)$, but I have also noticed that sometimes $Y$ is left unspecified, as in $P(X \mid Y=y)$. In this case, what does the condition really mean?
    – Yneedtobeserious
    Mar 3 at 5:43


















mathematical-statistics






asked Mar 3 at 3:50 by Yneedtobeserious
edited Mar 3 at 12:29 by Peter Mortensen






















3 Answers





































Conditioning on an event (such as a particular specification of a random variable) means that this event is treated as being known to have occurred. This still allows us to specify conditioning on an event $\{ Y=y \}$ where the actual value $y$ is an algebraic variable that falls within some range.$^\dagger$ For example, we might specify the conditional density:



$$p_{X|Y}(x|y) = p(X=x \mid Y=y) = \binom{y}{x} \frac{1}{2^y}
\quad \quad \quad \text{for all integers } 0 \leqslant x \leqslant y.$$



This refers to the probability density for the random variable $X$ conditional on the known event $\{ Y=y \}$, where we are free to set any $y \in \mathbb{N}$. The use of the variable $y$ in this formulation simply means that the conditional distribution has a form that allows us to substitute a range of values for this variable, so we write it as a function of the conditioning value as well as the argument value for the random variable $X$. Regardless of which particular value $y$ we choose, the resulting density is conditional on that event being treated as known, i.e., no longer random.



As I have stated in another answer here, it is also worth noting that many theories of probability regard all probability as conditional on implicit information. This idea is most famously associated with the axiomatic approach of the mathematician Alfréd Rényi (see, e.g., Kaminski 1984). Rényi argued that every probability measure must be interpreted as being conditional on some underlying information, and that reference to marginal probabilities was merely a reference to probability where the underlying conditions are implicit, rather than explicit.





$^\dagger$ Technically, it's worth noting that if we condition on the value of a continuous random variable (an event with probability zero), then there is an extended definition of the conditional probability. Essentially, this is just a function that satisfies the required integral statement for the marginal probability. In the present answer we will stick to discrete random variables to keep things simple.
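As a quick numerical sanity check (a sketch, not part of the original answer), note that the conditional density above is just the Binomial$(y, 1/2)$ pmf, so for each fixed $y$ it should sum to $1$ over $x = 0, \dots, y$:

```python
from math import comb

def p_x_given_y(x: int, y: int) -> float:
    """Conditional pmf p(X = x | Y = y) = C(y, x) / 2^y, i.e. Binomial(y, 1/2)."""
    if not 0 <= x <= y:
        return 0.0
    return comb(y, x) / 2 ** y

# For each fixed y, the conditional pmf sums to 1 over x = 0, ..., y.
for y in (1, 5, 10):
    assert abs(sum(p_x_given_y(x, y) for x in range(y + 1)) - 1.0) < 1e-12
```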






answered Mar 3 at 6:45 by Ben (edited Mar 4 at 1:58)
























  • The expression for the conditional pdf should depend on $x$ in some way, like $\phi(x-y)$.
    – p.s.
    Mar 3 at 17:05










  • @p.s. Thanks, fixed.
    – Ben
    Mar 3 at 21:02
































Conditioning on a random variable is much more subtle than conditioning on an event.



Conditioning on an Event



Recall that for an event $B$ with $P(B) > 0$ we define the conditional probability given $B$ by
$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
$$

for every event $A$. This defines a new probability measure $P(\cdot \mid B)$ on the underlying probability space, and if $X$ is a random variable which is either non-negative or $P$-integrable, then we have
$$
E[X \mid B]
= \int X \, dP(\cdot \mid B)
= \frac{1}{P(B)} \int X \mathbf{1}_B \, dP.
$$

The intuitive interpretation is that $E[X \mid B]$ is the "best guess" for what value $X$ takes, knowing that the event $B$ actually happens.
This intuition is justified by the last integral above: we integrate $X$ with respect to $P$, but only on the event $B$ (dividing by $P(B)$ re-weights $B$ to have probability $1$, since we are concentrating all our attention on $B$).
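The event-conditioning formula above can be sketched in a few lines (a hypothetical example, not from the answer: $X$ is the value of a fair die roll and $B$ is the event that the roll is even):

```python
from fractions import Fraction

# Hypothetical example (not from the answer): X = value of a fair die roll,
# B = "roll is even".  Then E[X | B] = (1 / P(B)) * E[X 1_B].
p = Fraction(1, 6)                       # probability of each face
B = {2, 4, 6}
e_x_indicator_b = sum(w * p for w in B)  # E[X 1_B] = (2 + 4 + 6) / 6 = 2
p_b = len(B) * p                         # P(B) = 1/2
e_x_given_b = e_x_indicator_b / p_b
assert e_x_given_b == 4                  # average of {2, 4, 6}
```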



That's the easy case. To understand conditioning on a random variable, we need the more general idea of conditioning on information. A probability measure by itself gives us prior probabilities for all possible events. But probabilities that certain events happen change if we know that certain other events do or do not happen. That is, when we have information about whether certain events happen or not, we can update our probabilities for the remaining events.



Conditioning on a Collection of Events



Formally, suppose $\mathcal{G}$ is a $\sigma$-algebra of events. Assume that it is known whether each event in $\mathcal{G}$ happens or not.
We want to define the conditional probability $P(\cdot \mid \mathcal{G})$ and the conditional expectation $E[\cdot \mid \mathcal{G}]$.
The conditional probability $P(A \mid \mathcal{G})$ should reflect our updated probability of an event $A$ after knowing the information contained in $\mathcal{G}$, and $E[X \mid \mathcal{G}]$ should be our "best guess" for the value of a random variable $X$ using the information contained in $\mathcal{G}$.



(NB: Why should $\mathcal{G}$ be a $\sigma$-algebra and not a more general collection of events? Because if $\mathcal{G}$ weren't a $\sigma$-algebra but we knew whether each event in $\mathcal{G}$ happens or not, then we would know whether each event in the $\sigma$-algebra generated by $\mathcal{G}$ happens or not, so we might as well replace $\mathcal{G}$ with $\sigma(\mathcal{G})$.)



Conditional Probability



Here's where things get interesting: $P(A \mid \mathcal{G})$ is no longer just a number; it is a random variable! We define $P(A \mid \mathcal{G})$ to be any $\mathcal{G}$-measurable random variable $X$ such that
$$
E[X \mathbf{1}_B] = P(A \cap B)
$$

for every event $B \in \mathcal{G}$.
Moreover, if $X$ and $X^\prime$ are two random variables satisfying this definition, then $X = X^\prime$ almost surely.
That is pretty abstract stuff, so hopefully an example can shed some light on the abstraction.



Example.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $B \in \mathcal{F}$ be an event with $0 < P(B) < 1$.
Suppose $\mathcal{G} = \{\emptyset, B, B^c, \Omega\}$.
That is, $\mathcal{G}$ is the $\sigma$-algebra containing all the information about whether $B$ happens or not.
Then for any event $A \in \mathcal{F}$ we have
$$
P(A \mid \mathcal{G})
= P(A \mid B) \mathbf{1}_B + P(A \mid B^c) \mathbf{1}_{B^c}.
$$

That is, for an outcome $\omega \in \Omega$, we have
$$
P(A \mid \mathcal{G})(\omega) = P(A \mid B)
$$

if $\omega \in B$ (i.e., if $B$ happens), and
$$
P(A \mid \mathcal{G})(\omega) = P(A \mid B^c)
$$

if $\omega \notin B$ (i.e., if $B$ doesn't happen).
It is easy to check that this random variable actually satisfies the definition of the conditional probability $P(A \mid \mathcal{G})$ given above.
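That check can itself be sketched numerically. The probability space below is a hypothetical choice (not from the answer): a fair die, $B$ = "roll is even", $A$ = "roll $\geqslant 4$"; the loop verifies the defining property $E[P(A \mid \mathcal{G}) \mathbf{1}_C] = P(A \cap C)$ for every $C \in \mathcal{G}$:

```python
from fractions import Fraction

# Hypothetical probability space (not from the answer): a fair die,
# Omega = {1,...,6}, B = "roll is even", A = "roll >= 4",
# G = {empty set, B, B complement, Omega}.
OMEGA = set(range(1, 7))
B = {2, 4, 6}
Bc = OMEGA - B
A = {4, 5, 6}

def prob(event: set) -> Fraction:
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def p_A_given_G(w: int) -> Fraction:
    """P(A | G)(w) = P(A|B) 1_B(w) + P(A|B^c) 1_{B^c}(w)."""
    return prob(A & B) / prob(B) if w in B else prob(A & Bc) / prob(Bc)

# Defining property: E[P(A|G) 1_C] = P(A ∩ C) for every C in G.
for C in (set(), B, Bc, OMEGA):
    lhs = sum(p_A_given_G(w) * prob({w}) for w in C)
    assert lhs == prob(A & C)
```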



Conditional Expectation



I mentioned already that conditional probabilities aren't unique, but they are unique almost surely.
It turns out that if $X$ is a nonnegative or integrable random variable, $\mathcal{G}$ is a $\sigma$-algebra of events, and $Q$ is the distribution of $X$ (a Borel probability measure on $\mathbb{R}$), then it is possible to choose versions of the conditional probabilities $Q(B \mid \mathcal{G})$ for all Borel subsets $B$ of $\mathbb{R}$ such that $Q(\cdot \mid \mathcal{G})(\omega)$ is a probability measure for each outcome $\omega$. Given this possibility, we may define
$$
E[X \mid \mathcal{G}] = \int_{\mathbb{R}} x \, Q(dx \mid \mathcal{G}),
$$

which is again a random variable.
It can be shown that this is the almost surely unique random variable $Y$ which is $\mathcal{G}$-measurable and satisfies
$$
E[Y \mathbf{1}_A] = E[X \mathbf{1}_A]
$$

for all $A \in \mathcal{G}$.



Conditioning on a Random Variable



Given the general definitions of conditional probability and conditional expectation above, we may easily define what it means to condition on a random variable $Y$: it means conditioning on the $\sigma$-algebra generated by $Y$:
$$
\sigma(Y)
= \big\{\{Y \in B\} : \text{$B$ is a Borel subset of $\mathbb{R}$}\big\}.
$$

I said "easy to define," but I am aware that that doesn't mean "easy to understand."
But at least we can now say what an expression like $E[X \mid Y]$ means: it is a $\sigma(Y)$-measurable random variable that satisfies
$$
E[E[X \mid Y] \mathbf{1}_A] = E[X \mathbf{1}_A]
$$

for every event $A$ of the form $A = \{Y \in B\}$ for some Borel subset $B$ of $\mathbb{R}$.
Wow, that's abstract!
Fortunately, there are easy ways to work with $E[X \mid Y]$ if $Y$ is discrete or absolutely continuous.




$Y$ Discrete



Suppose $Y$ takes values in a countable set $S \subseteq \mathbb{R}$.
Then it can be shown that
$$
P(A \mid Y)(\omega) = P(A \mid Y = Y(\omega))
$$

for each outcome $\omega$.
The right-hand side above is shorthand for the more verbose
$$
P(A \mid \{Y = Y(\omega)\}),
$$

where $\{Y = Y(\omega)\}$ is the event
$$
\{Y = Y(\omega)\}
= \{\omega^\prime : Y(\omega^\prime) = Y(\omega)\}.
$$

That is, if our outcome is $\omega$, and $Y(\omega) = k$, then
$$
P(A \mid Y)(\omega) = P(A \mid Y = k) = \frac{P(A \cap \{Y = k\})}{P(Y = k)}.
$$

Similarly, if $X$ is another random variable taking values in $S$, then we have
$$
E[X \mid Y](\omega) = E[X \mid Y = Y(\omega)] = \sum_{x \in S} x P(X = x \mid Y = Y(\omega)).
$$
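The discrete case can be sketched concretely (a hypothetical example, not from the answer: two fair dice, with $X$ the first die and $Y$ the sum; by symmetry $E[X \mid Y = y] = y/2$):

```python
from fractions import Fraction
from itertools import product

# Hypothetical example (not from the answer): roll two fair dice,
# X = first die, Y = X + second die.
OUTCOMES = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
P_EACH = Fraction(1, 36)

def e_x_given_y(y: int) -> Fraction:
    """E[X | Y = y]: average of X over the outcomes in the event {Y = y}."""
    hits = [(d1, d2) for d1, d2 in OUTCOMES if d1 + d2 == y]
    return Fraction(sum(d1 for d1, _ in hits), len(hits))

# By symmetry, E[X | Y = y] = y / 2 for every attainable y.
assert all(e_x_given_y(y) == Fraction(y, 2) for y in range(2, 13))

# E[X | Y] = e_x_given_y(Y) is a random variable; the tower property
# E[E[X | Y]] = E[X] = 7/2 holds.
assert sum(e_x_given_y(d1 + d2) * P_EACH for d1, d2 in OUTCOMES) == Fraction(7, 2)
```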




$Y$ Absolutely Continuous



Suppose now that $Y$ is absolutely continuous with density $f_Y$.
Let $X$ be another absolutely continuous random variable, with density $f_X$.
Let $f_{X, Y}$ be the joint density of $X$ and $Y$.
Then we define the conditional density of $X$ given $Y = y$ by
$$
f_{X \mid Y}(x \mid y) = \frac{f_{X, Y}(x, y)}{f_Y(y)}
= \frac{f_{X, Y}(x, y)}{\int_{\mathbb{R}} f_{X, Y}(x^\prime, y) \, dx^\prime}.
$$

Now we may define a function $g : \mathbb{R} \to \mathbb{R}$ given by
$$
g(y)
= E[X \mid Y = y]
= \int_{\mathbb{R}} x f_{X \mid Y}(x \mid y) \, dx.
$$

In particular, $g(y) = E[X \mid Y = y]$ is a real number for each $y$.
Using this $g$, we can show that
$$
E[X \mid Y] = g(Y),
$$

meaning that
$$
E[X \mid Y](\omega) = g(Y(\omega)) = E[X \mid Y = Y(\omega)]
$$

for each outcome $\omega$.
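The continuous case can also be sketched numerically. The joint density below is a hypothetical choice (it does not come from the answer): $f_{X,Y}(x, y) = x + y$ on the unit square, for which $g(y) = (1/3 + y/2)/(1/2 + y)$ in closed form:

```python
# Hypothetical joint density (not from the answer): f_{X,Y}(x, y) = x + y on
# the unit square, for which g(y) = E[X | Y = y] = (1/3 + y/2) / (1/2 + y).
def f_joint(x: float, y: float) -> float:
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def g(y: float, n: int = 10_000) -> float:
    """g(y) = (∫ x f(x,y) dx) / (∫ f(x,y) dx), via midpoint Riemann sums."""
    xs = [(i + 0.5) / n for i in range(n)]
    num = sum(x * f_joint(x, y) for x in xs) / n  # ∫ x f(x, y) dx
    den = sum(f_joint(x, y) for x in xs) / n      # f_Y(y) = ∫ f(x, y) dx
    return num / den

# Compare against the closed form for a few values of y.
for y in (0.0, 0.25, 0.5, 1.0):
    exact = (1 / 3 + y / 2) / (1 / 2 + y)
    assert abs(g(y) - exact) < 1e-6
```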



This is just scratching the surface of the theory of conditioning.
For a great reference, see chapters 21 and 23 of A Modern Approach to Probability by Fristedt and Gray.




Some Takeaways




  1. Conditioning on a random variable is different from conditioning on an event.

  2. Expressions like $P(A \mid Y)$ and $E[X \mid Y]$ are random variables.

  3. Expressions like $P(A \mid Y = y)$ and $E[X \mid Y = y]$ are real numbers.




















































It means that the value of the random variable $Y$ is known. For example, suppose $E(X \mid Y) = 10 + Y^2$. Then if $Y = 2$, $E(X \mid Y = 2) = 14$.
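In code, that reads as: $E(X \mid Y)$ is just the function $y \mapsto 10 + y^2$ evaluated at the (now known) value of $Y$ (a sketch of the notation only; the particular formula is the answer's hypothetical example):

```python
# Sketch: if E(X | Y) = 10 + Y^2, then conditioning on Y = y plugs in the
# known value y, yielding an ordinary number.
def e_x_given_y(y: float) -> float:
    return 10 + y ** 2

assert e_x_given_y(2) == 14  # E(X | Y = 2) = 14, as in the answer
```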




























  • Thanks for your explanation. I found that $E(X \mid Y=y)$ is also possible, so here $y$ is still unspecified and $E(X \mid Y=y) = 10 + y^2$. In this case, does the condition only remove the randomness of $Y$?
    – Yneedtobeserious
    Mar 3 at 4:07










  • $E(X \mid Y=y)$ is just a different notation. If you are conditioning on $Y$, that means $Y$ is known. $E(X \mid Y=y) = 10 + y^2$ is a function that gives the expected value of $X$ conditional on $Y = y$ for an arbitrary value of $Y$.
    – user239680
    Mar 4 at 5:07










  • My understanding is that in $E(X \mid Y=y)$, $Y$ is still not a known value but a non-random unknown variable; the condition $Y=y$ only removes the randomness of the random variable $Y$. In summary, the conditioning variable is either a specific value (like $Y=5$) or an unspecified non-random variable (like $Y=y$), but in either case $Y$ is no longer treated as random in the expectation of $X$. Am I right to think this way?
    – Yneedtobeserious
    Mar 4 at 7:38










  • Yes, that's fair. Our different explanations are just semantics.
    – user239680
    Mar 4 at 14:05













    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3












    $begingroup$

    Conditioning on an event (such as a particular specification of a random variable) means that this event is treated as being known to have occurred. This still allows us to specify conditioning on an event ${ Y=y }$ where the actual value $y$ is an algebraic variable that falls within some range.$^dagger$ For example, we might specify the conditional density:



    $$p_{X|Y}(x|y) = p(X=x | Y=y) = {y choose x} frac{1}{2^y}
    quad quad quad text{for all integers } 0 leqslant x leqslant y.$$



    This refers to the probability density for the random variable $X$ conditional on the known event ${ Y=y }$, where we are free to set any $y in mathbb{N}$. The use of the variable $y$ in this formulation simply means that the conditional distribution has a form that allows us to substitute a range of values for this variable, so we write it as a function of the conditioning value as well as the argument value for the random variable $X$. Regardless of which particular value $y$ we choose, the resulting density is conditional on that event being treated as known ---i.e., no longer random.



    As I have stated in another answer here, it is also worth noting that many theories of probability regard all probability to be conditional on implicit information. This idea is most famously associated with the axiomatic approach of the mathematician Alfréd Rényi (see e.g., Kaminski 1984). Rényi argued that every probability measure must be interpreted as being conditional on some underlying information, and that reference to marginal probabilities was merely a reference to probability where the underlying conditions are implicit, rather than explicit.





    $^dagger$ Technically, it's worth noting that if we conditioning on the value of a continuous random variable (an event with probability zero) then there is an extended definition of the conditional probability. Essentially this is just a function that satisfies the required integral statement for the marginal probability. In the present answer we will stick to discrete random variables to keep things simple.






    share|cite|improve this answer











    $endgroup$













    • $begingroup$
      The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
      $endgroup$
      – p.s.
      Mar 3 at 17:05










    • $begingroup$
      @p.s. Thanks - fixed.
      $endgroup$
      – Ben
      Mar 3 at 21:02
















    3












    $begingroup$

    Conditioning on an event (such as a particular specification of a random variable) means that this event is treated as being known to have occurred. This still allows us to specify conditioning on an event ${ Y=y }$ where the actual value $y$ is an algebraic variable that falls within some range.$^dagger$ For example, we might specify the conditional density:



    $$p_{X|Y}(x|y) = p(X=x | Y=y) = {y choose x} frac{1}{2^y}
    quad quad quad text{for all integers } 0 leqslant x leqslant y.$$



    This refers to the probability density for the random variable $X$ conditional on the known event ${ Y=y }$, where we are free to set any $y in mathbb{N}$. The use of the variable $y$ in this formulation simply means that the conditional distribution has a form that allows us to substitute a range of values for this variable, so we write it as a function of the conditioning value as well as the argument value for the random variable $X$. Regardless of which particular value $y$ we choose, the resulting density is conditional on that event being treated as known ---i.e., no longer random.



    As I have stated in another answer here, it is also worth noting that many theories of probability regard all probability to be conditional on implicit information. This idea is most famously associated with the axiomatic approach of the mathematician Alfréd Rényi (see e.g., Kaminski 1984). Rényi argued that every probability measure must be interpreted as being conditional on some underlying information, and that reference to marginal probabilities was merely a reference to probability where the underlying conditions are implicit, rather than explicit.





    $^dagger$ Technically, it's worth noting that if we conditioning on the value of a continuous random variable (an event with probability zero) then there is an extended definition of the conditional probability. Essentially this is just a function that satisfies the required integral statement for the marginal probability. In the present answer we will stick to discrete random variables to keep things simple.






    share|cite|improve this answer











    $endgroup$













    • $begingroup$
      The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
      $endgroup$
      – p.s.
      Mar 3 at 17:05










    • $begingroup$
      @p.s. Thanks - fixed.
      $endgroup$
      – Ben
      Mar 3 at 21:02














    3












    3








    3





    $begingroup$

    Conditioning on an event (such as a particular specification of a random variable) means that this event is treated as being known to have occurred. This still allows us to specify conditioning on an event ${ Y=y }$ where the actual value $y$ is an algebraic variable that falls within some range.$^dagger$ For example, we might specify the conditional density:



    $$p_{X|Y}(x|y) = p(X=x | Y=y) = {y choose x} frac{1}{2^y}
    quad quad quad text{for all integers } 0 leqslant x leqslant y.$$



    This refers to the probability density for the random variable $X$ conditional on the known event ${ Y=y }$, where we are free to set any $y in mathbb{N}$. The use of the variable $y$ in this formulation simply means that the conditional distribution has a form that allows us to substitute a range of values for this variable, so we write it as a function of the conditioning value as well as the argument value for the random variable $X$. Regardless of which particular value $y$ we choose, the resulting density is conditional on that event being treated as known ---i.e., no longer random.



    As I have stated in another answer here, it is also worth noting that many theories of probability regard all probability to be conditional on implicit information. This idea is most famously associated with the axiomatic approach of the mathematician Alfréd Rényi (see e.g., Kaminski 1984). Rényi argued that every probability measure must be interpreted as being conditional on some underlying information, and that reference to marginal probabilities was merely a reference to probability where the underlying conditions are implicit, rather than explicit.





    $^dagger$ Technically, it's worth noting that if we conditioning on the value of a continuous random variable (an event with probability zero) then there is an extended definition of the conditional probability. Essentially this is just a function that satisfies the required integral statement for the marginal probability. In the present answer we will stick to discrete random variables to keep things simple.






    share|cite|improve this answer











    $endgroup$



    Conditioning on an event (such as a particular specification of a random variable) means that this event is treated as being known to have occurred. This still allows us to specify conditioning on an event ${ Y=y }$ where the actual value $y$ is an algebraic variable that falls within some range.$^dagger$ For example, we might specify the conditional density:



    $$p_{X|Y}(x|y) = p(X=x | Y=y) = {y choose x} frac{1}{2^y}
    quad quad quad text{for all integers } 0 leqslant x leqslant y.$$



    This refers to the probability density for the random variable $X$ conditional on the known event ${ Y=y }$, where we are free to set any $y in mathbb{N}$. The use of the variable $y$ in this formulation simply means that the conditional distribution has a form that allows us to substitute a range of values for this variable, so we write it as a function of the conditioning value as well as the argument value for the random variable $X$. Regardless of which particular value $y$ we choose, the resulting density is conditional on that event being treated as known ---i.e., no longer random.



    As I have stated in another answer here, it is also worth noting that many theories of probability regard all probability to be conditional on implicit information. This idea is most famously associated with the axiomatic approach of the mathematician Alfréd Rényi (see e.g., Kaminski 1984). Rényi argued that every probability measure must be interpreted as being conditional on some underlying information, and that reference to marginal probabilities was merely a reference to probability where the underlying conditions are implicit, rather than explicit.





    $^dagger$ Technically, it's worth noting that if we conditioning on the value of a continuous random variable (an event with probability zero) then there is an extended definition of the conditional probability. Essentially this is just a function that satisfies the required integral statement for the marginal probability. In the present answer we will stick to discrete random variables to keep things simple.







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited Mar 4 at 1:58

























    answered Mar 3 at 6:45









    BenBen

    27.5k233126




    27.5k233126












    • $begingroup$
      The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
      $endgroup$
      – p.s.
      Mar 3 at 17:05










    • $begingroup$
      @p.s. Thanks - fixed.
      $endgroup$
      – Ben
      Mar 3 at 21:02


















    • $begingroup$
      The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
      $endgroup$
      – p.s.
      Mar 3 at 17:05










    • $begingroup$
      @p.s. Thanks - fixed.
      $endgroup$
      – Ben
      Mar 3 at 21:02
















    $begingroup$
    The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
    $endgroup$
    – p.s.
    Mar 3 at 17:05




    $begingroup$
    The expression for the conditional pdf should depend on x in some way, like $phi(x-y)$.
    $endgroup$
    – p.s.
    Mar 3 at 17:05












    $begingroup$
    @p.s. Thanks - fixed.
    $endgroup$
    – Ben
    Mar 3 at 21:02




    $begingroup$
    @p.s. Thanks - fixed.
    $endgroup$
    – Ben
    Mar 3 at 21:02













    6












    $begingroup$

    Conditioning on a random variable is much more subtle than conditioning on an event.



    Conditioning on an Event



    Recall that for an event $B$ with $P(B) > 0$ we define the conditional probability given $B$ by
    $$
    P(A mid B) = frac{P(A cap B)}{P(B)}
    $$

    for every event $A$. This defines a new probability measure $P( cdotmid B)$ on the underlying probability space, and if $X$ is a random variable which is either non-negative or $P$-integrable on $A$, then we have
    $$
    E[X mid B]
    = int X , dP( cdotmid B)
    = frac{1}{P(B)} int X mathbf{1}_B , dP.
    $$

    The intuitive interpretation is that $E[X mid B]$ is the "best guess" for what value $X$ takes, knowing that the event $B$ actually happens.
    This intuition is justified by the last integral above: we integrate $X$ with respect to $P$, but only on the event $B$ (and dividing by $P(B)$ is due to us concentrating all our attention on $B$ and hence re-weighting $B$ to have probability $1$).



    That's the easy case. To understand conditioning on a random variable, we need the more general idea of conditioning on information. A probability measure by itself gives us prior probabilities for all possible events. But probabilities that certain events happen change if we know that certain other events do or do not happen. That is, when we have information about whether certain events happen or not, we can update our probabilities for the remaining events.



    Conditioning on a Collection of Events



    Formally, suppose $mathcal{G}$ is a $sigma$-algebra of events. Assume that it is known whether each event in $mathcal{G}$ happens or not.
    We want to define the conditional probability $P( cdotmid mathcal{G})$ and the conditional expectation $E[ cdotmid mathcal{G}]$.
    The conditional probability $P(A mid mathcal{G})$ should reflect our updated probability of an event $A$ after knowing the information contained in $mathcal{G}$, and $E[X midmathcal{G}]$ should be our "best guess" for the value of a random variable $X$ using the information contained in $mathcal{G}$.



    (NB: Why should $mathcal{G}$ be a $sigma$-algebra and not a more general collection of events? Because if $mathcal{G}$ weren't a $sigma$ algebra but we know whether each event in $mathcal{G}$ happens or not, then we would know whether each event in the $sigma$-algebra generated by $mathcal{G}$ happens or not, so we might as well replace $mathcal{G}$ with $sigma(mathcal{G})$.)



    Conditional Probability



    Here's where things get interesting. $P(A midmathcal{G})$ is no longer just a number: it is a random variable!. We define $P(A midmathcal{G})$ to be any $mathcal{G}$-measurable random variable $X$ such that
    $$
    E[X mathbf{1}_B] = P(A cap B)
    $$

    for every event $B in mathcal{G}$.
    Moreover, if $X$ and $X^prime$ are two random variables satisfying this definition, then $X = X^prime$ almost surely.
    That is pretty abstract stuff, so hopefully an example can shed some light on the abstraction.



    Example.
    Let $(Omega, mathcal{F}, P)$ be a probability space, and let $B in mathcal{F}$ be an event with $0 < P(B) < 1$.
    Suppose $mathcal{G} = {emptyset, B, B^c, Omega}$.
    That is, $mathcal{G}$ is the $sigma$-algebra containing all the information about whether $B$ happens or not.
    Then for any event $A in mathcal{F}$ we have
    $$
    P(A mid mathcal{G})
    = P(A mid B) mathbf{1}_B + P(A mid B^c) mathbf{1}_{B^c}.
    $$

    That is, for an outcome $omega in Omega$, we have
    $$
    P(A mid mathcal{G})(omega) = P(A mid B)
    $$

    if $omega in B$ (i.e., if $B$ happens), and
    $$
    P(A mid mathcal{G})(omega) = P(A mid B^c)
    $$

    if $omega notin B$ (i.e., if $B$ doesn't happen).
    It is easy to check that this random variable actually satisfies the definition of the conditional probability $P(A mid mathcal{G})$ defined above.



    Conditional Expectation



    I mentioned already that conditional probabilities aren't unique, but they are almost surely unique.
    It turns out that if $X$ is a nonnegative or integrable random variable, $\mathcal{G}$ is a $\sigma$-algebra of events, and $Q$ is the distribution of $X$ (a Borel probability measure on $\mathbb{R}$), then it is possible to choose versions of the conditional probabilities $Q(B \mid \mathcal{G})$ for all Borel subsets $B$ of $\mathbb{R}$ such that $Q(\cdot \mid \mathcal{G})(\omega)$ is a probability measure for each outcome $\omega$. Given this possibility, we may define
    $$
    E[X \mid \mathcal{G}] = \int_{\mathbb{R}} x \, Q(dx \mid \mathcal{G}),
    $$

    which is again a random variable.
    It can be shown that this is the almost surely unique random variable $Y$ which is $\mathcal{G}$-measurable and satisfies
    $$
    E[Y \mathbf{1}_A] = E[X \mathbf{1}_A]
    $$

    for all $A \in \mathcal{G}$.
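    As a sanity check on that defining property, here is a toy example of my own (not from the answer): take a fair die, let $X(\omega) = \omega$ be the face value, and let $\mathcal{G}$ be generated by $B$ = "the roll is even". Then $E[X \mid \mathcal{G}]$ equals $4$ on $B$ and $3$ on $B^c$, and $E[E[X \mid \mathcal{G}] \mathbf{1}_A] = E[X \mathbf{1}_A]$ for every $A \in \mathcal{G}$.

    ```python
    from fractions import Fraction

    outcomes = [1, 2, 3, 4, 5, 6]              # fair die
    P = {w: Fraction(1, 6) for w in outcomes}
    B = {2, 4, 6}                              # "roll is even"
    Bc = set(outcomes) - B

    def E_restricted(f, event):
        # E[f(omega) 1_event]
        return sum(f(w) * P[w] for w in event)

    def E_X_given_G(w):
        # E[X | G] is constant on B and on B^c: the average face value there
        ev = B if w in B else Bc
        return E_restricted(lambda u: u, ev) / sum(P[u] for u in ev)

    print(E_X_given_G(2), E_X_given_G(1))      # 4 3

    # Defining property: E[E[X|G] 1_A] = E[X 1_A] for every A in G
    for A in [set(), B, Bc, set(outcomes)]:
        assert E_restricted(E_X_given_G, A) == E_restricted(lambda w: w, A)
    ```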



    Conditioning on a Random Variable



    Given the general definitions of conditional probability and conditional expectation above, we may easily define what it means to condition on a random variable $Y$: it means conditioning on the $\sigma$-algebra generated by $Y$:
    $$
    \sigma(Y)
    = \big\{ \{Y \in B\} : \text{$B$ is a Borel subset of $\mathbb{R}$} \big\}.
    $$

    I said "easy to define," but I am aware that that doesn't mean "easy to understand."
    But at least we can now say what an expression like $E[X \mid Y]$ means: it is a random variable that satisfies
    $$
    E[E[X \mid Y] \mathbf{1}_A] = E[X \mathbf{1}_A]
    $$

    for every event $A$ of the form $A = \{Y \in B\}$ for some Borel subset $B$ of $\mathbb{R}$.
    Wow, that's abstract!
    Fortunately, there are easy ways to work with $E[X \mid Y]$ if $Y$ is discrete or absolutely continuous.




    $Y$ Discrete



    Suppose $Y$ takes values in a countable set $S \subseteq \mathbb{R}$.
    Then it can be shown that
    $$
    P(A \mid Y)(\omega) = P(A \mid Y = Y(\omega))
    $$

    for each outcome $\omega$.
    The right-hand side above is shorthand for the more verbose
    $$
    P(A \mid \{Y = Y(\omega)\}),
    $$

    where $\{Y = Y(\omega)\}$ is the event
    $$
    \{Y = Y(\omega)\}
    = \{\omega^\prime : Y(\omega^\prime) = Y(\omega)\}.
    $$

    That is, if our outcome is $\omega$, and $Y(\omega) = k$, then
    $$
    P(A \mid Y)(\omega) = P(A \mid Y = k) = \frac{P(A \cap \{Y = k\})}{P(Y = k)}.
    $$

    Similarly, if $X$ is another random variable taking values in $S$, then we have
    $$
    E[X \mid Y](\omega) = E[X \mid Y = Y(\omega)] = \sum_{x \in S} x P(X = x \mid Y = Y(\omega)).
    $$
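    A quick illustration of the discrete case (the joint pmf below is made up for the example): compute $E[X \mid Y = y]$ for each $y$, and check the tower property $E[E[X \mid Y]] = E[X]$.

    ```python
    from fractions import Fraction

    # A made-up joint pmf p(x, y) for X in {0, 1, 2} and Y in {0, 1}
    pmf = {
        (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
        (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 8), (2, 1): Fraction(1, 8),
    }

    def p_Y(y):
        # marginal pmf of Y
        return sum(p for (_, yy), p in pmf.items() if yy == y)

    def E_X_given_Y_eq(y):
        # E[X | Y = y] = sum_x x * P(X = x | Y = y)
        return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_Y(y)

    print(E_X_given_Y_eq(0), E_X_given_Y_eq(1))   # 5/4 3/4

    # E[X | Y] is the random variable g(Y) with g(y) = E[X | Y = y].
    # Tower property check: E[E[X | Y]] = E[X]
    EX = sum(x * p for (x, _), p in pmf.items())
    assert sum(E_X_given_Y_eq(y) * p_Y(y) for y in (0, 1)) == EX
    ```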




    $Y$ Absolutely Continuous



    Suppose now that $Y$ is absolutely continuous with density $f_Y$.
    Let $X$ be another absolutely continuous random variable, with density $f_X$.
    Let $f_{X, Y}$ be the joint density of $X$ and $Y$.
    Then we define the conditional density of $X$ given $Y = y$ by
    $$
    f_{X \mid Y}(x \mid y) = \frac{f_{X, Y}(x, y)}{f_Y(y)}
    = \frac{f_{X, Y}(x, y)}{\int_{\mathbb{R}} f_{X, Y}(x^\prime, y) \, dx^\prime}.
    $$

    Now we may define a function $g : \mathbb{R} \to \mathbb{R}$ by
    $$
    g(y)
    = E[X \mid Y = y]
    = \int_{\mathbb{R}} x f_{X \mid Y}(x \mid y) \, dx.
    $$

    In particular, $g(y) = E[X \mid Y = y]$ is a real number for each $y$.
    Using this $g$, we can show that
    $$
    E[X \mid Y] = g(Y),
    $$

    meaning that
    $$
    E[X \mid Y](\omega) = g(Y(\omega)) = E[X \mid Y = Y(\omega)]
    $$

    for each outcome $\omega$.
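    In the absolutely continuous case, $g(y)$ can be computed by numerical integration. As an illustration (my own example, not from the answer), take $(X, Y)$ standard bivariate normal with correlation $\rho$, for which the known closed form is $E[X \mid Y = y] = \rho y$:

    ```python
    import math

    RHO = 0.5  # correlation, chosen for this illustration

    def f_XY(x, y):
        # standard bivariate normal density with correlation RHO
        z = (x * x - 2 * RHO * x * y + y * y) / (1 - RHO * RHO)
        return math.exp(-z / 2) / (2 * math.pi * math.sqrt(1 - RHO * RHO))

    def integral(f, a, b, n=4000):
        # simple trapezoid rule; fine for a smooth, rapidly decaying integrand
        h = (b - a) / n
        return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

    def g(y):
        # g(y) = E[X | Y = y] = (int x f_XY(x, y) dx) / (int f_XY(x', y) dx')
        num = integral(lambda x: x * f_XY(x, y), -12.0, 12.0)
        den = integral(lambda x: f_XY(x, y), -12.0, 12.0)
        return num / den

    print(g(2.0))   # close to RHO * 2.0 = 1.0
    ```

    The denominator is the marginal density $f_Y(y)$ obtained by integrating the joint density over $x$, matching the conditional-density formula above.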



    This is just scratching the surface of the theory of conditioning.
    For a great reference, see Chapters 21 and 23 of A Modern Approach to Probability Theory by Fristedt and Gray.




    Some Takeaways




    1. Conditioning on a random variable is different from conditioning on an event.

    2. Expressions like $P(A \mid Y)$ and $E[X \mid Y]$ are random variables.

    3. Expressions like $P(A \mid Y = y)$ and $E[X \mid Y = y]$ are real numbers.







        edited Mar 3 at 7:09

        answered Mar 3 at 7:03

        Artem Mavrin























            It means that the value of the random variable $Y$ is known. For example, suppose $E(X|Y)=10+Y^2$. Then if $Y=2$, $E(X|Y=2)=14$.
            • Thanks for your explanation. I found that $E(X|Y=y)$ is also possible, so here $y$ is still unspecified and $E(X|Y=y)=10+y^2$; in this case, does the conditioning only remove the randomness of $Y$?
              – Yneedtobeserious
              Mar 3 at 4:07

            • $E(X|Y=y)$ is just a different notation. If you are conditioning on $Y$, that means $Y$ is known. $E(X|Y=y)=10+y^2$ is a function that gives the expected value of $X$ conditional on $Y=y$ for an arbitrary value of $Y$.
              – user239680
              Mar 4 at 5:07

            • My understanding is that in $E(X|Y=y)$, $Y$ is still not a known value but a non-random unknown variable; the condition $Y=y$ only removes the randomness of the random variable $Y$. In summary, the conditioning variable is either a specific value (like $Y=5$) or an unspecified non-random variable (like $Y=y$), but in either case $Y$ is no longer treated as random in the expectation of $X$. Am I right to think this way?
              – Yneedtobeserious
              Mar 4 at 7:38

            • Yes, that's fair. Our different explanations are just semantics.
              – user239680
              Mar 4 at 14:05
















            answered Mar 3 at 3:52









            user239680

            211
            • $begingroup$
              Thanks for your explanation. I found that $E(X|Y=y)$ is also possible, where $y$ is left unspecified and $E(X|Y=y)=10+y^2$. In this case, does the condition only remove the randomness of $Y$?
              $endgroup$
              – Yneedtobeserious
              Mar 3 at 4:07










            • $begingroup$
              $E(X|Y=y)$ is just a different notation. If you are conditioning on $Y$, that means $Y$ is known. $E(X|Y=y)=10+y^2$ is a function that gives expected value of $X$ conditional on $Y=y$ for an arbitrary value of $Y$.
              $endgroup$
              – user239680
              Mar 4 at 5:07










            • $begingroup$
              My understanding is that in $E(X|Y=y)$, $Y$ is still not a known value but a non-random unknown; the condition $Y=y$ only removes the randomness of the random variable $Y$. In summary, the conditioning variable is either a specific value (like $Y=5$) or an unspecified non-random value (like $Y=y$), but in either case $Y$ is no longer treated as random when taking the expectation of $X$. Am I right to think this way?
              $endgroup$
              – Yneedtobeserious
              Mar 4 at 7:38










            • $begingroup$
              Yes, that's fair. Our different explanations are just semantics.
              $endgroup$
              – user239680
              Mar 4 at 14:05