What are the foundations of probability and how are they dependent upon a $\sigma$-field?


























I am reading Christopher D. Manning's Foundations of Statistical Natural Language Processing, which gives an introduction to Probability Theory where it talks about $\sigma$-fields. It says,




"The foundations of probability theory depend on the set of events $\mathscr{F}$ forming a $\sigma$-field."




I understand the definition of a $\sigma$-field, but what are these foundations of probability theory, and how are these foundations dependent upon a $\sigma$-field?






























probability-theory measure-theory






asked Dec 15 at 23:03 by eddard.stark; edited Dec 18 at 9:25 by Mike Pierce








  • But this is not a textbook on probability theory. This is a book on Natural Language Processing which includes some preliminaries of probability theory -- this is not the same thing. If you want to understand the foundations of probability, consult a textbook on that, not on something that merely uses probabilities.
    – Clement C., Dec 16 at 0:48

  • @eddard Some people are concerned with the scope of your question being too broad ("foundations of probability theory" is a topic of considerable size). Would you mind editing the post to be only about the second bullet point?
    – Lord_Farin, Dec 16 at 22:00

  • @Lord_Farin someone tried to focus it and improve the title. Then someone didn't like it and rolled it back. Not sure why, since that would make this a good question.
    – Don Hatch, Dec 18 at 8:09














2 Answers




















Probability when there are only finitely many outcomes is a matter of counting. There are $36$ possible results from a roll of two dice and $6$ of them sum to $7$ so the probability of a sum of $7$ is $6/36$. You've measured the size of the set of outcomes that you are interested in.
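That counting argument can be checked with a short Python sketch (my own illustration, not from the answer):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose faces sum to 7
favorable = [o for o in outcomes if sum(o) == 7]

print(len(outcomes), len(favorable))   # 36 6
print(len(favorable) / len(outcomes))  # 0.1666... = 6/36
```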



It's harder to make rigorous sense of things when the set of possible results is infinite. What does it mean to choose two numbers at random in the interval $[1,6]$ and ask for their sum? Any particular pair, like $(1.3, \pi)$, will have probability $0$.



You deal with this problem by replacing counting with integration. Unfortunately, the integration you learn in first year calculus ("Riemann integration") isn't powerful enough to derive all you need about probability. (It is enough to show that the probability that your two rolls total exactly $7$ is $0$, and to find the probability that the total is at least $7$.)
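A quick simulation (my own sketch, not the answer author's) makes the same point for the continuous version: with two draws uniform on $[1,6]$, a sum of exactly $7$ is a probability-zero event, while a sum of at least $7$ has probability $1/2$ because the sum is symmetric about $7$:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 100_000
exactly_7 = 0
at_least_7 = 0
for _ in range(n):
    s = random.uniform(1, 6) + random.uniform(1, 6)
    if s == 7:    # a measure-zero event: essentially never happens
        exactly_7 += 1
    if s >= 7:    # symmetric about 7, so this occurs about half the time
        at_least_7 += 1

print(exactly_7 / n)    # 0.0
print(at_least_7 / n)   # close to 0.5
```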



For the definitions and theorems of rigorous probability theory (those are the "foundations" you ask about) you need "Lebesgue integration". That requires first carefully specifying the sets whose probabilities you are going to ask for - and not every set is allowed, for technical reasons without which you can't make the mathematics work the way you want. It turns out that the collection of sets whose probability you are going to ask about carries the name "$\sigma$-field" or "sigma algebra". (It's not a field in the arithmetic sense.)
The essential point is that it's closed under countable set operations. That's what the "$\sigma$" says. Your text may not provide a formal definition - you may not need it for NLP applications.





























  • I am reading it a couple of times. I think it is going to take me some time to understand all of it. Thanks for the answer anyways.
    – eddard.stark, Dec 16 at 1:43

  • I find the following answer related and useful: Why do we need sigma-algebras to define probability spaces
    – Jesper Hybel, Dec 24 at 15:18

  • "Probability when there are only finitely many outcomes is a matter of counting." This is a rather dangerous simplification since it assumes a uniform distribution on those outcomes. Cue the old joke, "I have a 50% chance to win the lottery, since I'll either win or I'll lose."
    – user7530, yesterday

  • @user7530 Fair enough. I suppose I could have said "simple algebra" or "weighted averages" instead of "counting". But as an introduction to an answer to the OP's question I think what I wrote is OK.
    – Ethan Bolker, yesterday

































To add to Ethan Bolker's answer and flesh it out: probability functions are defined on sets representing events, i.e. sets of outcomes, and we ask for the probability that whatever we are querying or observing falls into a given set - e.g. the probability that the temperature at noon tomorrow will be in the range $[25, 30]$.



Every probability function $P$, which assigns a probability to each event set $E$ (itself a subset of the total set of possible outcomes, or sample space, $S$), is required to satisfy the following rules, called the Kolmogorov axioms. They capture the most basic ways we intuitively expect probabilities to behave:




  1. Rule 1: There are no negative probabilities. That is, for every event $E$ we have $P(E) \ge 0$. Since probabilities are meant to formalize the idea of "how many chances in..." there are for something to happen, it makes no sense to talk of a negative number of chances, for the same reason that it makes no sense to talk of a negative number of apples. What would it mean to have $-3$ occurrences of something, or $-6$ apples held in my hand right now?

  2. Rule 2: The probability of the entire sample space is $1$, i.e. $P(S) = 1$. This should be intuitive, because at least some outcome must occur, and the set $S$ is the set of all possible outcomes, so whatever outcome occurs has to be within it. Thus the event $S$ will always occur no matter what.

  3. Rule 3: Probabilities of mutually exclusive events add. If we have an up-to-countable sequence of mutually exclusive events $E_1, E_2, E_3, \cdots$, i.e. $E_i \cap E_j = \emptyset$ for all pairs with $i \ne j$, then we should have

$$P(E_1 \cup E_2 \cup E_3 \cup \cdots) = \sum_{i=1}^{\infty} P(E_i)$$
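As a sketch (my own illustration, assuming a fair six-sided die), these three rules can be checked mechanically for a finite sample space, where the full power set can serve as the collection of events:

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))  # sample space: one roll of a fair die

def P(E):
    """Uniform probability: each of the 6 outcomes is equally likely."""
    return Fraction(len(E), len(S))

def all_events(S):
    """Every subset of S -- on a finite set the power set is a sigma-field."""
    return [frozenset(c)
            for r in range(len(S) + 1)
            for c in combinations(S, r)]

# Rule 1: no negative probabilities
assert all(P(E) >= 0 for E in all_events(S))

# Rule 2: the whole sample space has probability 1
assert P(S) == 1

# Rule 3 (finite case): mutually exclusive events add
E1, E2 = frozenset({1, 2}), frozenset({5})
assert E1 & E2 == frozenset()
assert P(E1 | E2) == P(E1) + P(E2)

print("all three Kolmogorov rules hold")
```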



Now as mentioned, we may not be able to assign every event a probability. For a discrete sample space, i.e. where $S$ is a finite or at most countably infinite set, this may be doable. But for continuous sample spaces (e.g. $\mathbb{R}$), there are subtleties that make it difficult to define a useful probability function on every set using convenient methods such as integration, and thus we must restrict the domain of $P$ to not all subsets of $S$, but only some selected collection, which we call the $\sigma$-field, usually denoted $\Sigma$. That is, $\mathrm{dom}(P) = \Sigma \subseteq 2^S$, and we are thus not allowed to consider events $E \notin \Sigma$. The definition of a $\sigma$-field is just whatever is required to ensure that all the sets involved in the above definition make sense. This basically means we must have:




  1. Because of rule 2, in order to have $P(S) = 1$ we need $S$ to be in the domain of $P$ in the first place, so we must have $S \in \Sigma$.

  2. While this second stipulation is not, strictly speaking, required simply to make the above definition valid, we typically require that the complement $\bar{E} = S \setminus E$ of any event also be in $\Sigma$. This is because very often we are interested in the probability of something NOT happening (e.g. the probability that a given number of people do NOT get better with some medical treatment we are testing), and we want that question to make sense, so the corresponding event must be an available input to our probability function $P$.

  3. Finally, so that rule 3 can make sense, given any countable sequence of members $E_1, E_2, E_3, \cdots \in \Sigma$ we must have $(E_1 \cup E_2 \cup E_3 \cup \cdots) \in \Sigma$.


And that's about it.
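As a small illustration (my own, and only for a finite set, where countable unions reduce to finite ones), the three conditions can be verified directly for a candidate collection of subsets:

```python
S = frozenset({1, 2, 3, 4})

# A candidate sigma-field on S: the empty set, two complementary
# "halves", and S itself. (The full power set would also work.)
Sigma = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), S}

# 1. S itself is in Sigma (so that P(S) = 1 makes sense)
assert S in Sigma

# 2. Closed under complements
assert all(S - E in Sigma for E in Sigma)

# 3. Closed under unions (finite here; countable in general)
assert all(E1 | E2 in Sigma for E1 in Sigma for E2 in Sigma)

print("Sigma is a sigma-field on S")
```

Note that the events {1, 2} and {3, 4} here could stand for coarse questions like "was the roll low?" -- a sigma-field only needs to contain the events you actually intend to ask about, and their complements and unions.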



























(Ethan Bolker's answer above: answered Dec 16 at 1:00; edited Dec 16 at 22:17 by Lord_Farin)












    • I am reading it a couple of times. I think it is going to take me some time to understand all of it. Thanks for the answer anyways.
      – eddard.stark
      Dec 16 at 1:43






    • 1




      I find the following answer related and useful Why do we need sigma-algebras to define probability spaces
      – Jesper Hybel
      Dec 24 at 15:18










    • "Probability when there are only finitely many outcomes is a matter of counting." This is a rather dangerous simplification since it assumes a uniform distribution on those outcomes. Cue the old joke, "I have a 50% chance to win the lottery, since I'll either win or I'll lose."
      – user7530
      yesterday








    • 1




      @user7530 Fair enough. I suppose I could have said "simple algebra" or "weighted averages" instead of "counting". But as an introduction to an answer to the OP's question I think what I wrote is OK.
      – Ethan Bolker
      yesterday


















    • I am reading it a couple of times. I think it is going to take me some time to understand all of it. Thanks for the answer anyways.
      – eddard.stark
      Dec 16 at 1:43






    • 1




      I find the following answer related and useful Why do we need sigma-algebras to define probability spaces
      – Jesper Hybel
      Dec 24 at 15:18










    • "Probability when there are only finitely many outcomes is a matter of counting." This is a rather dangerous simplification since it assumes a uniform distribution on those outcomes. Cue the old joke, "I have a 50% chance to win the lottery, since I'll either win or I'll lose."
      – user7530
      yesterday








    • 1




      @user7530 Fair enough. I suppose I could have said "simple algebra" or "weighted averages" instead of "counting". But as an introduction to an answer to the OP's question I think what I wrote is OK.
      – Ethan Bolker
      yesterday
















    I am reading it a couple of times. I think it is going to take me some time to understand all of it. Thanks for the answer anyways.
    – eddard.stark
    Dec 16 at 1:43




    I am reading it a couple of times. I think it is going to take me some time to understand all of it. Thanks for the answer anyways.
    – eddard.stark
    Dec 16 at 1:43




    1




    1




    I find the following answer related and useful Why do we need sigma-algebras to define probability spaces
    – Jesper Hybel
    Dec 24 at 15:18




    I find the following answer related and useful Why do we need sigma-algebras to define probability spaces
    – Jesper Hybel
    Dec 24 at 15:18












    "Probability when there are only finitely many outcomes is a matter of counting." This is a rather dangerous simplification since it assumes a uniform distribution on those outcomes. Cue the old joke, "I have a 50% chance to win the lottery, since I'll either win or I'll lose."
    – user7530
    yesterday






    "Probability when there are only finitely many outcomes is a matter of counting." This is a rather dangerous simplification since it assumes a uniform distribution on those outcomes. Cue the old joke, "I have a 50% chance to win the lottery, since I'll either win or I'll lose."
    – user7530
    yesterday






    1




    1




    @user7530 Fair enough. I suppose I could have said "simple algebra" or "weighted averages" instead of "counting". But as an introduction to an answer to the OP's question I think what I wrote is OK.
    – Ethan Bolker
    yesterday




    @user7530 Fair enough. I suppose I could have said "simple algebra" or "weighted averages" instead of "counting". But as an introduction to an answer to the OP's question I think what I wrote is OK.
    – Ethan Bolker
    yesterday











    0














    To add more to the answer by Ethan Bolker and flesh this out, probability functions are defined on sets, representing events, i.e. some set of outcomes of which we're interested in the probability of whatever we are querying or observing to happen as falling into or not, e.g. the probability that the temperature at noon tomorrow will be in the range $[25, 30]$.



    Every probability function, which assigns to each event set $E$, itself a subset of the total set of possible outcomes, or sample space, $S$, is required satisfy the following rules, called the Kolmogorov axioms. The reason for this is they capture the most basic rules of how we expect probabilities to behave intuitively:




    1. Rule 1: There are no negative probabilities. That is, for every event $E$ we have $P(E) ge 0$. Since probabilities are meant to formalize the idea of "how many chances in..." is there for something to happen, it makes no sense to talk of a negative number of chances for the same reason that it makes no sense to talk of a negative number of apples. What does it mean to have -3 occurrences of something, or -6 apples held in my hand right now at this very moment in time?

    2. Rule 2: The probability of the entire sample space is 1. i.e. $P(S) = 1$. This should be intuitive, because at least some outcome must occur, and the set $S$ is the set of all possible outcomes, so whatever outcome occurs has to be within it. Thus the event $S$ will always occur no matter what.

    3. Rule 3: Probabilities of mutually exclusive events add. If we have an up-to-countable sequence of mutually exclusive events $E_1, E_2, E_3, cdots$, i.e. that $E_i cap E_j = emptyset$ for all possible pairs with $i ne j$, then we should have


    $$P(E_1 cup E_2 cup E_3 ...) = sum_{i=1}^{infty} P(E_i)$$



    Now as mentioned, we may not be able to assign every event a probability. For the case of a discrete sample space, i.e. where $S$ is a finite or at most countably infinite set, this may be doable. But for continuous sample spaces (e.g. $mathbb{R}$), there are subtleties that make it difficult to define a useful probability function in most cases for most sets using methods that are convenient to use such as integration, and thus we must restrict the domain of $P$ to not all subsets of $S$, but only some selected amount, which we call the $sigma$-field, usually denoted $Sigma$. That is, $mathrm{dom}(P) = Sigma subseteq 2^S$, and we are not thus allowed to consider events $E notin Sigma$. The definition of a $sigma$-field is just whatever is required to ensure that with regard to the above definition, all the sets involved in it make sense. Which basically means we must have




    1. Because of Rule 2, in order to have $P(S) = 1$ we need $S$ to be in the domain of $P$ in the first place, so we must have $S \in \Sigma$.

    2. While this second stipulation is not strictly required to make the above definition valid, we typically require that the complement $\bar{E} = S \setminus E$ of any event $E \in \Sigma$ also be in $\Sigma$. Very often we are interested in the probability of something NOT happening (e.g. the probability that a given number of people do NOT get better under a medical treatment we are testing), and for that question to make sense the corresponding event must be an admissible input to our probability function $P$.

    3. Finally, so that Rule 3 makes sense, given any countable sequence of members $E_1, E_2, E_3, \cdots \in \Sigma$ we must have $(E_1 \cup E_2 \cup E_3 \cup \cdots) \in \Sigma$.
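    On a finite sample space "countable union" reduces to finite union, so the three conditions above can be checked mechanically. A minimal sketch (my own illustration, with a toy sample space and collections I made up):

    ```python
    from itertools import combinations

    def is_sigma_field(S, F):
        """Check the sigma-field conditions for a finite sample space S and
        a collection F of subsets of S (countable unions reduce to finite
        unions in the finite case)."""
        S = frozenset(S)
        F = {frozenset(E) for E in F}
        # 1. The whole sample space must be an event.
        if S not in F:
            return False
        # 2. Closure under complements.
        if any(S - E not in F for E in F):
            return False
        # 3. Closure under pairwise unions suffices here, since any finite
        #    union can be built up from pairwise ones.
        return all(A | B in F for A, B in combinations(F, 2))

    S = {1, 2, 3, 4}
    A = frozenset({1, 2})
    F_good = {frozenset(), A, frozenset(S) - A, frozenset(S)}
    F_bad = {frozenset(), A, frozenset(S)}  # missing the complement of A

    print(is_sigma_field(S, F_good))  # True
    print(is_sigma_field(S, F_bad))   # False
    ```

    `F_good` is in fact the smallest $\sigma$-field containing the single event $A$, which is why every other member is forced by the closure rules.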


    And that's about it.
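    As a closing aside (my addition, not in the original answer): once $\Sigma$ is closed under complements, the familiar rule for the probability of an event not happening already follows from the axioms. Since $E$ and $\bar{E}$ are disjoint with $E \cup \bar{E} = S$, Rules 2 and 3 give

    $$1 = P(S) = P(E \cup \bar{E}) = P(E) + P(\bar{E}), \qquad \text{so} \qquad P(\bar{E}) = 1 - P(E).$$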






    answered 12 hours ago by The_Sympathizer





























