Probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$ with at least...












0












$begingroup$


For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.



The more general definition of the problem is this:



What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension  $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.



Is there a name to this problem?  I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.



Is the problem tractable analytically?  If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?



Your help is highly appreciated! Thanks a lot in advance!










share|cite|improve this question











$endgroup$

















    0












    $begingroup$


    For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.



    The more general definition of the problem is this:



    What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension  $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.



    Is there a name to this problem?  I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.



    Is the problem tractable analytically?  If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?



    Your help is highly appreciated! Thanks a lot in advance!










    share|cite|improve this question











    $endgroup$















      0












      0








      0





      $begingroup$


      For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.



      The more general definition of the problem is this:



      What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension  $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.



      Is there a name to this problem?  I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.



      Is the problem tractable analytically?  If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?



      Your help is highly appreciated! Thanks a lot in advance!










      share|cite|improve this question











      $endgroup$




      For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.



      The more general definition of the problem is this:



      What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension  $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.



      Is there a name to this problem?  I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.



      Is the problem tractable analytically?  If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?



      Your help is highly appreciated! Thanks a lot in advance!







      combinatorics random-variables






      share|cite|improve this question















      share|cite|improve this question













      share|cite|improve this question




      share|cite|improve this question








      edited Dec 8 '18 at 7:18









      Brahadeesh

      6,22742361




      6,22742361










      asked Dec 7 '18 at 23:56









      derpetermannderpetermann

      11




      11






















          1 Answer
          1






          active

          oldest

          votes


















          0












          $begingroup$

          $newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
          $newcommand{ksubsets}{binom{[N]}{k}}$



          I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...



          Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:



          $$ X_{n,d} :=
          begin{cases}
          1 & text{Feature $d$ present in individual $n$} \
          0 & text{o/w}
          end{cases}
          $$



          We know the total number of occurrences of each feature, given by



          $$C_d := sum_{n in [N]} X_{n,d}$$



          Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?



          $$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$



          Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):



          $$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
          $$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$



          but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.






          share|cite|improve this answer









          $endgroup$













            Your Answer





            StackExchange.ifUsing("editor", function () {
            return StackExchange.using("mathjaxEditing", function () {
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
            });
            });
            }, "mathjax-editing");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "69"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3030523%2fprobability-of-finding-a-k-subset-in-a-d-dimensional-random-binary-array-of%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0












            $begingroup$

            $newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
            $newcommand{ksubsets}{binom{[N]}{k}}$



            I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...



            Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:



            $$ X_{n,d} :=
            begin{cases}
            1 & text{Feature $d$ present in individual $n$} \
            0 & text{o/w}
            end{cases}
            $$



            We know the total number of occurrences of each feature, given by



            $$C_d := sum_{n in [N]} X_{n,d}$$



            Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?



            $$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$



            Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):



            $$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
            $$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$



            but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.






            share|cite|improve this answer









            $endgroup$


















              0












              $begingroup$

              $newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
              $newcommand{ksubsets}{binom{[N]}{k}}$



              I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...



              Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:



              $$ X_{n,d} :=
              begin{cases}
              1 & text{Feature $d$ present in individual $n$} \
              0 & text{o/w}
              end{cases}
              $$



              We know the total number of occurrences of each feature, given by



              $$C_d := sum_{n in [N]} X_{n,d}$$



              Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?



              $$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$



              Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):



              $$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
              $$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$



              but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.






              share|cite|improve this answer









              $endgroup$
















                0












                0








                0





                $begingroup$

                $newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
                $newcommand{ksubsets}{binom{[N]}{k}}$



                I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...



                Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:



                $$ X_{n,d} :=
                begin{cases}
                1 & text{Feature $d$ present in individual $n$} \
                0 & text{o/w}
                end{cases}
                $$



                We know the total number of occurrences of each feature, given by



                $$C_d := sum_{n in [N]} X_{n,d}$$



                Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?



                $$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$



                Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):



                $$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
                $$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$



                but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.






                share|cite|improve this answer









                $endgroup$



                $newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
                $newcommand{ksubsets}{binom{[N]}{k}}$



                I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...



                Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:



                $$ X_{n,d} :=
                begin{cases}
                1 & text{Feature $d$ present in individual $n$} \
                0 & text{o/w}
                end{cases}
                $$



                We know the total number of occurrences of each feature, given by



                $$C_d := sum_{n in [N]} X_{n,d}$$



                Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?



                $$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$



                Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):



                $$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
                $$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$



                but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.







                share|cite|improve this answer












                share|cite|improve this answer



                share|cite|improve this answer










                answered Dec 20 '18 at 16:40









                NicoNico

                1




                1






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Mathematics Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3030523%2fprobability-of-finding-a-k-subset-in-a-d-dimensional-random-binary-array-of%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Probability when a professor distributes a quiz and homework assignment to a class of n students.

                    Aardman Animations

                    Are they similar matrix