Why is there no permutation in Regexes? (Even if regular languages seem to be able to do this)











up vote
10
down vote

favorite
3












The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question




















  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    Nov 17 at 8:12






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    Nov 17 at 8:18












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    Nov 18 at 6:13








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    Nov 18 at 19:04















up vote
10
down vote

favorite
3












The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question




















  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    Nov 17 at 8:12






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    Nov 17 at 8:18












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    Nov 18 at 6:13








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    Nov 18 at 19:04













up vote
10
down vote

favorite
3









up vote
10
down vote

favorite
3






3





The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question















The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.







formal-languages regular-languages regular-expressions






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Nov 17 at 13:08









Glorfindel

1741110




1741110










asked Nov 17 at 6:46









Asqiir

1568




1568








  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    Nov 17 at 8:12






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    Nov 17 at 8:18












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    Nov 18 at 6:13








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    Nov 18 at 19:04














  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    Nov 17 at 8:12






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    Nov 17 at 8:18












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    Nov 18 at 6:13








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    Nov 18 at 19:04








9




9




You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12




You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12




6




6




I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18






I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18














The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
Nov 18 at 6:13






The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
Nov 18 at 6:13






5




5




@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04




@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04










3 Answers
3






active

oldest

votes

















up vote
34
down vote



accepted










The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






share|cite|improve this answer























  • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    Nov 17 at 13:11








  • 7




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    Nov 17 at 14:02






  • 3




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    Nov 17 at 14:08






  • 1




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    Nov 17 at 14:12






  • 1




    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    Nov 18 at 19:36




















up vote
13
down vote














So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?




Your "proof" only looked at permutations of single words, which are finite languages.



Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






share|cite|improve this answer




























    up vote
    7
    down vote













    Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



    So in some sense, there is no succinct way to specify all permutations of a word.





    Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



    We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





    • $x_iy_i in L$.

    • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


    Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



    Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






    share|cite|improve this answer























    • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
      – Asqiir
      Nov 17 at 8:15






    • 1




      Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
      – Yuval Filmus
      Nov 17 at 8:17






    • 1




      What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
      – Asqiir
      Nov 17 at 8:23






    • 1




      Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
      – Yuval Filmus
      Nov 17 at 8:27






    • 1




      For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
      – Asqiir
      Nov 17 at 14:13











    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "419"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f100206%2fwhy-is-there-no-permutation-in-regexes-even-if-regular-languages-seem-to-be-ab%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    34
    down vote



    accepted










    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer























    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      Nov 17 at 13:11








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      Nov 17 at 14:02






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      Nov 17 at 14:08






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      Nov 17 at 14:12






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      Nov 18 at 19:36

















    up vote
    34
    down vote



    accepted










    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer























    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      Nov 17 at 13:11








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      Nov 17 at 14:02






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      Nov 17 at 14:08






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      Nov 17 at 14:12






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      Nov 18 at 19:36















    up vote
    34
    down vote



    accepted







    up vote
    34
    down vote



    accepted






    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer














    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited Nov 19 at 10:03









    Laurel

    1034




    1034










    answered Nov 17 at 10:10









    David Richerby

    64.7k1597186




    64.7k1597186












    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      Nov 17 at 13:11








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      Nov 17 at 14:02






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      Nov 17 at 14:08






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      Nov 17 at 14:12






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      Nov 18 at 19:36




















    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      Nov 17 at 13:11








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      Nov 17 at 14:02






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      Nov 17 at 14:08






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      Nov 17 at 14:12






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      Nov 18 at 19:36


















    But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    Nov 17 at 13:11






    But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    Nov 17 at 13:11






    7




    7




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    Nov 17 at 14:02




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    Nov 17 at 14:02




    3




    3




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    Nov 17 at 14:08




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    Nov 17 at 14:08




    1




    1




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    Nov 17 at 14:12




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    Nov 17 at 14:12




    1




    1




    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    Nov 18 at 19:36






    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    Nov 18 at 19:36












    up vote
    13
    down vote














    So my question is:




    • (Why) Is my proof wrong?

    • If it is right: Why is there no easy way to express permutations?




    Your "proof" only looked at permutations of single words, which are finite languages.



    Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



    As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



    The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






    share|cite|improve this answer

























      up vote
      13
      down vote














      So my question is:




      • (Why) Is my proof wrong?

      • If it is right: Why is there no easy way to express permutations?




      Your "proof" only looked at permutations of single words, which are finite languages.



      Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



      As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



      The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






      share|cite|improve this answer























        up vote
        13
        down vote










        up vote
        13
        down vote










        So my question is:




        • (Why) Is my proof wrong?

        • If it is right: Why is there no easy way to express permutations?




        Your "proof" only looked at permutations of single words, which are finite languages.



        Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



        As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



        The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






        share|cite|improve this answer













        So my question is:




        • (Why) Is my proof wrong?

        • If it is right: Why is there no easy way to express permutations?




        Your "proof" only looked at permutations of single words, which are finite languages.



        Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



        As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



        The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered Nov 18 at 0:23









        Paŭlo Ebermann

        47329




        47329






















            up vote
            7
            down vote













            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer























            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              Nov 17 at 8:15






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              Nov 17 at 8:17






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              Nov 17 at 8:23






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              Nov 17 at 8:27






            • 1




              For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
              – Asqiir
              Nov 17 at 14:13















            up vote
            7
            down vote













            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer























            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              Nov 17 at 8:15






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              Nov 17 at 8:17






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              Nov 17 at 8:23






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              Nov 17 at 8:27






            • 1




              For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
              – Asqiir
              Nov 17 at 14:13













            up vote
            7
            down vote










            up vote
            7
            down vote









            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer














            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited Nov 17 at 8:36

























            answered Nov 17 at 8:06









            Yuval Filmus

            187k12176338




            187k12176338












            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              Nov 17 at 8:15






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              Nov 17 at 8:17






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              Nov 17 at 8:23






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              Nov 17 at 8:27






            • 1




              For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
              – Asqiir
              Nov 17 at 14:13


















            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              Nov 17 at 8:15






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              Nov 17 at 8:17






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              Nov 17 at 8:23






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              Nov 17 at 8:27






            • 1




              For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
              – Asqiir
              Nov 17 at 14:13
















            Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
            – Asqiir
            Nov 17 at 8:15




            Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
            – Asqiir
            Nov 17 at 8:15




            1




            1




            Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
            – Yuval Filmus
            Nov 17 at 8:17




            Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
            – Yuval Filmus
            Nov 17 at 8:17




            1




            1




            What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
            – Asqiir
            Nov 17 at 8:23




            What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
            – Asqiir
            Nov 17 at 8:23




            1




            1




            Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
            – Yuval Filmus
            Nov 17 at 8:27




            Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
            – Yuval Filmus
            Nov 17 at 8:27




            1




            1




            For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
            – Asqiir
            Nov 17 at 14:13




            For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
            – Asqiir
            Nov 17 at 14:13


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Computer Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f100206%2fwhy-is-there-no-permutation-in-regexes-even-if-regular-languages-seem-to-be-ab%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How do I know what Microsoft account the skydrive app is syncing to?

            When does type information flow backwards in C++?

            Grease: Live!