Separate title string with no spaces into words
I want to find and separate words in a title that has no spaces.
Before:
ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)"Test"'Test'[Test]
After:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
I'm looking for a regular expression that can do the following.
My idea is to identify each word by its leading uppercase letter,
while preserving all-uppercase words so that they are not spaced out into A L L U P P E R C A S E.
Additional rules:
- Insert a space where a letter touches a number:
Hello2019World becomes Hello 2019 World
- Do not space out initials that contain periods, hyphens, or underscores:
T.E.S.T.
- Do not add spaces inside brackets, parentheses, or quotes:
[Test] (Test) "Test" 'Test'
- Preserve hyphens:
Hello-World
C#
https://rextester.com/GAZJS38767
using System;
using System.Linq;
using System.Text.RegularExpressions;
// Title without spaces (verbatim string, so inner double quotes are doubled)
string title = @"ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]""Test""'Test'";
// Detect where to space words (Regex.Split returns an array)
string[] split = Regex.Split(title, @"(?<!^)(?=(?<![.\-'""([{])[A-Z][\d+]?)");
// Trim each word of extra spaces before joining
split = (from e in split
select e.Trim()).ToArray();
// Join into new title
string newtitle = string.Join(" ", split);
// Display
Console.WriteLine(newtitle);
Regular expression
I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.
https://regex101.com/r/9IIYGX/1
(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?)
(?<!^) // Negative lookbehind: not at the start of the string
(?= // Positive lookahead
(?<![.\-'"([{]) // Skip if preceded by punctuation
(?<![A-Z]) // Skip if preceded by an uppercase letter
[A-Z] // Space before each uppercase letter
[\d+?]? // Optional character class: a digit, + or ? (intended to space numbers)
)
Solution
Thanks for all your combined effort in the answers. Here's a Regex example. I'm applying this to file names and have excluded the special characters /:*?"<>|.
https://rextester.com/FYEVE73725
https://regex101.com/r/xi8L4z/1
c# regex
10
I am up-voting because it's the first post I have seen in hours that has an appropriate amount of information, research and effort
– Michael Randall
Mar 11 at 6:02
2
@MichaelRandall And sadly, that is a better track record than what I see coming on the site during most weekend days.
– Tim Biegeleisen
Mar 11 at 6:04
edited Mar 12 at 1:57
Matt McManis
asked Mar 11 at 5:55
Matt McManis
4 Answers
The first part is similar to @revo's answer: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}
In addition, I add the following regex to put a space between a number and a letter: (?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])
To handle input such as OTPIsADevice, I then use lookahead and lookbehind to find an uppercase letter next to a lowercase one: (((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Note that | is the OR (alternation) operator, which lets all of these patterns be tried in one regex.
Regex: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])|(((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Update
Improved it a bit:
From: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])
into: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d
which does the same thing for the sample input (the shorter form no longer covers a digit followed by a lowercase letter).
I also added:
(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}])
adapted from the OP's comment, which adds an exception for some punctuation: (((?<!^)(?<!['([{])[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}])
Final regex:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d|(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}])
This is almost working perfectly. One issue: somewhere in the last part, |(((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z])) is not preserving the parentheses, brackets, and quotes. rextester.com/BTA83734
– Matt McManis
Mar 11 at 20:48
Thanks, your regex has solved the single letter problem. I've added some extra rules at the end to handle the other issues. rextester.com/FYEVE73725
– Matt McManis
Mar 12 at 1:53
Here is a regex which seems to work well, at least for your sample input:
(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)
This pattern says to split at a boundary where one of the following conditions holds:
- what precedes is a lowercase letter and what follows is an uppercase letter
- what precedes is a digit and what follows is a letter (or vice-versa)
- what precedes and what follows are both non-word characters (e.g. quote, parenthesis, etc.)
string title = @"ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]""Test""'Test'";
string[] split = Regex.Split(title, @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)");
split = (from e in split select e.Trim()).ToArray();
string newtitle = string.Join(" ", split);
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
Note: You might also want to add this assertion to the regex alternation:
(?<=\W)(?=\w)|(?<=\w)(?=\W)
We got away without it here, because the sample input never needed a split at such a boundary. But you might need it with other inputs.
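To see what the optional word/non-word alternation changes, the two variants can be run side by side. This is only a sketch: the class name BoundaryDemo, the method Space, and the inputs Title(Test) and HELLO-WORLD are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    // The four alternatives from the answer above.
    const string Basic =
        @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)";

    // The same pattern plus the optional word/non-word boundary alternatives.
    const string Extended = Basic + @"|(?<=\W)(?=\w)|(?<=\w)(?=\W)";

    // Hypothetical helper: split on the chosen pattern and rejoin with spaces.
    public static string Space(string s, bool extended) =>
        string.Join(" ", Regex.Split(s, extended ? Extended : Basic));
}

class Program
{
    static void Main()
    {
        Console.WriteLine(BoundaryDemo.Space("Title(Test)", false)); // Title(Test)
        Console.WriteLine(BoundaryDemo.Space("Title(Test)", true));  // Title ( Test )
        Console.WriteLine(BoundaryDemo.Space("HELLO-WORLD", true));  // HELLO - WORLD
    }
}
```

The extended pattern does separate a word from an opening parenthesis, but it also splits inside the parentheses and around hyphens, which works against the "preserve hyphens" and "ignore spacing between brackets" rules in the question. Whether to add it therefore depends on the inputs.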
I ran into one issue: when it comes to single-letter words like A and I, it will not separate them because it uses the ALL UPPERCASE rule (two uppercase letters next to each other). ATitleExample becomes ATitle Example.
– Matt McManis
Mar 11 at 7:35
1
@MattMcManis This is an edge case which will potentially break all of the answers given here. You would need to do more work to cover such cases.
– Tim Biegeleisen
Mar 11 at 7:36
Maybe I can run the output of this through a second regex to fix those.
– Matt McManis
Mar 11 at 7:38
Aiming for simplicity rather than a huge regex, I would recommend this code with small, simple patterns (explanatory comments are in the code):
string str = @"ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)""Test""'Test'[Test]";
// insert a space when a lowercase letter is followed by an uppercase letter
str = Regex.Replace(str, "(?<=[a-z])(?=[A-Z])", " ");
// insert a space whenever a digit is followed by a letter
str = Regex.Replace(str, @"(?<=\d)(?=[A-Za-z])", " ");
// insert a space when a letter is followed by a digit
str = Regex.Replace(str, @"(?<=[A-Za-z])(?=\d)", " ");
// insert a space before one of the characters ( " ' [ when it is followed by a letter or digit
str = Regex.Replace(str, @"(?=[([""'][a-zA-Z0-9])", " ");
// insert a space after any of the characters ) ] " '
// (this also fires after an opening quote, so the result may need cleanup)
str = Regex.Replace(str, @"(?<=[)\]""'])", " ");
If commenting was your main concern, you could enable x-mode or use inline comments, e.g. (?#insert space when there's letter followed by digit).
– revo
Mar 11 at 7:45
2
@revo I used standard C# comments :) I think it's more readable.
– Michał Turczyn
Mar 11 at 7:46
2
You could also write that kind of readable comment by setting the standard x modifier, which lets you write multiline, indented comments. It's not simple, by the way. Just split.
– revo
Mar 11 at 7:49
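The two commenting styles mentioned in these comments could look roughly like this in .NET, where x-mode is RegexOptions.IgnorePatternWhitespace. This is a sketch only: the class and method names are hypothetical, and the patterns are the letter/digit and lowercase/uppercase rules from the answer rather than a complete solution.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentedPatterns
{
    // x-mode: whitespace in the pattern is ignored and # starts a comment
    // that runs to the end of the line.
    const string Verbose = @"
          (?<=[a-z])(?=[A-Z])      # lowercase letter followed by uppercase letter
        | (?<=\d)(?=[A-Za-z])      # digit followed by letter
        | (?<=[A-Za-z])(?=\d)      # letter followed by digit
    ";

    public static string SpaceVerbose(string s) =>
        Regex.Replace(s, Verbose, " ", RegexOptions.IgnorePatternWhitespace);

    // Inline comment group (?#...), which needs no option.
    public static string SpaceInline(string s) =>
        Regex.Replace(s, @"(?<=[A-Za-z])(?=\d)(?#letter followed by digit)", " ");
}

class Program
{
    static void Main()
    {
        Console.WriteLine(CommentedPatterns.SpaceVerbose("Hello2019WorldTitle")); // Hello 2019 World Title
        Console.WriteLine(CommentedPatterns.SpaceInline("Hello2019"));            // Hello 2019
    }
}
```

Note that in x-mode a literal space or # in the pattern has to be escaped or placed inside a character class, which is one reason separate Replace calls with ordinary C# comments can be easier to maintain.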
You could reduce the requirements, and so shorten the regular expression, by interpreting them differently. For example, the first requirement amounts to saying: put a space before a capital letter if it is not preceded by a punctuation mark or another capital letter.
The following regex works for almost all of the mentioned requirements and may be extended to include or exclude other situations:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}
You have to use the Replace() method with " $0" (a space followed by the whole match) as the substitution string.
.NET:
string input = @"ThisIsAnExample.TitleHELLO-WORLD2019T.E.S.T.(Test)""Test""'Test'[Test]";
Regex regex = new Regex(@"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}", RegexOptions.Multiline);
Console.WriteLine(regex.Replace(input, @" $0"));
This is an interesting way. Which rule can be added to fix HELLO-WORLD2019 by spacing the 2019?
– Matt McManis
Mar 11 at 7:10
1
Add (?<=\p{L})\d within an alternation: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d
– revo
Mar 11 at 7:17
I have one other issue: single-letter words like A and I won't space. ATitleExample becomes ATitle Example.
– Matt McManis
Mar 11 at 8:23
What about something like OTPIsADevice?
– revo
Mar 11 at 8:29
It starts to get complicated: OTPIs ADevice. Maybe I can run the output through a second filter. Rules: If a word starts with 2 uppercase letters (ADevice), add a space after the first letter (A Device). And if an ALL UPPERCASE word ends in a lowercase letter (OTPIs), add a space before the last two letters (OTP Is).
– Matt McManis
Mar 11 at 8:54
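Both rules in the comment above can be expressed, as a rough sketch, with one extra lookaround: split before the last capital of an uppercase run when that capital is followed by a lowercase letter. The class and method names below are hypothetical, and the pattern covers only the lowercase/uppercase and uppercase-run cases, not the digit or punctuation rules.

```csharp
using System;
using System.Text.RegularExpressions;

static class SingleLetterSpacer
{
    // Hypothetical helper: inserts a space
    //  - between a lowercase letter and an uppercase letter, and
    //  - between two capitals when the second one starts a capitalized word
    //    (that is, it is followed by a lowercase letter).
    public static string Space(string s) =>
        Regex.Replace(s, @"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ");
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SingleLetterSpacer.Space("ATitleExample")); // A Title Example
        Console.WriteLine(SingleLetterSpacer.Space("OTPIsADevice"));  // OTP Is A Device
        Console.WriteLine(SingleLetterSpacer.Space("HELLO-WORLD"));   // HELLO-WORLD
    }
}
```

A pattern like this cannot tell a single-letter word from the start of an acronym: an input such as AUSBDevice would come out as AUSB Device rather than A USB Device, so some inputs would still need a dictionary or manual handling.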
edited Mar 13 at 10:31
answered Mar 11 at 10:26
Mukyuu (author of the first answer listed, the combined "Final regex")
edited Mar 11 at 6:05
answered Mar 11 at 6:00
Tim Biegeleisen (author of the Regex.Split answer)
I ran into one issue, when it comes to single letter words likeA
andI
, it will not separate because it uses theALL UPPERCASE
rule (two uppercase next to each other).ATitleExample
becomesATitle Example
.
– Matt McManis
Mar 11 at 7:35
1
@MattMcManis This is an edge case which will potentially break all of the answers given here. You would need to do more work to cover such cses.a
– Tim Biegeleisen
Mar 11 at 7:36
Maybe I can run the output of this through a second regex to fix those.
– Matt McManis
Mar 11 at 7:38
Aiming for simplicity rather than one huge regex, I would recommend this code with small, simple patterns (explanatory comments are in the code):
string str = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)\"Test\"'Test'[Test]";
// insert a space where a lowercase letter is followed by an uppercase letter
str = Regex.Replace(str, "(?<=[a-z])(?=[A-Z])", " ");
// insert a space where a digit is followed by a letter
str = Regex.Replace(str, @"(?<=\d)(?=[A-Za-z])", " ");
// insert a space where a letter is followed by a digit
str = Regex.Replace(str, @"(?<=[A-Za-z])(?=\d)", " ");
// insert a space before one of the characters ( " ' [ when it is followed by a letter or digit
str = Regex.Replace(str, @"(?=[(\[""'][a-zA-Z0-9])", " ");
// insert a space after one of the characters ] ) " '
str = Regex.Replace(str, @"(?<=[)\]""'])", " ");
answered Mar 11 at 7:29
Michał Turczyn
If commenting was your main concern, you could enable x-mode or use inline comments, e.g. (?#insert space when there's letter followed by digit).
– revo
Mar 11 at 7:45
2
@revo I used standard C# comments :) I think it's more readable.
– Michał Turczyn
Mar 11 at 7:46
2
You could also write that kind of readable comment by setting the standard x modifier, which lets you write multiline, indented comments. It's not simple by the way. Just split.
– revo
Mar 11 at 7:49
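As a rough illustration of the x-mode idea from the comments above: in .NET the x modifier corresponds to RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment running to the end of the line. The sketch below merges only the first three rules of the answer into one alternation; the merge itself and the names CommentedPattern and AddSpaces are assumptions made for illustration, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // Verbatim string so the pattern can span several lines.
    // With IgnorePatternWhitespace, layout whitespace is ignored
    // and everything after '#' on a line is a comment.
    private static readonly Regex Boundary = new Regex(@"
          (?<=[a-z])(?=[A-Z])       # lowercase letter followed by uppercase letter
        | (?<=\d)(?=[A-Za-z])       # digit followed by a letter
        | (?<=[A-Za-z])(?=\d)       # letter followed by a digit
        ", RegexOptions.IgnorePatternWhitespace);

    // Inserts a single space at every zero-width boundary matched above.
    public static string AddSpaces(string title)
    {
        return Boundary.Replace(title, " ");
    }
}
```

Calling CommentedPattern.AddSpaces("Hello2019WorldTitle") should return "Hello 2019 World Title". The bracket and quote rules are left out here on purpose, so this is not a drop-in replacement for the full chain of replacements.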
You could reduce the requirements, and so shorten the regular expression, by interpreting them differently. For example, the first requirement amounts to saying: put a space before a capital letter unless it is preceded by a punctuation mark or another capital letter.
The following regex works for almost all of the mentioned requirements and may be extended to include or exclude other situations:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}
You have to use the Replace() method with " $0" (a space followed by the whole match) as the substitution string.
See live demo here
.NET (See it in action):
string input = @"ThisIsAnExample.TitleHELLO-WORLD2019T.E.S.T.(Test)""Test""'Test'[Test]";
Regex regex = new Regex(@"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}", RegexOptions.Multiline);
Console.WriteLine(regex.Replace(input, @" $0"));
edited Mar 11 at 7:12
answered Mar 11 at 7:06
revo
This is an interesting way. Which rule can be added to fix HELLO-WORLD2019 by spacing the 2019?
– Matt McManis
Mar 11 at 7:10
1
Add (?<=\p{L})\d within an alternation:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d
– revo
Mar 11 at 7:17
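To make the extended alternation from this comment concrete, here is a minimal sketch that wraps it in a helper. The class and method names (PunctuationAwareSplitter, Split) are hypothetical; the pattern and the " $0" substitution are taken from the answer and comment above.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationAwareSplitter
{
    // Three alternatives, each matching the single character that should get a space in front:
    //   1. an uppercase letter not at line start and not preceded by an uppercase letter or punctuation
    //   2. a punctuation character directly preceded by another punctuation character
    //   3. a digit directly preceded by a letter
    private static readonly Regex SplitPoint = new Regex(
        @"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d",
        RegexOptions.Multiline);

    // " $0" re-emits the matched character with one space before it.
    public static string Split(string title)
    {
        return SplitPoint.Replace(title, " $0");
    }
}
```

For the input "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]" this should yield "This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test]". Single-letter words such as A or I directly followed by a capitalized word are still not separated by this pattern.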
I have one other issue: single-letter words like A and I won't space. ATitleExample becomes ATitle Example.
– Matt McManis
Mar 11 at 8:23
What about something like OTPIsADevice?
– revo
Mar 11 at 8:29
It starts to get complicated: OTPIs ADevice. Maybe I can run the output through a second filter. Rules: if a word starts with two uppercase letters (ADevice), add a space after the first letter (A Device). And if an ALL UPPERCASE word ends in a lowercase letter (OTPIs), add a space before the last two letters (OTP Is).
– Matt McManis
Mar 11 at 8:54
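One possible reading of the two second-pass rules in the comment above, sketched in C#. This is an interpretation, not code from the thread: the class name SecondPass, the method name Fix, and both patterns are assumptions. Rule 1 is narrowed to "a single leading capital followed by a capitalized word" so that it does not also fire on OTPIs.

```csharp
using System.Text.RegularExpressions;

public static class SecondPass
{
    // Intended to run on text that the first regex has already spaced.
    public static string Fix(string spaced)
    {
        // Rule 1 (assumed form): a word-initial capital followed by
        // "capital + lowercase" is a one-letter word, e.g. ADevice -> A Device.
        string result = Regex.Replace(spaced, @"\b([A-Z])(?=[A-Z][a-z])", "$1 ");

        // Rule 2 (assumed form): a run of at least two capitals followed by
        // "capital + lowercase" ends an acronym, e.g. OTPIs -> OTP Is.
        result = Regex.Replace(result, @"(?<=[A-Z]{2})(?=[A-Z][a-z])", " ");

        return result;
    }
}
```

With these patterns, SecondPass.Fix("OTPIs ADevice") should give "OTP Is A Device" and SecondPass.Fix("ATitle Example") should give "A Title Example". The heuristic has false positives: a word such as IDs would become "I Ds", so it is only a starting point.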
10
I am up-voting because it's the first post I have seen in hours that has an appropriate amount of information, research and effort
– Michael Randall
Mar 11 at 6:02
2
@MichaelRandall And sadly, that is a better track record than what I see coming on the site during most weekend days.
– Tim Biegeleisen
Mar 11 at 6:04