Splitting a string at all whitespace

I need to split a string at all whitespace; the result should contain ONLY the words themselves.

How can I do this in vb.net?

Tabs, Newlines, etc. must all be split!

This has been bugging me for quite a while now, as the syntax highlighter I made completely ignores the first word in each line except for the very first line.

13.10.2009 21:23:10
See also this possible duplicate, which uses StringSplitOptions to remove the extra whitespace: stackoverflow.com/questions/6111298/…
goodeye 1.02.2016 01:24:18
7 ANSWERS
SOLUTION

String.Split() (no parameters) does split on all whitespace (including LF/CR)

24
13.10.2009 21:27:31
Why didn't they include that as an overload lol? Thanks so much!
Cyclone 13.10.2009 21:31:30
because it resolves to the Split(params char[]) overload, with an empty array. The documentation for that overload mentions this behavior.
Jimmy 13.10.2009 22:14:04
CAUTION: As Johannes Rudolph mentions in his answer, if there are multiple whitespace characters in a row, the array returned by String.Split will contain empty elements. That is why Rubens Farias's answer is superior.
ToolmakerSteve 3.09.2014 00:32:12
Adam Ralph's solution of using String.Split().Where(...) has quicker performance than the Regex solution. I posted test results below.
u8it 10.09.2015 20:14:52
@ToolMakerSteve - to remove empty elements, use String.Split(new char[] {}, StringSplitOptions.RemoveEmptyEntries)
Joe 12.01.2018 11:52:35
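
For reference, here is a minimal C# sketch of the overload mentioned in the last comment, assuming the goal is an array of words with no empty entries. The class name SplitOptionsDemo and the sample string are made up for illustration. Passing a null (or empty) char array as the separator makes Split use whitespace as the delimiter.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Splits on whitespace and drops the empty entries that
    // consecutive whitespace characters would otherwise produce.
    public static string[] SplitWords(string text)
    {
        // A null separator array means "split on whitespace characters".
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Hypothetical sample: double space, a tab and a CR+LF pair.
        string sample = "one  two\tthree\r\nfour";
        foreach (string word in SplitWords(sample))
        {
            Console.WriteLine(word); // prints one, two, three, four on separate lines
        }
    }
}
```

This keeps everything in a single call and returns a string[] directly, so no LINQ or Regex reference is needed.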

Try this:

Regex.Split("your string here", "\s+")
20
13.10.2009 21:32:13
It's C#. you should be fine without.
Jimmy 13.10.2009 21:29:59
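' Note: vbNewLine is the two-character string CR+LF, so CChar(vbNewLine) would only yield CR.
' CR and LF are listed separately so that either line-ending character splits.
Dim words As String = "This is a list of words, with: a bit of punctuation" + _
                          vbTab + "and a tab character." + vbNewLine
Dim split As String() = words.Split(New [Char]() {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf})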
-1
13.10.2009 21:26:41

String.Split() splits at every single whitespace character, so the result will usually contain empty strings whenever two or more whitespace characters occur in a row. The Regex solution Rubens Farias has given is the correct way to do it. I have upvoted his answer, but I want to add a small addition that dissects the regex:

\s is a character class that matches all whitespace characters.

In order to split the string correctly when it contains multiple whitespace characters between words, we need to add a quantifier (or repetition operator) to the specification to match all whitespace between words. The correct quantifier to use in this case is +, meaning "one or more" occurrences of a given specification. While the syntax "\s+" is sufficient here, I prefer the more explicit "[\s]+".
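
One edge case is worth sketching, on the assumption that the input may start or end with whitespace: Regex.Split returns an empty string at the start or end of the array when a match sits at the start or end of the input. The helper below (RegexSplitDemo and SplitWords are names made up for this illustration) trims first to avoid that.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Trims first, because Regex.Split returns an empty string at the
    // start or end of the array when the input begins or ends with a match.
    public static string[] SplitWords(string text)
    {
        string trimmed = text.Trim();
        // Regex.Split on an empty input returns one empty string, so handle it here.
        if (trimmed.Length == 0)
        {
            return new string[0];
        }
        return Regex.Split(trimmed, @"\s+");
    }

    public static void Main()
    {
        // Without trimming: leading whitespace yields an empty first element.
        Console.WriteLine(Regex.Split("  a \t b", @"\s+").Length); // 3
        // With trimming: only the words remain.
        Console.WriteLine(SplitWords("  a \t b").Length);          // 2
    }
}
```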

2
7.08.2010 18:11:32
As usual, we now have two problems instead of one... ;-)
Adam Ralph 13.01.2015 13:39:19

If you want to avoid regex, you can do it like this:

"Lorem ipsum dolor sit amet, consectetur adipiscing elit"
    .Split()
    .Where(x => x != string.Empty)

Visual Basic equivalent:

"Lorem ipsum dolor sit amet, consectetur adipiscing elit" _
    .Split() _
    .Where(Function(X$) X <> String.Empty)

The Where() is important since, if your string has multiple white space characters next to each other, it removes the empty strings that will result from the Split().

At the time of writing, the currently accepted answer (https://stackoverflow.com/a/1563000/49241) does not take this into account.
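
To make the effect of Where() concrete, here is a small sketch with a made-up sample string that has two spaces between the words. The class and method names are only for illustration; ToArray() is added so the result is a plain string[].

```csharp
using System;
using System.Linq;

public static class SplitWhereDemo
{
    // Split() on all whitespace, then filter out the empty strings.
    public static string[] SplitWords(string text)
    {
        return text.Split().Where(x => x != string.Empty).ToArray();
    }

    public static void Main()
    {
        // Two spaces between the words, so Split() yields an empty string.
        string sample = "Lorem  ipsum";

        Console.WriteLine(sample.Split().Length);     // 3: "Lorem", "", "ipsum"
        Console.WriteLine(SplitWords(sample).Length); // 2: "Lorem", "ipsum"
    }
}
```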

5
18.05.2019 10:36:16
great solution. Not only does it avoid the need for a Regex reference but it's also quicker (see my post below). I'd like to add that I don't think VB makes use of the lambda operator "=>", so the VB version of this is a little different, I think like this: s.Split().Where(Function(x) x <> String.Empty)
u8it 10.09.2015 19:54:34
Hey @u8it, I have added a VB .NET version to this answer. I just read your comment a few days after editing the answer!!!
Sreenikethan I 13.05.2019 18:29:14
@Sree your edit is incorrect. The Visual Basic version is not the equivalent of the C# version because it uses String.IsNullOrWhiteSpace() instead of the != operator to compare with String.Empty. Can you please fix it? I don't know what the Visual Basic syntax for that is.
Adam Ralph 14.05.2019 19:10:20
@Adam I have put a Not, as in Not String.IsNullOrWhiteSpace(X))… the Not operator negates a Boolean value. Is that what you meant?
Sreenikethan I 15.05.2019 20:18:34
I've tried my edit with a sample string as well (before posting the edit), and it worked perfectly as asked by OP. Am I missing anything you said?
Sreenikethan I 16.05.2019 02:19:35

So, after seeing Adam Ralph's post, I suspected his solution of being faster than the Regex solution. Just thought I'd share the results of my testing since I did find it was faster.


There are really two factors at play (ignoring system variables): the number of sub-strings extracted (determined by the number of delimiters), and the total string length. The very simple scenario plotted below uses "A" as the sub-string, delimited by two white space characters (a space followed by a tab). This accentuates the effect of the number of sub-strings extracted. I went ahead and did some multi-variable testing to arrive at the following general equations for my system.

Regex()
t = (28.33*SSL + 572)(SSN/10^6)

Split().Where()
t = (6.23*SSL + 250)(SSN/10^6)

Where t is execution time in milliseconds, SSL is average sub-string length, and SSN is number of sub-strings delimited in string.

These equations can also be written as

t = (28.33*SL + 572*SSN)/10^6

and

t = (6.23*SL + 250*SSN)/10^6

where SL is total string length (SL = SSL * SSN)

Conclusion: The Split().Where() solution is faster than Regex(). The major factor is the number of sub-strings, while string length plays a minor role. Comparing coefficients, the gain is about 2x for the per-sub-string term (572 vs. 250) and about 5x for the per-character term (28.33 vs. 6.23).
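
As a quick worked example, the sketch below simply evaluates the two fitted equations above for one hypothetical input (one million single-character sub-strings). It is not a new measurement, and the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class TimingModelDemo
{
    // Fitted models from the post: t in milliseconds,
    // ssl = average sub-string length, ssn = number of sub-strings.
    public static double RegexMs(double ssl, double ssn)
    {
        return (28.33 * ssl + 572) * (ssn / 1e6);
    }

    public static double SplitWhereMs(double ssl, double ssn)
    {
        return (6.23 * ssl + 250) * (ssn / 1e6);
    }

    public static void Main()
    {
        // Hypothetical input: one million single-character sub-strings.
        double ssl = 1, ssn = 1e6;
        Console.WriteLine(RegexMs(ssl, ssn).ToString("F2", CultureInfo.InvariantCulture));      // 600.33
        Console.WriteLine(SplitWhereMs(ssl, ssn).ToString("F2", CultureInfo.InvariantCulture)); // 256.23
    }
}
```

For this input the model predicts roughly a 2.3x difference, since the per-sub-string term dominates when the sub-strings are short.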


[Image not available: plot of the test scenario described above]


Here's my testing code (probably way more material than necessary, but it's set up for collecting the multi-variable data I talked about):

using System;
using System.Linq;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Windows.Forms;
namespace ConsoleApplication1
{
    class Program
    {
        public enum TestMethods {regex, split};
        [STAThread]
        static void Main(string[] args)
        {
            //Compare TestMethod execution times and output result information
            //to the console at runtime and to the clipboard at program finish (so that data is ready to paste into analysis environment)
            #region Config_Variables
            //Choose test method from TestMethods enumerator (regex or split)
            TestMethods TestMethod = TestMethods.split;
            //Configure RepetitionString
            String RepetitionString =  string.Join(" \t", Enumerable.Repeat("A",100));
            //Configure initial and maximum count of string repetitions (final count may not equal max)
            int RepCountInitial = 100;
            int RepCountMax = 1000 * 100;

            //Step increment to next RepCount (calculated as 20% increase from current value)
            Func<int, int> Step = x => (int)Math.Round(x / 5.0, 0);
            //Execution count used to determine average speed (calculated to adjust down to 1 execution at long execution times)
            Func<double, int> ExecutionCount = x => (int)(1 + Math.Round(500.0 / (x + 1), 0));
            #endregion

            #region NonConfig_Variables
            string s; 
            string Results = "";
            string ResultInfo; 
            double ResultTime = 1;
            #endregion

            for (int RepCount = RepCountInitial; RepCount < RepCountMax; RepCount += Step(RepCount))
            {
                s = string.Join("", Enumerable.Repeat(RepetitionString, RepCount));
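                //Store the execution count so that the logged value is the one actually used for this measurement
                int Executions = ExecutionCount(ResultTime);
                ResultTime = Test(s, Executions, TestMethod);
                ResultInfo = ResultTime.ToString() + "\t" + RepCount.ToString() + "\t" + Executions.ToString() + "\t" + TestMethod.ToString();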
                Console.WriteLine(ResultInfo); 
                Results += ResultInfo + "\r\n";
            }
            Clipboard.SetText(Results);
        }
        public static double Test(string s, int iMax, TestMethods Method)
        {
            switch (Method)
            {
                case TestMethods.regex:
                    return Math.Round(RegexRunTime(s, iMax),2);
                case TestMethods.split:
                    return Math.Round(SplitRunTime(s, iMax),2);
                default:
                    return -1;
            }
        }
        private static double RegexRunTime(string s, int iMax)
        {
            Stopwatch sw = new Stopwatch();
            sw.Restart();
            for (int i = 0; i < iMax; i++)
            {
                System.Collections.Generic.IEnumerable<string> ens = Regex.Split(s, @"\s+");
            }
            sw.Stop();
            return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
        }
        private static double SplitRunTime(string s,int iMax)
        {
            Stopwatch sw = new Stopwatch();
            sw.Restart();
            for (int i = 0; i < iMax; i++)
            {
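                //Note: Where() is deferred, so the filter itself only runs if ens is enumerated (it is not enumerated here)
                System.Collections.Generic.IEnumerable<string> ens = s.Split().Where(x => x != string.Empty);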
            }
            sw.Stop();
            return Math.Round(sw.ElapsedMilliseconds / (double)iMax, 2);
        }
    }
}
2
11.09.2015 04:56:55

I used the solution noted by Adam Ralph, plus the VB.NET comment below by P57, with one odd exception: I had to add .ToList.ToArray on the end.

Like so:

.Split().Where(Function(x) x <> String.Empty).ToList.ToArray

Without that, I kept getting "Unable to cast object of type 'WhereArrayIterator`1[System.String]' to type 'System.String[]'."
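
My guess at the cause, sketched in C# (the class and method names are made up for illustration): Where() returns a lazy iterator object rather than a string[], so assigning it to an array variable needs an explicit ToArray() to enumerate it and copy the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToArrayDemo
{
    // Returns the lazy sequence produced by Where(); nothing is filtered yet.
    public static IEnumerable<string> LazyWords(string text)
    {
        return text.Split().Where(x => x != string.Empty);
    }

    public static void Main()
    {
        IEnumerable<string> lazy = LazyWords("a  b");
        // The iterator is not an array, which is why the cast fails.
        Console.WriteLine(lazy is string[]); // False

        // ToArray() enumerates the iterator and copies the results into an array.
        string[] words = lazy.ToArray();
        Console.WriteLine(words.Length);     // 2
    }
}
```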

1
4.05.2016 19:38:05
I was able to make this work fine with only: .Split().Where(Function(x) x <> String.Empty).ToArray
Taegost 24.08.2016 15:12:06
You're welcome. I guess I should have also said at that time that it was using VS2013 and .Net 4.5.2, just in case it was a recent change.
Taegost 21.02.2017 18:52:55