While driving home from work today, I was thinking of how to write an
algorithm in C# 2.0 that takes a string representing a multi-line text file,
and return the lines, one at a time.
I know Mono's System.IO.StringReader class can do this, but I was thinking there must be a more efficient way of doing it. Here's what I came
up with:
public class LineReader
{
private string _source;
public LineReader(string src)
{
_source = src;
}
public IEnumerator GetEnumerator()
{
bool finished=false;
int nextChar=0;
int readTo;
string nextLine;
while(!finished) {
readTo = _source.IndexOf ('\n', nextChar);
if (readTo==-1){
readTo=_source.Length-1;
finished=true;
if (readTo<nextChar){
yield break;
}
}
nextLine = _source.Substring(nextChar,readTo-nextChar);
nextChar = readTo+1;
yield return nextLine;
}
}
}
Timing tests show that with the following test harness running on my 1.33GHz
iBook G4 running Mono 1.2.5, parsing a 1000 line text file 1000 times takes
2.7 seconds on average using this algorithm and 3.4 seconds on average using
StringReader.
public static void Main(string[] args)
{
StreamReader sr = new StreamReader("sample.text");
string test = sr.ReadToEnd();
System.DateTime start = DateTime.Now;
for (int i=0;i<1000;i++){
LineReader lineReader = new LineReader(test);
foreach (string s in lineReader){
}
}
System.DateTime finish = DateTime.Now;
double seconds = (((double)finish.Ticks) - ((double)start.Ticks)) / 10000000;
Console.WriteLine("{0} seconds",seconds);
}
Yes, StringReader handles both LF and CRLF line endings, and has a PushBack
method to allow jumping back to the previous line, but for pure speed of
parsing standard text with LF line endings, this algorithm seems to be
around 20% faster than StringReader, and 30% faster than using Split to turn
large string into an array of one string per line.
Update on 14th November 2007:
I ran the same tests on Windows XP with Microsoft .NET 2.0, on a dual core 3GHz PC, and produced this graph which compares various ways of splitting the large string into one string per line: