Is replacing strings with Regex slower than using Replace?

17

Let's say I wanted to do something like what is asked in this question:

How to convert CamelCase to snake_case in C#?

It got some answers, but I want to highlight these two.

Answer 1

string stringSnake = string.Concat(
    stringCamel.Select((x, i) => i > 0 && char.IsUpper(x)
        ? "_" + x.ToString().ToLower()
        : x.ToString().ToLower()));

Answer 2

string stringSnake = Regex.Replace(input, "(?<=.)([A-Z])", "_$0", RegexOptions.Compiled);

Some SO contributors said that answer 2 is much slower. I would like to know why.

  • Is it because it uses regex?
  • Is the regex "badly done"?
  • Is regex slow in general?
asked by anonymous 04.07.2016 / 19:10

1 answer

10

Always measure

To know about speed you always have to measure the actual case. The result can change depending on several factors: it may differ on another platform, with different data, or when the code is used together with other things, and above all it depends on the version of the language or library you are using. What holds today may not hold tomorrow.
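
As a rough illustration of measuring, here is a minimal sketch that times the two answers from the question with Stopwatch. The class and method names (MeasureSketch, WithLinq, WithRegex, Time) are made up for this example, and the trailing ToLower() on the regex version is my addition so that both versions return the same text. It is a crude timing loop, not a rigorous benchmark, and the numbers will vary with machine, runtime version and input.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class MeasureSketch
{
    // Answer 1 from the question, wrapped in a method.
    public static string WithLinq(string s) =>
        string.Concat(s.Select((x, i) => i > 0 && char.IsUpper(x)
            ? "_" + x.ToString().ToLower()
            : x.ToString().ToLower()));

    // Answer 2 from the question; ToLower() added here (assumption)
    // so that both methods produce the same result.
    public static string WithRegex(string s) =>
        Regex.Replace(s, "(?<=.)([A-Z])", "_$0", RegexOptions.Compiled).ToLower();

    // Runs the conversion 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call: JIT and regex construction happen here
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints both timings for a sample input.
    public static void Run()
    {
        const string input = "ThisIsSomeCamelCaseText";
        Console.WriteLine("LINQ:  " + Time(WithLinq, input, 100000));
        Console.WriteLine("Regex: " + Time(WithRegex, input, 100000));
    }
}
```

Calling MeasureSketch.Run() prints one elapsed time per approach. Run it with your own data and on the runtime you actually deploy to before drawing conclusions.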

Which is faster

You cannot say that a RegEx will always be worse than Replace(), because the replace function can be misused and not applied in the best possible way. If you keep applying Replace() several times to the same string, it may end up slow.

But under normal conditions Replace() will be faster, because it goes straight to the point. However well implemented the RegEx engine is, it is a general solution meant to solve any problem, so it has to check each character against every situation the pattern is meant to handle.

RegEx performance is not intuitive enough to say that it will always be slower. But in my experience it is slower in most cases, and in some cases the difference is brutal.

RegEx can win when there are several modifications to make, because it can do them all in one pass, especially if the pattern is already compiled (at least in C# the text pattern is compiled and cached). If you use it many times, it can become worthwhile. Of course it also depends on the quality of the expression compiler: if it can apply some optimizations, that helps.

Compiling helps, but it works no miracles.
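
To make the "several modifications in one pass" case concrete, here is a hedged sketch. The escaping task and the names (MultiReplaceSketch, WithChainedReplace, WithSingleRegex) are invented for illustration and do not come from the question. The chained version scans the text once per Replace() call, while the regex version keeps one compiled Regex in a static field and reuses it on every call. Which one is faster still has to be measured.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiReplaceSketch
{
    // Three Replace() calls: three scans of the text, and each call
    // that changes something allocates a new intermediate string.
    public static string WithChainedReplace(string text) =>
        text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");

    // Built once and reused, so the pattern is not parsed and compiled per call.
    private static readonly Regex Special = new Regex("[&<>]", RegexOptions.Compiled);

    // One scan of the text; the evaluator picks the replacement for each match.
    public static string WithSingleRegex(string text) =>
        Special.Replace(text, match =>
        {
            switch (match.Value)
            {
                case "&": return "&amp;";
                case "<": return "&lt;";
                default: return "&gt;";
            }
        });
}
```

Both methods return the same text for the same input, so they can be swapped freely while timing them.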

Other techniques

It is possible to use techniques other than these two that may give an even better result, for example analyzing the string character by character and deciding what to do in each case. This can be simpler or more complicated depending on the case. This technique can "replace" several things in a single pass.
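
A character-by-character version of the snake_case conversion could look like the sketch below. The class and method names are hypothetical, and it uses char.ToLowerInvariant where Answer 1 uses the culture-sensitive ToLower(), which is a deliberate difference. It makes one pass over the input, writes into a single char buffer and allocates the result string once.

```csharp
using System;

public static class SnakeCaseSketch
{
    public static string ToSnakeCase(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Worst case: an underscore before every character except the first.
        var buffer = new char[input.Length * 2 - 1];
        int length = 0;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (i > 0 && char.IsUpper(c))
            {
                buffer[length++] = '_'; // separator before an inner uppercase letter
            }
            buffer[length++] = char.ToLowerInvariant(c);
        }

        // Only the filled part of the buffer becomes the result.
        return new string(buffer, 0, length);
    }
}
```

For example, SnakeCaseSketch.ToSnakeCase("CamelCase") returns "camel_case". Whether this beats the other two approaches for your data is, again, something to measure.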

I've seen a lot of people get poor performance because they don't understand how the garbage collector works. The problem is not always in the string handling itself but in memory management, which can be tragic at large volumes.

I can guarantee that a programmer will always get a better result writing the code by hand than with a RegEx, as long as the right technique (Replace() or not) is used and implemented correctly. Whether it takes more work, looks ugly, gets confusing or brings other problems is another question.

It is always possible to write a RegEx that is not tragic, but that can take as much work as writing the code by hand, or more.

For this specific example there is an answer on SO with identical code and a performance comparison.

See a comparison made by Microsoft. Note that StringBuilder, which everyone assumes is better for these things, comes out worse. What seems best is not always best in practice.

There is a real example posted by one of the founders of this site, citing a problem that Microsoft faced.

There is a site that helps you understand RegEx and shows the traps you can fall into, backtracking among them.

    
04.07.2016 / 19:36