Replace in a large string, or split into a loop?


I have a multidimensional array with some routes, and I need to convert some elements defined as (alpha), (int), (id), etc. Since it is an array, I currently use a loop to do the replace.

I thought of approaching it another way, without the loop, working with a string to do one general replace: first convert the array to a string using serialize, then apply the replace, and then get the array back with unserialize.

Initially I had a foreach with separate replaces, as in the example below.

foreach( $array as $key => $line )
{
    // the placeholders are literal text, so str_replace is enough here
    $line[0] = str_replace( ',' , '&' , $line[0] );
    $line[0] = str_replace( array( '(int)' , '(alpha)' ) ,
                            array( '([0-9]+)' , '([a-zA-Z]+)' ) , $line[0] );
    $array[$key] = $line; // write the modified line back
}

With the new approach I serialize, do a single replace on the whole thing, and then unserialize.

$line = serialize( $array );
// caution: if a replacement changes string lengths, the s:N length
// prefixes inside the serialized data no longer match (see the answers)
$line = str_replace( array( '(int)' , '(alpha)' , ',' ) ,
                     array( '([0-9]+)' , '([a-zA-Z]+)' , '&' ) , $line );
$line = unserialize( $line );

After the serialize, the replace runs on one rather large string, and then I apply unserialize.

I do not know the limits of str_replace: is it more advantageous to loop over small strings or to do a single replace on one large string?

This is not a question about benchmarks; I just want to know the advantages and disadvantages of each case, and where one applies better than the other.

asked by anonymous 01.10.2014 / 10:17

2 answers


Trying to respond in a logical way: I have a principle that helps me solve some problems. In this case it looks like this:

  • strings -> string functions, arrays -> array functions

The serialize function exists to preserve data types when we want to store data, whether in a text file or in a database. When we need that data again, we read it back and call unserialize to return it to its 'original' state. Put another way: if we want to store an array in the database, we call serialize, which turns the array into a string, and store that string. When we need the array again, we fetch the string from the database and call unserialize to get the array back in the same "state" it had before saving. This brings us to:
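For example, a minimal round trip (my own illustration, not from the question):

$user = array( 'id' => 7 , 'name' => 'Ana' , 'active' => true );

$stored = serialize( $user );
// $stored: a:3:{s:2:"id";i:7;s:4:"name";s:3:"Ana";s:6:"active";b:1;}

$restored = unserialize( $stored );
var_dump( $restored === $user ); // bool(true): types and structure survive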

1) Using serialize, manipulating the string, and then calling unserialize can break the integrity of the data and produce unexpected results from unserialize (see the sketch after this list).

2) The serialize function will run a series of loops to turn the array into a string.

3) The unserialize function will run a series of loops to turn the string back into an array.
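A minimal sketch of point 1 (my own example): serialized data stores each string's length, so a replace that changes lengths corrupts it.

$data = array( 'route' => '(int)' );
$s = serialize( $data );
// $s: a:1:{s:5:"route";s:5:"(int)";}

// '([0-9]+)' has 8 characters, but the prefix still says s:5,
// so the string no longer parses:
$broken = str_replace( '(int)' , '([0-9]+)' , $s );
var_dump( unserialize( $broken ) ); // bool(false), plus a notice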

Conclusion:

For the above reasons it is not a good idea to use serialize, because at the very least it will double the number of loops.
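Following that principle (arrays -> array functions), here is a minimal sketch of how the replace could stay on the array side, assuming the leaves of the multidimensional array are strings (my own example, not the asker's code):

array_walk_recursive( $array , function ( &$value ) {
    // array_walk_recursive visits only the leaf values,
    // so one pass covers the whole multidimensional array
    $value = str_replace( array( '(int)' , '(alpha)' , ',' ) ,
                          array( '([0-9]+)' , '([a-zA-Z]+)' , '&' ) , $value );
} );

One pass over the data, and no intermediate string whose integrity could break.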

01.10.2014 / 13:00

I found this question very interesting because in programming the details matter a great deal, especially for those who deal with large amounts of data. With that in mind, I want to try to add some value to this pertinent question.

The question poses:

  I do not know the limits of str_replace: is it more advantageous to loop over small strings or to do a single replace on one large string?

Well, for this I have an answer from previous research, which is as follows:

Most programming languages limit the number of characters that can be stored in a string, but PHP does not. This does not mean you can store unlimited data in a PHP string, however: while there is no limit on string length, there is a limit on the overall amount of memory a PHP script can use. This limit is expressed in bytes and can be changed by editing the php.ini configuration file.
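A quick way to see that ceiling from inside a script (my own illustration):

// the practical ceiling is the script's memory allowance, not a string limit
echo ini_get( 'memory_limit' ); // e.g. "128M", as set in php.ini

// it can also be raised for a single script at runtime:
ini_set( 'memory_limit' , '256M' );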

Having said that, and looking at your question, I noticed that you did not consider json_encode and json_decode. I advise you to do so, because they are a variant worth taking into account.

At first glance there seem to be three ways to go. The most obvious option for serializing data is PHP's serialize and unserialize. A little less popular are json_encode and json_decode. There is also a third option: a third-party module that you can easily install on your server, called igbinary. The latter is very good performance-wise, but it has some requirements that not every environment can meet; used together with memcached it is a great combination.

With data serialization we are always concerned about the size of the result, but also about the time it takes for the data to be serialized.

However, there are documents and tests that show performance gains for json_encode and json_decode. From my experience I would say the following:

If your application is more focused on reading than on writing, igbinary is the winner, since it will unserialize your data faster than the other functions. However, if you are more focused on data storage, json_encode is the right choice, as it makes your serialized result smaller and does so faster.
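To illustrate the size difference, a minimal comparison (my own example; the igbinary call is commented out because the extension is not available everywhere):

$data = array( 'route' => '/user/(int)' , 'name' => '(alpha)' );

$php  = serialize( $data );   // carries a:, s:N and type prefixes
$json = json_encode( $data ); // {"route":"\/user\/(int)","name":"(alpha)"}

echo strlen( $php ) , "\n";   // 62
echo strlen( $json ) , "\n";  // 43

// with the igbinary extension installed, the binary form is smaller still:
// $bin = igbinary_serialize( $data );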

I hope I have contributed to clarifying this.

02.10.2014 / 19:36