Problem
There is a problem that one of the creators of this site (SO) calls Shlemiel the painter's algorithm .
A painter is painting the passing line of a highway. It starts off very well, with high productivity. But each day he produces less, until his work becomes unfeasible. This occurs because it holds the paint can in a fixed place, so it paints a portion of the strip and has to return to the starting point to wet the brush. Each day it is further away from the can and takes longer on the way than in painting.
Imaginethispatterngrowingtens,hundreds,orthousandsoftimes.Quicklyunfeasible.
Thisisoftenthecasewithconcatenationsofdatacollections,especiallystring.Asitgrowsanddoesnotfitintothespacethatexistedforthepreviousversion,anewallocationneedstobemadetosupportthefullsizeofthenewversion.Andyouhavetodeallocatetheoldonethathasbecomerubbish.Anditfragmentsthememory.Allthisiscostoftime.Insomelanguagesthesituationisworsesinceachangethatdoesnotevenbreakthecurrentsizelimitalreadycausesareallocationforagoodcause.ThisisthecasewithPython.
Solution
Therightwayistofigureoutthefinalsize,oratleastanapproximationofit,andallocateeverythingyouneed,youcanputthetextsinthiscrazyarea.Obviouslyitneedstobedoneinastructurethatacceptsthetexttobechanged,whichisnotthecaseoftypestring
thatisimmutable,thatis,anychangegeneratesanewobject.
join()
doesexactlywhatIdescribed.Itdiscoversthetotalsizeneeded-takingthesizesofallstringsthatwillbeconcatenated-allocatesallnecessaryspaceandthrowsthetextsinthatspacethatisnotyetastring.Attheenditturnsthisintostring.Therethecostisequivalenttothetotalsizeofthetextwhichismuchshorterthanwalkingalloveragainineachconcatenation.
Notethatforafewconcatenations,typicallyupto4,theconcatenationmayperformevenbetterthanjoin()
.Ofcourseinsuchasmallvolume,whichisfaster.
Alternatives
Ofcourse,join()
isnottheonlywaytodothis.Youcandoitmanuallyifyouneedsomethingalittlemorecomplexthanjoin()
doesnotmeet.Maybeusinga bytearray * or a default list that are mutable (help, but not great because it may need new allocations, although minimized, does not need every change, depends on the skill of the programmer).
The Python page also shows you how to use %s
to get similar results. Formatting occurs in a function that manipulates everything in another data structure and only at the end that the final string is generated.
Some people like to use StringIO
to take care of this.
I answered this in more detail for other languages like Java . And also C # .