For example, I have a string with multiple occurrences of the letter a:
a = 'abcdefhanmakaoa'
If I use the find method, I can only find the index of the first occurrence. Is there a native method / function that returns all occurrences?
There are two ways to do this. One uses a regex to create an iterator over all occurrences. The other is a brute-force loop, which in this case would look something like:
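inicio = 0
termo = "a"
indexes = []
while inicio < len(a):
    # search only the part of the string that has not been scanned yet
    resultado = a[inicio:].find(termo)
    if resultado == -1:
        break
    # resultado is relative to the slice, so add the offset back
    indexes.append(resultado + inicio)
    inicio += resultado + 1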
So the indexes list will hold the positions of all occurrences.
You can even wrap this in a function called findAll, to reuse whenever you need this operation.
As @jsbueno mentioned, you can pass the start position directly to the find() method, which makes the code look like this:
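def findAll(a, termo):
    inicio = 0
    indexes = []
    while inicio < len(a):
        # find() accepts the position where the search should start
        resultado = a.find(termo, inicio)
        if resultado == -1:
            break
        indexes.append(resultado)
        inicio = resultado + 1
    return indexes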
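As a usage illustration, here is a minimal sketch of the same loop written as a generator. The name find_all and its parameters are hypothetical (they are not part of the answer above), and the sketch assumes that overlapping matches should be reported, which follows from advancing the start position by one character after each hit.

```python
def find_all(text, term):
    """Yield the index of every occurrence of term in text (overlaps included)."""
    start = 0
    while True:
        pos = text.find(term, start)
        if pos == -1:
            return
        yield pos
        start = pos + 1  # advance one character so overlapping matches are found


print(list(find_all('abcdefhanmakaoa', 'a')))  # [0, 7, 10, 12, 14]
print(list(find_all('aaaa', 'aa')))            # [0, 1, 2]
```

Because the search term is passed to find() as is, the same function also works for terms longer than one character.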
Wrapping each of the two options above in a function (fun1 and fun2) and measuring with the timeit module:
import timeit
print(timeit.timeit(fun1, number=1000000))
print(timeit.timeit(fun2, number=1000000))
The second method, which passes the start position to find() directly instead of slicing the string, is more efficient. The output was:
2.979384636011673
2.4164228980080225
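The bodies of fun1 and fun2 are not shown above, so the following is only a sketch of how the measurement could be reproduced, assuming fun1 wraps the slicing loop and fun2 wraps the loop that passes the start position to find(). The repetition count is reduced here so it runs quickly, and the absolute timings will differ from machine to machine.

```python
import timeit

a = 'abcdefhanmakaoa'
termo = 'a'


def fun1():
    # Slicing version: searches a[inicio:] and adds the offset back
    inicio = 0
    indexes = []
    while inicio < len(a):
        resultado = a[inicio:].find(termo)
        if resultado == -1:
            break
        indexes.append(resultado + inicio)
        inicio += resultado + 1
    return indexes


def fun2():
    # Start-parameter version: lets find() skip the already scanned prefix
    inicio = 0
    indexes = []
    while inicio < len(a):
        resultado = a.find(termo, inicio)
        if resultado == -1:
            break
        indexes.append(resultado)
        inicio = resultado + 1
    return indexes


# Both versions must agree before their timings are worth comparing
assert fun1() == fun2()

print(timeit.timeit(fun1, number=100_000))
print(timeit.timeit(fun2, number=100_000))
```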
For this specific case, where you only want to find a single letter, you can iterate over the string with enumerate, which gives you each index together with the character at that index.
And since you want a list with the result, you can use a list comprehension:
a = 'abcdefhanmakaoa'
# "i" is the index, "c" is the character at that index
indices = [i for i, c in enumerate(a) if c == 'a']
print(indices)
The line that creates the index list ( indices = [ ... ) is a list comprehension, and is equivalent to the traditional for loop of other languages:
indices = []
for i, c in enumerate(a):
    if c == 'a':
        indices.append(i)
Although they are equivalent, the list comprehension is more succinct and is the more Pythonic way to do it.
The output is a list with all the indexes that correspond to the letter "a":
[0, 7, 10, 12, 14]
The code above works only for cases where you want to search for occurrences of a single letter.
For more complicated / general cases, where you want to search for occurrences of a word, for example (or more complicated criteria such as "begins with a lowercase letter or number and has at least N characters", etc.), an alternative is to use regex through the re module:
import re
a = 'abcdefhanmakaoa'
indices = [m.start(0) for m in re.finditer('a', a)]
print(indices)
finditer returns all the matches found (I also use the list comprehension syntax so that the result is already a list). Since the regex used is 'a' (the letter "a" itself), every match corresponding to this letter is returned. Then start(0) returns the starting position of each match. This gives you the indices of all the letters "a" in the string.
The index list is the same as in the first example. But as the expression being tested is very simple (the letter a), in this specific case I find it an exaggeration to use regex. Still, the alternative is on record in case you need something more complicated than just picking a specific letter.
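To illustrate the kind of more complicated criterion mentioned above, here is a minimal sketch that finds words beginning with a lowercase letter or digit and having at least 3 characters. The sample text, the pattern and the minimum length are all assumptions chosen for illustration.

```python
import re

# Hypothetical sample text, used only for this illustration
text = 'abc de x1y 9lives Foo bar'

# \b        word boundary, so the match starts at the beginning of a word
# [a-z0-9]  first character: lowercase letter or digit
# \w{2,}    at least two more word characters (3 characters in total)
pattern = r'\b[a-z0-9]\w{2,}'

indices = [m.start(0) for m in re.finditer(pattern, text)]
print(indices)  # [0, 7, 11, 22]
```

Here "de" is skipped because it is too short, and "Foo" is skipped because it starts with an uppercase letter.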
Another bonus to using regex in the most complicated cases is that you can also get the final index:
a = 'abcdeafabc'
indices = [(m.start(0), m.end(0)) for m in re.finditer('abc', a)]
print(indices)
In this case, I'm looking for the occurrences of abc and returning a list of tuples (note the parentheses around m.start(0) and m.end(0); they delimit a tuple), each tuple containing the initial and final index of abc:
[(0, 3), (7, 10)]
Note that the indexes follow the start inclusive / end exclusive rule. For example, the first occurrence of abc corresponds to indexes 0, 1 and 2 of the string, but the value returned was (0, 3) (the final index is not "included").
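One practical consequence of the end-exclusive rule is that each (start, end) pair can be used directly as a slice. The following is a small sketch of that, reusing the string from the example above; m.span() is the standard match method that returns the same (start, end) pair as a tuple.

```python
import re

a = 'abcdeafabc'

# m.span() returns (m.start(0), m.end(0)) as a single tuple
spans = [m.span() for m in re.finditer('abc', a)]
print(spans)  # [(0, 3), (7, 10)]

# Because the end index is exclusive, a[start:end] is exactly the matched text
pieces = [a[start:end] for start, end in spans]
print(pieces)  # ['abc', 'abc']
```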
Of course, if you just want the initial index, just use m.start(0), as explained above.