It's not that simple: while the JSON is still a string, that is, serialized, you can use find (a Python string method) to get the position of the characters you want. But at that point you know nothing about the structure of the JSON: you cannot tell whether the "ma" you found is in the "Camargo" inside show.[artists].nome or in a word that is a dictionary key inside the JSON. Likewise, in '{"theme": "song sertaneja"}' a search for "the" would match inside the key "theme".
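To make the problem concrete, here is a minimal sketch. The string below is a made-up example (it is not the data from the question); it only shows that find returns a plain character offset, with no hint of whether the hit is in a key or in a value.

```python
# Hypothetical serialized JSON, used only for illustration.
raw = '{"theme": "song sertaneja", "artist": "Cesar Camargo"}'

key_hit = raw.find("the")    # lands inside the key "theme"
value_hit = raw.find("ma")   # lands inside the value "Cesar Camargo"

# Both results are just integer offsets into the text; nothing in them
# distinguishes a dictionary key from a value.
print(key_hit, raw[key_hit - 1:key_hit + 6])
print(value_hit, raw[value_hit - 2:value_hit + 5])
```

Telling the two cases apart from the offsets alone would mean re-implementing a JSON parser by hand.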
The correct approach is to load your JSON into a data structure and then use a function that searches the whole tree recursively for the patterns you request. Such a function can return the entire record or just the matching value.
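Loading the text into a data structure can be done with the standard json module. The sketch below assumes the JSON arrives as a string; the content of raw is a small stand-in modelled on the outputs shown further down, not the full data from the question.

```python
import json

# Stand-in JSON text (assumption: the real data has more fields and items).
raw = (
    '[{"artist": "Cesar Camargo Mariano e Helio Delmiro",'
    ' "tracks": [{"author": "Cesar Camargo Mariano"}]}]'
)

data = json.loads(raw)  # now nested Python lists, dicts and strings

# The structure can be navigated by index and key instead of by character offset.
print(type(data).__name__, type(data[0]).__name__)
print(data[0]["tracks"][0]["author"])
```

If the JSON lives in a file, json.load on the open file object does the same job.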
These two functions can help you. The first returns every occurrence where a match is found for what you are looking for, each paired with a "path" that tells you where in the JSON the occurrence was found. It is a generator, so use it in a for loop or pass the call as an argument to list.
The second lets you retrieve parts of the JSON using the "path" returned by the first function:
import re


def json_find(data, pattern, path=()):
    """Yield (value, path) for every leaf whose text matches the regex pattern."""
    # Leaves (strings, numbers, booleans, None) are compared by their string form
    if isinstance(data, (str, float, int, bool, type(None))):
        if re.search(pattern, str(data)):
            yield data, path
    elif isinstance(data, list):
        for i, item in enumerate(data):
            yield from json_find(item, pattern, path=path + (i,))
    elif isinstance(data, dict):
        # Only values are searched; keys are just recorded in the path
        for key, value in data.items():
            yield from json_find(value, pattern, path=path + (key,))
    else:
        raise TypeError("Can't search patterns in instances of {}".format(type(data)))


def get_json_item_at(data, path):
    """Return the item reached by following path (a tuple of keys and indexes)."""
    if not path:
        return data
    return get_json_item_at(data[path[0]], path[1:])
In interactive mode, with the data from your example loaded into the variable "a", I can do:
In [141]: list(json_find(a, "Mariano"))
Out[141]:
[('Cesar Camargo Mariano e Helio Delmiro', (0, 'artist')),
('Cesar Camargo Mariano', (0, 'tracks', 0, 'author'))]
In [142]:
The output shows that the word "Mariano" was found in two places: the first in the "artist" key of the item at position 0 of the outer list; the second in the "author" key of the item at position 0 of the "tracks" list, which is itself inside item 0 of the outer list.
The second function, which I included as a bonus, lets you, for example, start from the location of the "author" key and "climb the tree" until you reach the information of the whole record.
Using the full path, I have only the string where the match occurred:
In [142]: get_json_item_at(a, (0, 'tracks', 0, 'author'))
Out[142]: 'Cesar Camargo Mariano'
But if I drop the last items from the path, I get the enclosing record:
In [143]: get_json_item_at(a, (0, 'tracks'))
Out[143]: [{'author': 'Cesar Camargo Mariano', 'time': "5'04", 'type': 'track'}]
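The "climbing" can also be automated by slicing the path. The following is only a sketch: enclosing_records is a hypothetical helper that is not part of the functions above, the two functions are repeated (get_json_item_at in an equivalent loop form) so the snippet runs on its own, and the variable a is a small reconstruction based on the outputs shown, not the full data from the question.

```python
import re


def json_find(data, pattern, path=()):
    """Same generator as above, repeated so this snippet runs alone."""
    if isinstance(data, (str, float, int, bool, type(None))):
        if re.search(pattern, str(data)):
            yield data, path
    elif isinstance(data, list):
        for i, item in enumerate(data):
            yield from json_find(item, pattern, path + (i,))
    elif isinstance(data, dict):
        for key, value in data.items():
            yield from json_find(value, pattern, path + (key,))


def get_json_item_at(data, path):
    """Loop-based equivalent of the recursive version above."""
    for step in path:
        data = data[step]
    return data


# Assumed data, reconstructed from the interactive outputs shown above.
a = [{
    "artist": "Cesar Camargo Mariano e Helio Delmiro",
    "tracks": [{"author": "Cesar Camargo Mariano", "time": "5'04", "type": "track"}],
}]


def enclosing_records(data, pattern):
    """Hypothetical helper: yield (parent_path, parent) for every match."""
    for _value, path in json_find(data, pattern):
        parent_path = path[:-1]  # drop the last step to go one level up
        yield parent_path, get_json_item_at(data, parent_path)


parents = list(enclosing_records(a, "Mariano"))
for parent_path, record in parents:
    print(parent_path, "->", sorted(record))
```

Dropping more than one step (for example path[:1]) climbs further, up to the top-level record for the whole entry.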