I have a very large JSON file: it is about 5 GB and has 652,339 lines. I was thinking of using the Gson library in Java.
I'd like to know the best way to parse this file, since none of the JSON frameworks I tried was able to parse it correctly. Example of a line from the file:
{"control": {"lang": {"lang": "pt", "d": 1395183935882, "v": 5}, "last": "UPDATE", "read": {"d": 1395183767992, "v": 3}, "update": {"d": 1395308552817, "v": 2}, "rule": {"entities": [80000, 84001, 80034, 84232, 84009, 84051, 84084, 80061], "d": 1395305209944, "v": 3}, "entities": {"entities": [80000, 84001, 80034, 84232, 84009, 84051, 84084, 80061]}, "terms": {"terms": [], "d": 1395249318552, "v": 3}, "coletas": [{"terms": [], "id": 97}]}, "picture": "https://fbexternal-a.akamaihd.net/safe_image.php?d=AQA10tlbPQBXIp4p&w=154&h=154&url=http%3A%2F%2Fimages.immedia.com.br%2F%2F9%2F9146_2_L.JPG", "story": "Georgevan Araujo compartilhou um link.", "updated_time": "2013-12-30T23:59:59", "from": {"name": "Georgevan Araujo", "id": "100000278536009"}, "description": "Segundo o ex-ministro da Fazenda, a prova de que o governo n\u00e3o tem nada de socialista \u00e9 que ele destruiu as suas duas principais empresas: a Petrobras e a Eletrobr\u00e1s", "caption": "www.infomoney.com.br", "privacy": {"value": ""}, "name": "\"O que o governo fez com a Petrobras foi uma trag\u00e9dia\", diz Delfim Netto", "application": {"namespace": "fbipad_", "name": "Facebook for iPad", "id": "173847642670370"}, "link": "http://www.infomoney.com.br/onde-investir/acoes/noticia/3086396/que-governo-fez-com-petrobras-foi-uma-tragedia-diz-delfim", "story_tags": {"0": [{"length": 16, "type": "user", "id": "100000278536009", "name": "Georgevan Araujo", "offset": 0}]}, "created_time": "2013-12-30T23:59:59", "_id": "100000278536009_719669731385638", "type": "link", "id": "100000278536009_719669731385638", "icon": "https://fbstatic-a.akamaihd.net/rsrc.php/v2/yD/r/aS8ecmYRys0.gif"}
I was thinking of:
- Splitting this file into several smaller files and parsing them one by one
- Creating a database and loading all the information into it for use in the application
- Stripping the JSON structure with a Java application and processing the file as it is read
I don't think any of the alternatives above is ideal.
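
For reference, here is a minimal sketch of reading the file one record at a time instead of loading it whole. It assumes every line holds one complete JSON object, as in the example line above. The class name `LineByLineJson` and the method `forEachRecord` are made up for illustration, and the sketch uses only the standard library. If Gson is on the classpath, the handler could presumably pass each line to something like `new Gson().fromJson(line, JsonObject.class)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class LineByLineJson {

    // Hypothetical helper: hands each non-empty line to the handler and
    // returns how many lines were handed over. Only one line is held in
    // memory at a time, so the total file size does not matter.
    static long forEachRecord(Reader source, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, a tiny in-memory sample stands in for the real file.
        Reader source = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new StringReader("{\"id\": \"1\"}\n{\"id\": \"2\"}\n");

        long total = forEachRecord(source, line -> {
            // Assumption: this is where each line would be parsed as JSON,
            // for example with Gson, and then processed or stored.
        });
        System.out.println("records read: " + total);
    }
}
```

This only works if no record spans more than one line. If the file were a single large JSON array instead, a streaming parser would be needed rather than `readLine()`.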