What JSON structure to use for large data volume without loss of performance?

8

I'm thinking of using JSON in a project, as it is highly accepted and there are many ready-made libraries that encode and decode it into other objects (arrays, for example), but there is something that worries me.

Suppose a Web Service returns values from a database table named clientes . Something like this:

"clientes":[
    {"nome":"João", "sobrenome":"Silva"}, 
    {"nome":"José", "sobrenome":"Barbosa"}, 
    {"nome":"Maria", "sobrenome":"Joana"}
]

Now, suppose the table has 10 million rows and I need the Web Service to return a JSON object with all of them.

The size of the object would be huge. So it would take a lot of bandwidth to transfer the information.

I could compress the JSON object using gzip, but this would generate another problem: a high cost of processing to compress the object.

I could invent a compact format and simply use it. But I would lose all the facilities offered by libraries dealing with JSON objects. Also, it would be a non-standard project, which would make it difficult to maintain.

Is there a solution to this dilemma?

I was thinking ... maybe there is a differentiated JSON format that is supported by libraries, and is specific to cases where the data names are constant.

For example, something like this:

"clientes":[
    {$"nome", "sobrenome"$}, 
    {"João", "Silva"}, 
    {"José", "Barbosa"}, 
    {"Maria", "Joana"}
]

Is there something like this?

If not, what would be the best solution? I need to return large objects and I would also like to use a common format to transfer them.

    
asked by anonymous 14.11.2014 / 02:28

3 answers

8

If the system is very well documented, nothing prevents you from removing everything that is not needed and doing just that:

[
    ["João", "Silva"], 
    ["José", "Barbosa"],
    ["Maria", "Joana"]
]

(line breaks are just for easy reading).

If you want a bit more complete, keeping all the data but without redundancy:

{
   "clientes": [
      "nomes": [ "João", "José", "Anônimo", "Maria" ],
      "sobrenomes": [ "Silva", "Barbosa", "", "Joana" ]
   ]
}

Do not just skip any empty parameters. Leave the "" in the missing position, so that the names do not go out of sync with the surnames.


Still, if you need to keep the field names separate for query, it might look something like this:

{
   "campos":
   [
       "nome", "sobrenome"
   ],
   "clientes":
   [
       ["João", "Silva"], 
       ["José", "Barbosa"],
       ["Maria", "Joana"]
   ]
}


It could divide you in a thousand other ways, but I believe that with these as a starting point, it is easy to solve the problem.

    
14.11.2014 / 02:53
6

First, JSON is a very compact object, imagine an improved XML, it is JSON. So you can seamlessly include many and many records without worrying too much about it, even you do not really need to send a string containing the JSON and then parse into it and transform in a JSON object, you can simply send the JSON object directly, most languages nowadays support this, so do not worry.

Regarding the structure you want, in a more "light" and without losing the context:

I recommend using this:

{

    "sobrenome":{
        "cliente":[
            "Silva",
            "Barbosa",
            "Joana"
        ]
    },
    "nome":{
        "cliente":[
            "João",
            "José",
            "Maria"
        ]
    }

}

But it is very important to highlight the words of @Bacco :

  

I could divide it in other ways, but it might be dangerous to confuse it when missing some parameter (I would have to leave the "" in the missing position, so I do not get out of sync):

{
   "nomes": [ "João", "José", "Anônimo", "Maria" ],
   "sobrenomes": [ "Silva", "Barbosa", "", "Joana" ]
}

But why should I use this framework?

Because in this way there is only Array's of data and not an Array with label's and data (causing label duplication to each record), such as an array with < in> labels and data:

{

    "cliente":[
        {
            "nome":"João",
            "sobrenome":"Silva"
        },
        {
            "nome":"José",
            "sobrenome":"Barbosa"
        },
        {
            "nome":"Maria",
            "sobrenome":"Joana"
        }
    ]

}

Note that the "name" and "last name" tags repeat every record on a huge object like yours would lead to a bigger performance problem .

Additional:

Since you mentioned Web Service first I believe you can store your json in a .json extension file containing just the json inside. And I can imagine that you have the possibility to work with javascript, so I recommend using $ .ajax () from jQuery that is nothing more than a request for the server that can request a JSON directly, for example:

$.ajax({
  dataType: "json",
  url: "arquivoJSON.json"
}).done(function(){
  alert( "sucesso" );
}).fail(function(){
  alert( "erro" );
});

And there's also a jQuery shorthand method called $.getJSON that would make the request even easier:

$.getJSON("arquivoJSON.json", function( json ) {
  console.log("Nome: "      + json.nome.cliente[0]);      //João
  console.log("Sobrenome: " + json.sobrenome.cliente[0]); //Silva
});

Conclusion:

JSON is the lightest way to retrieve encapsulated data from the server, so it is a technology developed for this purpose, to replace XML in most cases, and because it is a technology used for WEB and WEB applications to need performance, and speed of the user's internet, it really works out. It does not really matter what language you are using, you just need to be rescuing a JSON object in a straightforward way and with a well thought out structure, that you will not have major problems with performance and performance.

    
14.11.2014 / 12:03
3

For high-performance systems that require data exchange through a recognized yet flexible protocol, JSON is not the best solution. Note that I'm not talking about a web system that accesses server data via JavaScript.

There are several solutions for exchanging messages in formats that, in comparison to JSON, consume less time for serialization / deserialization and less bandwidth.

One of them is the BSON format. It is nothing more than the Binary JSON ( Binary JSON ). The proposal is to be lighter and more efficient than JSON, maintaining flexibility and compatibility. There are implementations for virtually all languages .

Although BSON is efficient, there are some scenarios where this format takes up more space for storage or memory than JSON. See this SOen response for more details.

Another more compact and faster alternative to JSON is the MessagePack format. It also has implementations for most languages.

In particular, I would do some testing with the MessagePack before deciding anything.

    
14.11.2014 / 20:15