Displaying a massive amount of data

4

Suppose we are working with a very large amount of data (e.g. more than 3 million records) and need to display it on screen while the user is using the page, always with the best possible performance.

  • Is partitioning this list of data into smaller groups, and using threads and handlers to display those smaller groups, a valid approach?

  • What are the best practices for displaying it?

  • What are the key factors that affect display performance and why?

  • How important is the database modeling for the programming in this case?

asked by anonymous 26.09.2014 / 15:06

2 answers

4

Based on my experience, here are some practices that help in this regard. I'll list them while trying to answer your questions:

  • Pagination:
    • It is certainly a good practice, especially because it reduces data traffic. Also make sure to create the right indexes on the columns by which the information is paginated (or "sliced"). There is a bit of "psychology" here too, because it is always desirable for the user to find the information as soon as possible. That is better both for the user and for the application, which has less work to do. Look at Google, whose greatest asset is that the vast majority of searches never need to go to the second page.
  • Lean queries:
    • For volumes at this level, you should ideally fetch and transmit from the data source only the information that will be presented. This may not have much impact on the server side but, in the case of a web application, it will certainly affect the size of the data transferred. On a server with a high volume of concurrent connections, these savings make a real difference as scale grows;
  • Information Grouping
    • Occasionally, you'll need to give the user a summary of these millions of rows, such as totals, pending tasks, latest postings, and so on. For these cases, I recommend creating processes on the server that periodically precompute these summaries. Examples are materialized (indexed) views in the database, or a scheduled process that builds these summary tables;
  • Cache
    • If much of the information presented is repeated across consecutive uses (e.g. a home page that always shows the same promotional items), you can use frameworks that reduce repetitive queries to the database. As an example in .NET, I'd mention NHibernate, which has a query cache;
  • Database Structure
    • Data modeling is certainly important because, depending on how it is structured, it can increase or reduce the amount of work the database has to perform. But not only that: the way the database is set up can help a lot as well. One good practice, in databases that support it, is to place tables and indexes in separate physical locations (disks), so that both can be written in parallel without affecting each other. Some consulting time with a DBA can help a great deal here;
  • Distributed Databases
    • Here things start to get a bit more complicated, but depending on the volume of access and the amount of information, this can be an interesting option. In NoSQL databases this is a bit easier, by creating "slices" (shards) that can be processed in parallel on different servers.
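
To illustrate the pagination and lean-query points above, here is a minimal Python sketch using the standard `sqlite3` module. The `records` table, its columns and the `fetch_page` helper are hypothetical, made up for the example. It selects only the columns that would be shown and pages by the indexed primary key instead of loading everything.

```python
import sqlite3


def fetch_page(conn, after_id=0, page_size=50):
    """Return one page of (id, name) rows whose id is greater than after_id."""
    cur = conn.execute(
        # Lean query: only the displayed columns, not the large payload column.
        "SELECT id, name FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    )
    return cur.fetchall()


# Hypothetical in-memory data set standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (id, name, payload) VALUES (?, ?, ?)",
    ((i, f"item {i}", "x" * 100) for i in range(1, 1001)),
)

first_page = fetch_page(conn)
# The next page starts after the last id already shown to the user.
second_page = fetch_page(conn, after_id=first_page[-1][0])
```

Paging by "last id seen" rather than by a growing offset is just one option; it lets the database jump straight to the right position through the primary key index.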

Finally, keep in mind that maintaining good performance is a cyclical process. The needs (problems) change, and so do their solutions. See this site with examples of techniques adopted by the YouTube team to maintain the scalability of a site that receives more than 1 billion views per day.

Assuming you start with good practices, the first step is to identify the biggest bottleneck and work on it. Once it is resolved, something else becomes the biggest bottleneck.

And so our journey goes on, forever... :)

    
26.09.2014 / 16:45
0
  Is the division of this list of data into smaller groups and the use of threads and handlers to display those smaller groups valid?

Yes.
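
As a rough sketch of the partitioning idea (in Python, independent of any UI toolkit), the data can be consumed in fixed-size batches so that each batch is handed to a worker or rendered separately. The `chunks` helper and the batch size are assumptions made for illustration.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Seven records split into groups of three.
groups = list(chunks(range(7), 3))
```

Because the helper is a generator, only one batch is held in memory at a time when the source itself is lazy.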

  

What are the best practices for display?

Do you really need to render all 3 million? I believe that no matter what you do, performance will not be great because of the sheer number of elements. I would work with a smaller set.

  

What are the main factors that affect display performance and why?

What affects performance is precisely how many elements the browser needs to render.
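
One possible way to keep that element count small is to render only the rows currently in view. The sketch below only does the arithmetic, in Python; the function name, the fixed row height and the `overscan` margin are assumptions, and it is not tied to any particular UI framework.

```python
def visible_range(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return (first, last) row indexes to render; `last` is exclusive.

    Assumes every row has the same height, in the same unit as scroll_top.
    `overscan` adds a few extra rows above and below the viewport.
    """
    first = max(0, scroll_top // row_height - overscan)
    last = min(
        total_rows,
        (scroll_top + viewport_height) // row_height + 1 + overscan,
    )
    return first, last


# A 600-unit-tall viewport over 3 million rows of height 20.
top_of_list = visible_range(0, 600, 20, 3_000_000)
scrolled = visible_range(20_000, 600, 20, 3_000_000)
```

With these numbers only a few dozen rows exist on screen at any moment, regardless of how many records the full list holds.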

  

How important is the database modeling for the programming in this case?

The database only affects how this data is retrieved. 3 million is a relatively large number, so it's worth checking whether the relevant indexes exist and whether the disk and data are fragmented.
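
A quick way to check whether a query actually uses an index is to ask the database for its query plan. The sketch below does this with Python's standard `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`; the table, the index name and the `query_plan` helper are hypothetical, and other databases have their own equivalent commands.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The description of each step is the last column of every row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, category TEXT, name TEXT)"
)
conn.execute("CREATE INDEX idx_records_category ON records (category)")

plan = query_plan(
    conn, "SELECT id, name FROM records WHERE category = ?", ("books",)
)
# True when the planner chose the index instead of scanning the whole table.
uses_index = any("idx_records_category" in step for step in plan)
```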

    
26.09.2014 / 16:01