Efficient data structure for a concurrent high-scores problem

6

As part of a test I was asked to develop an application to manage high-score data in a highly concurrent environment. The application must not persist anything to disk or use external libraries and frameworks. Basically, the requirements say:

  • Optimizing memory and CPU usage is very important
  • the service will receive a huge number of simultaneous requests
  • For the HTTP part we should use HttpServer (see the sketch after this list).
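A minimal sketch of what that server setup could look like, assuming the requirement refers to the JDK's built-in com.sun.net.httpserver.HttpServer (the port, path and handler body below are purely illustrative):

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class ScoreServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // Illustrative endpoint; the real handlers would parse the path/query
        // and dispatch to login, score submission and high-score listing.
        server.createContext("/highscorelist", exchange -> {
            byte[] body = "[]".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        // A thread pool lets the server handle many simultaneous requests.
        server.setExecutor(Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors()));
        server.start();
    }
}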

The application is basically a set of REST services for the user:

  • log in (should return a token and keep the user authenticated for 10 minutes)
  • post a high score for a certain level
  • get a list of high scores for a certain level (in order)

The system should only return the top 15 high scores for each level, and high scores are unique per user/level pair (i.e., for user X on level Y there should be only one high score).

Given the functional and non-functional requirements, I thought of the following data structure to store the high scores (with minimal contention):

// Map linking a level to an ordered set of high scores
private static final Map<Integer, ConcurrentSkipListSet<HighScore>> scores = 
   new ConcurrentHashMap<>();
// Set of authenticated users
private static final Set<AuthToken> authorized = 
   Collections.newSetFromMap(new ConcurrentHashMap<AuthToken, Boolean>());
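One detail worth spelling out: ConcurrentSkipListSet keeps its elements sorted, so HighScore must be Comparable (or the set must be built with a Comparator). A hypothetical sketch of what that could look like (field names are illustrative, not from the original code):

final class HighScore implements Comparable<HighScore> {
    final int userId;
    final int points;

    HighScore(int userId, int points) {
        this.userId = userId;
        this.points = points;
    }

    @Override
    public int compareTo(HighScore other) {
        // Highest score first; tie-break on userId so distinct users with
        // equal scores are not collapsed into a single element, since
        // sorted sets treat compareTo() == 0 as equality.
        int byPoints = Integer.compare(other.points, this.points);
        return byPoints != 0 ? byPoints : Integer.compare(this.userId, other.userId);
    }
}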

While this data structure seems efficient to me in the sense of reducing contention, I do not know how much memory it ends up wasting internally.

Likewise, ConcurrentSkipListSet allows insertion in log(n) and makes the implementation of the method that returns the scores trivial. On the other hand, to limit the set to the top 15 scores per level (to avoid wasting memory), I did some real gymnastics, including using an AtomicInteger to count the number of high scores per level (the size method of ConcurrentSkipListSet is costly) and having to synchronize several points with locks.
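For illustration only, the kind of workaround described above might look roughly like this (class and field names are hypothetical, and the coordination between the counter, the set and the lock is exactly the part that feels fragile):

import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

final class LevelRanking {
    private final ConcurrentSkipListSet<HighScore> scores = new ConcurrentSkipListSet<>();
    private final AtomicInteger count = new AtomicInteger();   // avoids the O(n) size()
    private final Object trimLock = new Object();

    void add(HighScore score) {
        if (scores.add(score) && count.incrementAndGet() > 15) {
            synchronized (trimLock) {
                // Drop the lowest entries until only the top 15 remain.
                while (count.get() > 15 && scores.pollLast() != null) {
                    count.decrementAndGet();
                }
            }
        }
    }
}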

In short, I'm not convinced that I'm using the most efficient strategy to solve the problem, and I'm not used to thinking at such a low level...

In the "real" world, you would solve this problem trivially with the use of a container Jetty >, a Servlet for the login by storing the authentication in the session and probably an embedded database as the H2 . Unfortunately I am not allowed to use any of this.

Given these technology constraints, how would you approach such a problem? What data structures would you use (or would you develop your own)? Is my strategy "good enough" for the problem, or am I way off track?

P.S. This problem submitted to Code Review looks a lot like the one I'm solving (and the solution was similar to mine, too).

    
asked by anonymous 23.12.2013 / 14:35

1 answer

2

I do not have much experience with concurrent programming, particularly in Java, so I cannot suggest the most efficient data structure, but I can give you some parameters to help you decide:

  • Do not worry about the asymptotic complexity of the structure, but about its constant factor.

    You say that "ConcurrentSkipListSet allows ordered insertion in log(n)", but is that even relevant for a list with only 15 elements? Consider how many nodes this list will create (remember that a Java object has a fixed overhead of some 24 bytes, if I remember correctly, plus a reference to the HighScore and other internal references to the other nodes). And the branch-prediction misses? Not that I find this very relevant, since the bottleneck of modern processors is not the CPU but the cache...

    A simple array (or an equivalent thread-safe structure, if one is available) with references to the HighScores would occupy very little space (around 24 bytes plus 4 or 8 bytes for each of the 15 references). And the performance of traversing it would be good even if it were unordered (just walk through it all, remembering the index of the lowest score, then either append or replace that minimum depending on whether the total has reached 15), mainly because the references will occupy contiguous memory positions, which avoids cache misses; a minimal sketch of this approach follows this bullet. If the solution involves a critical section (synchronized), then the algorithm hardly matters, since the lock alone will account for 90% of the execution time. By the way...
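    A minimal sketch of that fixed-array idea, assuming a HighScore with hypothetical userId and points fields and a single writer thread (so no lock is needed inside this class):

    final class Top15 {
        private final HighScore[] entries = new HighScore[15];
        private int count = 0;

        void offer(HighScore candidate) {
            int lowest = 0;
            for (int i = 0; i < count; i++) {
                if (entries[i].userId == candidate.userId) {
                    // One score per user/level: keep only the best of the two.
                    if (candidate.points > entries[i].points) entries[i] = candidate;
                    return;
                }
                if (entries[i].points < entries[lowest].points) lowest = i;
            }
            if (count < entries.length) {
                entries[count++] = candidate;            // still room, just append
            } else if (candidate.points > entries[lowest].points) {
                entries[lowest] = candidate;             // evict the current minimum
            }
        }
    }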

  • synchronized is your worst enemy.

    Again, I'm not familiar with the facilities Java offers for concurrent access, but if ConcurrentSkipListSet can operate without using synchronized, then it's better to use it as-is than to resort to some "gambiarra" (quick hack) just to access size in constant time; a lock-free read of the top 15 is sketched after this bullet. That method is only "costly" if the set is large, but log(n) on a set of 15 elements is practically the same as O(1).

    Establishing a critical section is always expensive (some 10,000 CPU cycles the last time I checked), so avoid using synchronized whenever possible. By the way...
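    To make the point concrete, reading the top 15 from a ConcurrentSkipListSet needs neither a lock nor a size counter; a sketch, assuming the set's natural order puts the highest score first:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentSkipListSet;

    static List<HighScore> top15(ConcurrentSkipListSet<HighScore> levelScores) {
        List<HighScore> result = new ArrayList<>(15);
        for (HighScore s : levelScores) {   // weakly consistent iterator, no locking
            result.add(s);
            if (result.size() == 15) break; // stop early; size() is never called
        }
        return result;
    }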

  • You do not have to wait for an operation to complete before finishing the request.

    When the user submits a high score, I suggest "queuing" it for processing and moving on; take a look at a concurrent queue such as ConcurrentLinkedQueue, for example. If you have a single thread dedicated to merging all the submitted high scores (i.e., each request produces, and that one thread consumes), it is easier to choose data structures that perform well without having to use a lock (a sketch of this producer/consumer setup follows this bullet).

    If the user immediately requests the list for a certain level, you can always take a copy of that list (ideally only that dedicated thread updates it, atomically), update it if necessary with the last scores still sitting unprocessed in the queue (ConcurrentLinkedQueue), and return it. If you want to avoid this rework, and if it is acceptable to occasionally return a slightly stale list (unavoidable) or one that does not yet include the last score the user posted (avoidable, but it adds complexity), just return the list in its current state.
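    A sketch of that producer/consumer arrangement (names are illustrative; the per-level structure it feeds could be the one discussed in the first bullet):

    import java.util.concurrent.ConcurrentLinkedQueue;

    final class ScoreProcessor {
        private final ConcurrentLinkedQueue<HighScore> pending = new ConcurrentLinkedQueue<>();

        // Called from the HTTP handlers: O(1), lock-free, the request returns at once.
        void submit(HighScore score) {
            pending.offer(score);
        }

        // Runs on one dedicated thread; since only this thread mutates the
        // per-level rankings, they need no synchronization at all.
        void consumeLoop() {
            while (!Thread.currentThread().isInterrupted()) {
                HighScore next = pending.poll();
                if (next == null) {
                    Thread.yield();          // or park/sleep briefly
                    continue;
                }
                // apply 'next' to the per-level top-15 structure here
            }
        }
    }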

  • Other than that, I believe your proposed solution is adequate. In practice I'd suggest that, instead of a set of access tokens, you map each token to a user (to prevent one user from updating another's high score); a small sketch of that mapping follows this bullet. But if you are worried about cheating, nothing you do will be 100% guaranteed anyway, since the data comes from the client side...
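    A sketch of the token-to-user mapping with the 10-minute expiry (class and field names are hypothetical):

    import java.util.concurrent.ConcurrentHashMap;

    final class Sessions {
        private static final long TTL_MS = 10 * 60 * 1000;   // 10 minutes

        private static final class Session {
            final int userId;
            final long createdAt;
            Session(int userId, long createdAt) { this.userId = userId; this.createdAt = createdAt; }
        }

        private final ConcurrentHashMap<String, Session> byToken = new ConcurrentHashMap<>();

        void login(String token, int userId) {
            byToken.put(token, new Session(userId, System.currentTimeMillis()));
        }

        // Returns the user id for a valid token, or -1 if missing/expired.
        int userFor(String token) {
            Session s = byToken.get(token);
            if (s == null) return -1;
            if (System.currentTimeMillis() - s.createdAt > TTL_MS) {
                byToken.remove(token);
                return -1;
            }
            return s.userId;
        }
    }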

        
    23.12.2013 / 16:14