Limit aggregation per grouped item in mongo

I have a collection composed as follows, but with much more data.

{
  _id: ObjectId("db759d014f70743495ef1000"),
  tracked_item_origin: "winword",
  tracked_item_type: "Software",
  machine_user: "mmm.mmm",
  organization_id: ObjectId("a91864df4f7074b33b020000"),
  group_id: ObjectId("20ea74df4f7074b33b520000"),
  tracked_item_id: ObjectId("1a050df94f70748419140000"),
  tracked_item_name: "Word",
  duration: 9540,
}

{
  _id: ObjectId("2b769d014f70743495fa1000"),
  tracked_item_origin: "http://www.facebook.com",
  tracked_item_type: "Site",
  machine_user: "gabriel.mello",
  organization_id: ObjectId("a91864df4f7074b33b020000"),
  group_id: ObjectId("3f6a64df4f7074b33b040000"),
  tracked_item_id: ObjectId("6f3466df4f7074b33b080000"),
  tracked_item_name: "Facebook",
  duration: 7920,
}

I wrote an aggregation that returns this data grouped as follows:

{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Twitter"}, "duration"=>288540},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"ANoticia"}, "duration"=>237300},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Facebook"}, "duration"=>203460},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Word"}, "duration"=>269760},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Excel"}, "duration"=>204240}

The aggregation code is simple:

AgentCollector.collection.aggregate(
  {'$match' => {group_id: '20ea74df4f7074b33b520000'}},
  {'$group' => {
    _id: {tracked_item_type: '$tracked_item_type', tracked_item_name: '$tracked_item_name'},
    duration: {'$sum' => '$duration'}
  }},
  {'$sort' => {
    '_id.tracked_item_type' => 1,
    duration: -1
  }}
)

My problem: how can I limit the result to only 2 grouped Site items (tracked_item_type: "Site") and 2 grouped Software items (tracked_item_type: "Software")?

    
asked by anonymous 20.02.2014 / 16:44

1 answer

I thought hard about a simpler way to do this, but apparently the only way is to write the result of that aggregation into a collection and then filter that stored result.

Saving the report result to a new collection

As of MongoDB 2.4 the aggregate command has no option for this (the $out stage was only added in 2.6), so you can replace it with a mapReduce() that does the same thing while specifying the "out" option:

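db.trackeditems.mapReduce(
  // map: emit one (type, name) key per document, with its duration as the value
  function() {
      var key = {'tracked_item_type': this.tracked_item_type, 'tracked_item_name': this.tracked_item_name};
      emit(key, this.duration);
  },
  // reduce: sum all durations emitted for the same key
  function(key, values) {
      var totalDuration = 0;
      for (var i = 0; i < values.length; ++i) {
          totalDuration += values[i];
      }
      return totalDuration;
  },
  {
     out: "tracked_items_report",  // store the result in this collection
     query: {group_id: ObjectId('20ea74df4f7074b33b520000')}  // same filter as the $match stage
  }
)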

Note: I'm not familiar with the Ruby API, so I'm only showing the example in the mongo shell.
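
To make the shape of the stored documents concrete, here is a minimal plain-JavaScript sketch of what the map and reduce functions above compute. It runs on an in-memory array rather than against MongoDB; simulateMapReduce is a hypothetical helper written only for illustration, and the third sample row is made up. The point is that each output document carries the grouping key in _id and the summed duration in value, which are the fields the report collection can later be queried on.

```javascript
// Plain-JavaScript stand-in for the map and reduce steps.
// simulateMapReduce is a hypothetical helper, not a MongoDB API.
function simulateMapReduce(docs) {
  var groups = {}; // serialized key -> { key, values }
  docs.forEach(function (doc) {
    // "map" step: build the same key and value the map function emits
    var key = {
      tracked_item_type: doc.tracked_item_type,
      tracked_item_name: doc.tracked_item_name
    };
    var id = JSON.stringify(key);
    if (!groups[id]) groups[id] = { key: key, values: [] };
    groups[id].values.push(doc.duration);
  });
  // "reduce" step: sum the values collected for each key
  return Object.keys(groups).map(function (id) {
    var total = groups[id].values.reduce(function (a, b) { return a + b; }, 0);
    return { _id: groups[id].key, value: total };
  });
}

var report = simulateMapReduce([
  { tracked_item_type: 'Software', tracked_item_name: 'Word', duration: 9540 },
  { tracked_item_type: 'Site', tracked_item_name: 'Facebook', duration: 7920 },
  { tracked_item_type: 'Software', tracked_item_name: 'Word', duration: 60 } // made-up extra row
]);
console.log(JSON.stringify(report, null, 2));
```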

Querying the report

After running this command, your database will contain the tracked_items_report collection with the result of the aggregation. From there you can run any query, sorting and limiting the records to get the desired result.

// Get the top 2 of type 'Site'
db.tracked_items_report.find({'_id.tracked_item_type': 'Site'}).sort({'value':-1}).limit(2)
// Get the top 2 of type 'Software'
db.tracked_items_report.find({'_id.tracked_item_type': 'Software'}).sort({'value':-1}).limit(2)
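
The two queries above amount to "sort by value descending and keep the first two rows of each type". If the grouped result is small enough to load into the application, the same filter could be applied there instead. The following is only a sketch of that logic in plain JavaScript, not a MongoDB feature: topPerType is a hypothetical helper, and the rows are in-memory objects shaped like the report documents, with the totals taken from the question.

```javascript
// topPerType is a hypothetical helper; rows mimic documents in tracked_items_report.
function topPerType(rows, limit) {
  // Sort a copy by value, highest first, leaving the input untouched
  var sorted = rows.slice().sort(function (a, b) { return b.value - a.value; });
  var counts = {}; // type -> number of rows kept so far
  return sorted.filter(function (row) {
    var type = row._id.tracked_item_type;
    counts[type] = (counts[type] || 0) + 1;
    return counts[type] <= limit; // keep only the first `limit` rows of each type
  });
}

var rows = [
  { _id: { tracked_item_type: 'Site', tracked_item_name: 'Twitter' }, value: 288540 },
  { _id: { tracked_item_type: 'Site', tracked_item_name: 'ANoticia' }, value: 237300 },
  { _id: { tracked_item_type: 'Site', tracked_item_name: 'Facebook' }, value: 203460 },
  { _id: { tracked_item_type: 'Software', tracked_item_name: 'Word' }, value: 269760 },
  { _id: { tracked_item_type: 'Software', tracked_item_name: 'Excel' }, value: 204240 }
];

var top = topPerType(rows, 2);
top.forEach(function (row) {
  console.log(row._id.tracked_item_type, row._id.tracked_item_name, row.value);
});
```

With this sample data, Facebook is dropped because it is the third Site by total duration. The result is ordered by value across both types, so group it by type afterwards if the report needs the types kept separate.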

Yes, this takes a little more work: there are now 3 database operations instead of just one (which, according to your question, would be ideal). But you also get an additional benefit "for free": the result of the aggregation (possibly a heavy operation) is saved as a cache for later queries.

    
25.02.2014 / 15:39