What is the most efficient way to implement GroupBy in Javascript?

6

I'm trying to implement a GroupBy with these parameters

function GroupBy(keySelector, elementSelector, comparer)
{
    // keySelector = function(e) { return e.ID }
    // elementSelector = function(e) { return e.Name }
    // comparer = { Equals: function(a,b) { return a==b }, GetHashCode:... }
}

However I do not know an efficient way to implement this.

I created a test in jsPerf here with linq.js and a method I created that does not use a comparer , so this only works on 'normal' types. ( Test these methods here )

Other libraries such as underscore and Lo-Dash do not have the comparer parameter so their implementation is not relevant.

The key of each group can be up to one class, I need a comparer to know if the key value is equal between objects.

I'm basically trying to copy the C # Linq GroupBy documented here .

Example input:

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];

Output example (or something like this):

[
    {
       "Key": {"Value":10},
       "Elements":["Foo","Bar"]
    },
    {
        "Key": {"Value":20},
        "Elements":["Foo","Bar"]
    }
] 

Any ideas how to implement this?

    
asked by anonymous 17.12.2013 / 13:56

2 answers

0

I was able to implement this way:

First I need a way to get the HashCode of objects.

Object.prototype.GetHashCode = function () {
    var s = this instanceof Object ? JSON.stringify(this) : this.toString();

    var hash = 0;
    if (s.length === 0) return hash;
    for (var i = 0; i < s.length; ++i) {
        hash = ((hash << 5) - hash) + s.charCodeAt(i);
    }
    return hash;
};
Number.prototype.GetHashCode = function () { return this.valueOf(); };

Now my GroupBy stayed like this

function GroupBy(a, keySelector, elementSelector, comparer)
{
    // seta os valores padrões para os parâmetros opicionais
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var key, hashKey, reHashKey;

    // agrupa elementos pelo hashcode
    var hashs = {};
    for (var i = 0, n = a.length; i < n; ++i)
    {
        // para caso do hash ser igual, mas retornar false no Equals
        reHashKey = undefined;

        // obtêm a chave do objeto
        key = keySelector(a[i]);
        // pega o hashcode do objeto
        hashKey = comparer.GetHashCode(key);

        // se o hash já existe na lista
        // compara os valores com Equals
        // caso retorne falso, gera um hash unico
        if (typeof hashs[hashKey] !== "undefined")
            reHashKey = comparer.Equals(key, hashs[hashKey].Key) ? hashKey : hashKey + " " + i;

        // se foi gerado um hash novo, utiliza-se esse
        if (typeof reHashKey !== "undefined" && reHashKey !== hashKey)
            hashKey = reHashKey;

        // pega ou cria um grupo e adiciona os elementos
        hashs[hashKey] = hashs[hashKey] || { Key: key, Elements: [] };
        hashs[hashKey].Elements.push(a[i]);
    }

    return hashs;
}

To test

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];
//

var x = GroupBy(arrComplex
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.GetHashCode(); }
          }
);
//

console.log(x);

Example in jsFiddle

But, according to my tests , my implementation of GroupBy is slower than linq. js. It is only faster to convert the result to an array.

Test results

Nome                         op/s
---------------------------------
GroupBy                   163,261
GroupByToArray            152,382
linq.js groupBy           243,547
linq.js groupBy toArray    26,309
    
17.12.2013 / 15:50
5

Many libraries that implement GroupBy add as an additional requirement that the entry is already ordered by key . A key-ordering is O(N*log(N)) , but once done it is possible to group lists of any size quickly, in O(N) , as it is enough to compare the key of an element with the key of the last group found:

function GroupBySimples(a, keySelector, elementSelector, comparer)
{
    if ( a.length == 0 )
        return [];
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var ultimaChave = keySelector(a[0]);
    var ultimoElemento = { Key: ultimaChave, Elements: [elementSelector(a[0])] };
    var ret = [ultimoElemento];
    for (var i = 1, n = a.length; i < n; ++i)
    {
        var key = keySelector(a[i]);
        if ( comparer.Equals(key, ultimaChave) )
            ultimoElemento.Elements.push(elementSelector(a[i]));
        else {
            ultimaChave = keySelector(a[i]);
            ultimoElemento = { Key: key, Elements: [elementSelector(a[i])] };
            ret.push(ultimoElemento);
        }
    }

    return ret;
}

var arrComplexOrdenado = arrComplex.sort(function(a, b) {
    return a.N.Value - b.N.Value;
});
var resultArray = GroupBySimples(arrComplexOrdenado
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.Value.GetHashCode(); }
          }
);

As you can see in this performance test , this method is more than% effective% as alternatives (using hashes).

Notes:

  • Are you sure that 3x is producing a correct result? His performance has remained about the same, with a twenty-five thousand times greater entry , something can only be wrong ...

  • 17.12.2013 / 17:19