I'm trying to write a function in python that takes two vectors, deletes duplicates in the first and adds the corresponding data in the second.
I mean trying to emulate the following MySQL query:
SELECT 'Price', sum('Volume') FROM 'Dataset' GROUP BY 'Price'
My code looks like this:
def Group(X, Y):
temp = dict.fromkeys(X, [0, 0])
L = range(len(X))
for i in L:
print('BEFORE: i=%s, temp=%s, temp[X[i]]=%s, X=%s, X[i]=%s, Y=%s, Y[i]=%s' %(i, temp, temp[X[i]], X, X[i], Y, Y[i]))
temp[X[i]][0] += Y[i]
temp[X[i]][1] += 1
print('AFTER: i=%s, temp=%s, temp[X[i]]=%s, X=%s, X[i]=%s, Y=%s, Y[i]=%s' %(i, temp, temp[X[i]], X, X[i], Y, Y[i]))
return temp
When calling the function, it should respond like this:
>>> a = [1, 1, 2]
>>> b = [3, 4, 5]
>>> Group(a, b)
BEFORE: i=0, temp={1: [0, 0], 2: [0, 0]}, temp[X[i]]=[0, 0], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=3
AFTER: i=0, temp={1: [3, 1], 2: [0, 0]}, temp[X[i]]=[3, 1], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=3
BEFORE: i=1, temp={1: [3, 1], 2: [0, 0]}, temp[X[i]]=[3, 1], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=4
AFTER: i=1, temp={1: [7, 2], 2: [0, 0]}, temp[X[i]]=[7, 2], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=4
BEFORE: i=2, temp={1: [7, 2], 2: [5, 1]}, temp[X[i]]=[5, 1], X=[1, 1, 2], X[i]=2, Y=[3, 4, 5], Y[i]=5
AFTER: i=2, temp={1: [7, 2], 2: [5, 1]}, temp[X[i]]=[5, 1], X=[1, 1, 2], X[i]=2, Y=[3, 4, 5], Y[i]=5
{1: [7, 2], 2: [5, 1]}
But it's responding like this:
>>> a = [1, 1, 2]
>>> b = [3, 4, 5]
>>> Group(a, b)
BEFORE: i=0, temp={1: [0, 0], 2: [0, 0]}, temp[X[i]]=[0, 0], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=3
AFTER: i=0, temp={1: [3, 1], 2: [3, 1]}, temp[X[i]]=[3, 1], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=3
BEFORE: i=1, temp={1: [3, 1], 2: [3, 1]}, temp[X[i]]=[3, 1], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=4
AFTER: i=1, temp={1: [7, 2], 2: [7, 2]}, temp[X[i]]=[7, 2], X=[1, 1, 2], X[i]=1, Y=[3, 4, 5], Y[i]=4
BEFORE: i=2, temp={1: [7, 2], 2: [7, 2]}, temp[X[i]]=[7, 2], X=[1, 1, 2], X[i]=2, Y=[3, 4, 5], Y[i]=5
AFTER: i=2, temp={1: [12, 3], 2: [12, 3]}, temp[X[i]]=[12, 3], X=[1, 1, 2], X[i]=2, Y=[3, 4, 5], Y[i]=5
{1: [12, 3], 2: [12, 3]}
In the first execution of the loop the function errs, because I want it to add the value in Y[0]
only in temp[1][0]
, not temp[2][0]
also.
Can anyone see what I'm doing wrong?
Thank you in advance!
[RESOLVED]
I want to thank you again for the valuable tips and answers! The solution I chose / found was:
def Group(X, Y, isSum):
temp = dict.fromkeys(X, (0, 0))
for i in range(len(X)):
temp[X[i]] = (temp[X[i]][0] + Y[i], temp[X[i]][1] + 1)
if not isSum:
for i in temp:
temp[i] = temp[i][0] / temp[i][1]
return temp
And the machine response was:
>>> a = [1, 1, 2, 3, 4]
>>> b = [5, 6, 7, 7, 8]
>>>
>>> Group(a, b, True)
{1: (11, 2), 2: (7, 1), 3: (7, 1), 4: (8, 1)}
>>> Group(b, a, False)
{8: 4.0, 5: 1.0, 6: 1.0, 7: 2.5}
>>>
I actually wanted two functions in one, to emulate these two queries:
SELECT 'Price', SUM('Volume') FROM 'Dataset' GROUP BY 'Price' ;
SELECT 'Volume', AVG('Price') FROM 'Dataset' GROUP BY 'Volume' ;
I apologize for having presented only half the problem.
Hugs to all!
[RESOLVED 2]
Here's an add-on, with real problem data:
Dataset = [
# Tempo, Volume e Preço do par BTCUSD em [https://www.bitfinex.com/]
(1426502898, 0.01, 292.76),
(1426503160, 0.01, 292.77),
(1426503169, 0.0224, 292.77),
(1426503180, 0.304111, 292.77),
(1426503188, 0.026, 292.77),
(1426503194, 0.388, 292.77),
(1426503287, 0.01, 292.76),
(1426503287.14286, 0.202, 292.7),
(1426503287.28571, 0.7082, 292.77),
(1426503287.42857, 1.12, 292.75),
(1426503287.57143, 2.57392822, 292.76),
(1426503287.71429, 3.095, 292.77),
(1426503287.85714, 7, 292.77),
(1426503292, 0.22, 292.7),
(1426503320, 0.5, 292.78),
(1426503340, 0.5, 292.78),
(1426503358, 0.5, 292.78),
(1426503377, 0.01, 292.76),
(1426503377.5, 0.49, 292.78),
(1426503378, 0.01, 292.75),
(1426503384, 0.01, 292.76),
(1426503385, 0.1, 292.78),
(1426503394, 0.01, 292.76),
(1426503395, 0.5, 292.78),
(1426503399, 0.01, 292.76),
(1426503402, 0.01, 292.76),
(1426503407, 0.01, 292.76),
(1426503414, 0.181549, 292.78),
(1426503414.5, 0.318451, 292.78),
(1426503415, 0.01, 292.76)
]
partition = lambda data, pos: [i[pos] for i in data]
def Group(X, Y, isSum):
temp = dict.fromkeys(X, (0, 0))
for i in range(len(X)):
temp[X[i]] = (temp[X[i]][0] + Y[i], temp[X[i]][1] + 1)
if not isSum:
for i in temp:
temp[i] = temp[i][0] / temp[i][1]
return temp
And the machine response was:
>>>
>>> Group(partition(Dataset, 2), partition(Dataset, 1), True)
{292.77: (11.553711, 8), 292.78: (3.09, 8), 292.76: (2.6639282199999985, 10), 292.75: (1.1300000000000001, 2), 292.7: (0.42200000000000004, 2)}
>>> Group(partition(Dataset, 1), partition(Dataset, 2), False)
{0.304111: 292.77, 1.12: 292.75, 2.57392822: 292.76, 3.095: 292.77, 0.181549: 292.78, 0.5: 292.78, 7: 292.77, 0.026: 292.77, 0.202: 292.7, 0.01: 292.76000000000005, 0.1: 292.78, 0.388: 292.77, 0.318451: 292.78, 0.49: 292.78, 0.0224: 292.77, 0.22: 292.7, 0.7082: 292.77}
>>>