Linear regression for multiple products

2

I ran a simple regression on a dataset containing a single product (columns Produto, Volume, Preço), and it worked perfectly. Now I would like to run the same regression on a dataset containing several products, while being able to choose which product the regression runs on. For example:

Produto | Volume | Preço
A       |        |
A       |        |
B       |        |
B       |        |

I want to run the regression only on product B.

  • How can I do this?

Code

import pandas as pd
from scipy.stats import linregress

# Load the 'Tela' sheet from the workbook
Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

x = Daniel['Preço']
y = Daniel['Volume']
# linregress returns slope, intercept, r-value, p-value and standard error of the slope
m, b, R, p, SEm = linregress(x, y)

pd.DataFrame([m, b, R, p, SEm], columns=['Valores'],
             index=['declive', 'ordenada_na_origem',
                    'coeficiente_de_correlação_(de_Pearson)', 'p-value',
                    'erro_padrão'])

Result:

                                            Valores
declive                                  421.398071
ordenada_na_origem                      1432.443189
coeficiente_de_correlação_(de_Pearson)     0.331966
p-value                                    0.000003
erro_padrão                               86.869651
    
asked by anonymous 31.01.2018 / 13:08

2 answers

1

Judging by what your data appears to look like, I was able to solve this with the .loc indexer of the pandas DataFrame.

An example of how I did it:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,4),index=list('abadaf'),columns=list('ABCD'))
>>df1
          A         B         C         D
a -0.973031  0.305699  1.330237 -0.799858
b -0.879060  0.238690 -2.729635 -0.457865
a -2.001388  1.058163 -0.328737  0.134416
d  0.994644 -2.305340 -0.714434  0.298462
a -2.242108 -0.331434  0.969981  0.973202
f -0.483833  0.783812  0.925608  0.590251

>>df1.loc['a']
          A         B         C         D
a -0.973031  0.305699  1.330237 -0.799858
a -2.001388  1.058163 -0.328737  0.134416
a -2.242108 -0.331434  0.969981  0.973202

>> df1.loc['a','A']
a   -0.973031
a   -2.001388
a   -2.242108

Here the "product name" is used as the index. If you want to select rows based on the values in a column (strings or numbers), you can use .loc together with a boolean condition:

>> df1 = pd.DataFrame([['a',1,2,3],['b',2,3,4],['a',3,4,5],['c',4,5,6]],index=list('defg'),columns=list('higj'))
>> df1
   h  i  g  j
d  a  1  2  3
e  b  2  3  4
f  a  3  4  5
g  c  4  5  6

>> df1.h=='a'
d     True
e    False
f     True
g    False
Name: h, dtype: bool
>> df1.loc[ df1.h=='a',:]
   h  i  g  j
d  a  1  2  3
f  a  3  4  5
>> df1.loc[ df1.h=='a','i']
d    1
f    3
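
Applied to the data in the question, the same boolean mask could select a single product before the regression is run. The following is only a minimal sketch: the column names Produto, Volume and Preço come from the question, while the numbers are made up for illustration and the helper name regressao_produto is hypothetical.

```python
import pandas as pd
from scipy.stats import linregress

# Made-up sample data; only the column names come from the question
dados = pd.DataFrame({
    'Produto': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Preço':   [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'Volume':  [10.0, 19.0, 31.0, 5.0, 8.0, 9.0],
})

def regressao_produto(df, produto):
    """Regress Volume on Preço for one product (hypothetical helper)."""
    # The boolean mask keeps only the rows of the chosen product
    selecao = df.loc[df['Produto'] == produto]
    return linregress(selecao['Preço'], selecao['Volume'])

resultado_B = regressao_produto(dados, 'B')
print(resultado_B.slope, resultado_B.intercept)
```

Changing the second argument to 'A' would run the same regression on product A without touching the rest of the code.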
    
31.01.2018 / 15:28
0

With Guto's help, I solved it as follows:

import pandas as pd
from scipy.stats import linregress

Pasta1 = pd.ExcelFile('Pasta2.xlsx')
Daniel = pd.read_excel(Pasta1, 'Tela')

# Filter each product once, so that x and y always come from the same rows
# (filtering Preço and Volume separately could yield series of different lengths)
dados_A = Daniel.loc[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0) & (Daniel['Produto'] == 'A')]
Produto_A = linregress(dados_A['Preço'], dados_A['Volume'])

dados_B = Daniel.loc[(Daniel['Preço'] > 0) & (Daniel['Volume'] > 0) & (Daniel['Produto'] == 'B')]
Produto_B = linregress(dados_B['Preço'], dados_B['Volume'])

# One row per product, one column per regression statistic
pd.DataFrame([Produto_A, Produto_B], index=['Valores', 'Valores2'])

Now I just need to find a way to run this for more products without having to write a separate block for each one.
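
One possible way to avoid a block per product is to let pandas split the data with groupby and fit each group in a loop. This is a sketch under assumptions rather than a tested solution for the real spreadsheet: the sample numbers are invented, the helper name ajustar is hypothetical, and the filter on positive Preço and Volume simply mirrors the one used above.

```python
import pandas as pd
from scipy.stats import linregress

# Invented sample data; only the column names come from the question
dados = pd.DataFrame({
    'Produto': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Preço':   [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0],
    'Volume':  [10.0, 19.0, 31.0, 5.0, 8.0, 9.0, 0.0],
})

def ajustar(grupo):
    """Fit Volume ~ Preço for one group and label the statistics (hypothetical helper)."""
    r = linregress(grupo['Preço'], grupo['Volume'])
    return pd.Series({
        'declive': r.slope,
        'ordenada_na_origem': r.intercept,
        'coeficiente_de_correlação_(de_Pearson)': r.rvalue,
        'p-value': r.pvalue,
        'erro_padrão': r.stderr,
    })

# Keep only rows where both price and volume are positive
positivos = dados[(dados['Preço'] > 0) & (dados['Volume'] > 0)]

# One regression per product; the transpose gives one row per product
resultados = pd.DataFrame(
    {produto: ajustar(grupo) for produto, grupo in positivos.groupby('Produto')}
).T
print(resultados)
```

With this layout, a new product in the spreadsheet would show up as an extra row in resultados without any change to the code.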

    
01.02.2018 / 12:10