Error creating a column in a pandas dataset


Hello,

I am building a project in Python using pandas and I want to create a column whose values are the Close column minus the Open column, but I get an error that I cannot resolve.

My code:

import pandas as pd

dataset = pd.read_csv(r'Documents\Projeto\PETR4.csv', sep=',')
dataset['Date'] = pd.to_datetime(dataset['Date'])
dataset['Variation'] = dataset['Close'].sub(dataset['Open'])

The Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-309e31139274> in <module>()
----> 1 dataset['Variation'] = dataset['Close'].sub(dataset['Open'])

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\ops.py in flex_wrapper(self, other, level, fill_value, axis)
   1049             self._get_axis_number(axis)
   1050         if isinstance(other, ABCSeries):
-> 1051             return self._binop(other, op, level=level, fill_value=fill_value)
   1052         elif isinstance(other, (np.ndarray, list, tuple)):
   1053             if len(other) != len(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in _binop(self, other, func, level, fill_value)
   1598 
   1599         with np.errstate(all='ignore'):
-> 1600             result = func(this_vals, other_vals)
   1601         name = _maybe_match_name(self, other)
   1602         result = self._constructor(result, index=new_index, name=name)

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Example table rows:

Can you help me?

Thank you.

    
asked by anonymous 21.03.2018 / 18:01

1 answer


You probably downloaded this data from Yahoo Finance. I did the same, and here are the first rows:

Date,Open,High,Low,Close,Adj Close,Volume
2010-01-04,36.950001,37.320000,36.820000,37.320000,33.627335,13303600
2010-01-05,37.380001,37.430000,36.799999,37.000000,33.339001,21396400
2010-01-06,36.799999,37.500000,36.799999,37.500000,33.789528,18720600
2010-01-07,37.270000,37.450001,37.070000,37.150002,33.474155,10964600
2010-01-08,37.160000,37.389999,36.860001,36.950001,33.293945,14624200

The error message says that the subtraction failed because these columns were read as strings:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

I'm almost sure the problem is that pandas is interpreting your numeric columns (Open, Close, etc.) as strings, either because the decimal separator is wrongly specified or because of some other problem in the file (a stray dash, for example). I say this because your Volume column does not appear to be numeric.

If it is an error in the data itself, you will have to look for it. I downloaded PETR4's Yahoo Finance CSV for all of 2010 and had no problem with it.
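One way to search for such an error is to check the column types and then list the rows that fail numeric conversion. The following is only a sketch: the sample data is made up (a dash stands in for a bad value), and find_bad_values is a hypothetical helper name, not something from pandas.

```python
import io
import pandas as pd

# Made-up sample: the second data row has a dash where a price should be.
raw = io.StringIO(
    "Date,Open,Close\n"
    "2010-01-04,36.950001,37.320000\n"
    "2010-01-05,-,37.000000\n"
    "2010-01-06,36.799999,37.500000\n"
)
sample = pd.read_csv(raw)

# A non-numeric dtype on a price column means it was read as text.
print(sample.dtypes)

def find_bad_values(frame, column):
    """Return the rows whose value in `column` cannot be parsed as a number."""
    parsed = pd.to_numeric(frame[column], errors="coerce")
    # Keep rows that became NaN after coercion but were not missing before.
    return frame[parsed.isna() & frame[column].notna()]

bad_rows = find_bad_values(sample, "Open")
print(bad_rows)
```

Running the same helper on your own dataset for 'Open' and 'Close' should point to the lines of the CSV that need fixing.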

The easiest way to resolve this is with the decimal option of read_csv. Assuming the decimal separator is a dot '.' and not a comma ',', you would write:

dataset = pd.read_csv(r'Documents\Projeto\PETR4.csv', sep=',', decimal='.')

If this is not enough, also set the thousands separator with thousands=',' or whatever value matches what appears in the CSV.
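To illustrate how the decimal and thousands options interact, here is a small sketch. It assumes a file exported with Brazilian/European conventions (semicolon between fields, comma as decimal separator, dot as thousands separator); the sample rows are invented and your file may use different separators.

```python
import io
import pandas as pd

# Invented sample in a "comma as decimal" layout; fields are separated by ';'.
raw = io.StringIO(
    "Date;Open;Close;Volume\n"
    "2010-01-04;36,95;37,32;13.303.600\n"
    "2010-01-05;37,38;37,00;21.396.400\n"
)

# Tell pandas which characters are the field, decimal and thousands separators.
sample = pd.read_csv(raw, sep=";", decimal=",", thousands=".")

# The price columns are now numeric, so the subtraction works.
sample["Variation"] = sample["Close"].sub(sample["Open"])
print(sample)
```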

If it still does not work, you can try other options:

  • dtype={'Open': np.float64, 'Close': np.float64} (with numpy imported as np)
  • converters: you can pass a dictionary of functions that clean each column as needed
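As a sketch of the converters option, the example below cleans the price columns while reading. The clean_price function and the sample rows are hypothetical; the rule "a lone dash means missing" is an assumption that you should adapt to whatever actually appears in your file.

```python
import io
import numpy as np
import pandas as pd

def clean_price(text):
    """Hypothetical cleaner: treat an empty field or a lone dash as missing."""
    text = text.strip()
    if text in ("", "-"):
        return np.nan
    return float(text)

# Invented sample with one dirty value in the Open column.
raw = io.StringIO(
    "Date,Open,Close\n"
    "2010-01-04,36.95,37.32\n"
    "2010-01-05, - ,37.00\n"
)

# Each converter receives the raw string of a cell and returns the cleaned value.
sample = pd.read_csv(raw, converters={"Open": clean_price, "Close": clean_price})
sample["Variation"] = sample["Close"].sub(sample["Open"])
print(sample)
```

The row with the dash ends up with a missing Variation instead of raising the TypeError.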

Note: what is usually used as the price variation is the difference between the current closing price and the closing price of the previous period. If that is your case, you can do the following:

  • To calculate the daily rate of return:

    dataset['Variation'] = dataset['Close'].pct_change()

  • For the daily return (in Reais):

    dataset['Variation'] = dataset['Close'].sub(dataset['Close'].shift(1))
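To see what the two alternatives above produce, here is a minimal sketch using the first three Close values from the sample rows shown earlier. The variable names are only illustrative.

```python
import pandas as pd

# First three closing prices from the sample rows above.
close = pd.Series([37.32, 37.00, 37.50], name="Close")

# Relative change from the previous row (rate of return).
daily_return_rate = close.pct_change()

# Absolute change from the previous row, in currency units.
daily_change = close.sub(close.shift(1))

print(pd.DataFrame({
    "Close": close,
    "rate": daily_return_rate,
    "change": daily_change,
}))
```

In both cases the first row is NaN because there is no previous period to compare with. The second expression gives the same values as close.diff().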

22.03.2018 / 01:49