MemoryError in pandas


Hello, I am using the pandas merge command in Python 3:

ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

But a MemoryError is raised:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-20-e7b779815ee8> in <module>()
----> 1 ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     52                          right_index=right_index, sort=sort, suffixes=suffixes,
     53                          copy=copy, indicator=indicator)
---> 54     return op.get_result()
     55 
     56 

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\merge.py in get_result(self)
    581             [(ldata, lindexers), (rdata, rindexers)],
    582             axes=[llabels.append(rlabels), join_index],
--> 583             concat_axis=0, copy=self.copy)
    584 
    585         typ = self.left._constructor

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   5239             for ax, indexer in self.indexers.items():
   5240                 values = algos.take_nd(values, indexer, axis=ax,
-> 5241                                        fill_value=fill_value)
   5242 
   5243         return values

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\algorithms.py in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1465             out = np.empty(out_shape, dtype=dtype, order='F')
   1466         else:
-> 1467             out = np.empty(out_shape, dtype=dtype)
   1468 
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,

MemoryError: 

The result I want to generate will have 26 columns. Is there any way to avoid the memory error, or do I need to merge with just a few columns?

    
asked by anonymous 05.10.2017 / 14:24

1 answer


A MemoryError in pandas happens when an operation tries to allocate a result that does not fit in the available memory. Try breaking the merge into chunks of the dataframe and concatenating the partial results afterwards!
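A minimal sketch of that idea, assuming the dataframes and column names from the question are already loaded; the chunk size of 50,000 rows is an arbitrary value to tune to the memory you have:

import pandas as pd

chunk_size = 50_000  # arbitrary; lower it if memory is still tight
pieces = []

# Merge one slice of the left dataframe at a time, so only one partial
# result has to be built in memory per iteration.
for start in range(0, len(eleitos_d_s_doadores), chunk_size):
    chunk = eleitos_d_s_doadores.iloc[start:start + chunk_size]
    pieces.append(
        pd.merge(chunk, ibama,
                 left_on='CPF_CNPJ_doador_originario_limpo',
                 right_on='CPF_CNPJ_limpo')
    )

# Stitch the partial merges back into a single dataframe.
ibama_doadores_orig = pd.concat(pieces, ignore_index=True)

Dropping the columns you do not need from each dataframe before merging, as you already suspected, also shrinks what the merge has to allocate. Note as well that the paths in your traceback point to a 32-bit Python build (python36-32), which caps the whole process at roughly 2 GB of addressable memory, so a 64-bit install may be necessary for large merges regardless of how the work is split.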

    
31.01.2018 / 22:16