How can I process data without the kernel dying?

I want to process data in an unsupervised.py file. However, every time I run it, my computer almost freezes and the kernel appears to die. It seems to be caused by a memory problem. In particular, it happens when I execute **step 3** of the following function:

>>>train.shape
(130318, 4)
>>>len(dict_emb)
179862
>>>def process_data(train):

    print("step 1")
    train['sentences'] = train['context'].apply(lambda x: [item.raw for item in TextBlob(x).sentences])

    print("step 2")
    train["target"] = train.apply(get_target, axis = 1)

    print("step 3")
    train['sent_emb'] = train['sentences'].apply(
        lambda x: [dict_emb[item][0] 
        if item in dict_emb 
        else np.zeros(4096) for item in x])

    return train

>>>train = process_data(train)

Could it be a memory problem? Are there known solutions? For now, I'll try Google Colaboratory ...
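
One way to check the memory hypothesis is a back-of-the-envelope estimate. np.zeros(4096) returns a new float64 array of 32 KiB each time it is called, so step 3 allocates one such array for every sentence that is missing from dict_emb. The sketch below is only an illustration under assumptions: estimate_bytes, ZERO_VEC and embed_sentences are hypothetical names that are not part of the original code, and the dict_emb figure assumes each value holds a single 4096-element float64 vector, which the question does not confirm.

```python
import numpy as np

EMB_DIM = 4096  # vector length used in the question


def estimate_bytes(n_vectors, dim=EMB_DIM, dtype=np.float64):
    """Bytes needed for n_vectors dense vectors of the given dtype."""
    return n_vectors * dim * np.dtype(dtype).itemsize


# Assumption: a single shared, read-only fallback is acceptable instead of
# a fresh np.zeros(4096) for every sentence missing from dict_emb.
ZERO_VEC = np.zeros(EMB_DIM)
ZERO_VEC.setflags(write=False)


def embed_sentences(sentences, dict_emb, fallback=ZERO_VEC):
    """Look up each sentence; all unknown sentences share one fallback vector."""
    return [dict_emb[s][0] if s in dict_emb else fallback for s in sentences]


if __name__ == "__main__":
    # One float64 vector of length 4096
    print(estimate_bytes(1))  # 32768
    # Size of dict_emb in GB, assuming one float64 vector per entry
    print(estimate_bytes(179862) / 1e9)
```

Sharing one fallback vector is only safe if nothing later modifies the embeddings in place, which is why the array is marked read-only here.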

Maybe I should turn this into a loop that processes the rows in batches? My attempt:

for i in range(0,len(train.shape[0]-200,200)):
    print(i)
    train['sent_emb'] = train['sentences'].iloc[i,i+200].apply(
        lambda x: [dict_emb[item][0] 
        if item in dict_emb 
        else np.zeros(4096) for item in x])      

But this gives me several errors:

step 1
step 2
step 3

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-d3e879a8c753> in <module>()
----> 1 train = process_data(train)

<ipython-input-25-7063894d5c9a> in process_data(train)
     10     #train['sent_emb'] = train['sentences'].apply(lambda x: [dict_emb[item][0] if item in\
     11     #                                                       dict_emb else np.zeros(4096) for item in x])
---> 12     train['quest_emb'] =[]
     13     for i in range(0,len(train.shape[0]-200,200)):
     14         print(i)

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3117         else:
   3118             # set column
-> 3119             self._set_item(key, value)
   3120 
   3121     def _setitem_slice(self, key, value):

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3192 
   3193         self._ensure_valid_index(value)
-> 3194         value = self._sanitize_column(key, value)
   3195         NDFrame._set_item(self, key, value)
   3196 

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3389 
   3390             # turn me into an ndarray
-> 3391             value = _sanitize_index(value, self.index, copy=False)
   3392             if not isinstance(value, (np.ndarray, Index)):
   3393                 if isinstance(value, list) and len(value) > 0:

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/series.py in _sanitize_index(data, index, copy)
   3999 
   4000     if len(data) != len(index):
-> 4001         raise ValueError('Length of values does not match length of ' 'index')
   4002 
   4003     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
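
The ValueError comes from `train['quest_emb'] = []`: a new column needs one value per row, and an empty list has length 0 while the frame has 130318 rows. The loop itself has two further problems. `len(train.shape[0]-200,200)` passes two arguments to len, and `.iloc[i,i+200]` uses a comma where a slice `i:i+200` is needed. A minimal sketch of a batched version follows, assuming the same column names as in the question. embed_in_batches, ZERO_VEC and batch_size are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

EMB_DIM = 4096
# Assumption: one shared zero vector for sentences missing from dict_emb.
ZERO_VEC = np.zeros(EMB_DIM)


def embed_in_batches(train, dict_emb, batch_size=200):
    """Fill train['sent_emb'] batch by batch and assign the column once."""
    parts = []
    n_rows = train.shape[0]
    for start in range(0, n_rows, batch_size):
        # A slice (start:stop) selects a block of rows; the last block may be shorter.
        batch = train['sentences'].iloc[start:start + batch_size]
        parts.append(batch.apply(
            lambda sents: [dict_emb[s][0] if s in dict_emb else ZERO_VEC
                           for s in sents]))
    # Each part keeps its original index, so the concatenation lines up with train.
    train['sent_emb'] = pd.concat(parts)
    return train
```

Batching alone does not lower peak memory if every batch is kept in the DataFrame. It only helps when each batch is written to disk or reduced to something smaller before the next one is processed.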
    
asked by anonymous 11.08.2018 / 12:48

0 answers