One way to do this (I do not know if it is the most efficient, but it works):
1. Find where the blank separator rows are, i.e. which rows have empty text values in every column, and save their indexes in a list (linhasVazias).
2. Loop over each index in that list and split the DataFrame into subsets according to the index (each subset containing one Action).
3. Reformat the DataFrame containing the resulting subset into the new layout.
4. Save it in the final DataFrame (df_final), which receives the new information.
Here is the code where I do these operations:
import pandas as pd

# Indexes of the separator rows (both columns empty); assumes a default RangeIndex
linhasVazias = df[(df['c0'] == "") & (df['c1'] == "")].index.tolist()
blocks = []
anterior = -1
for i in linhasVazias:
    # Slice out the rows belonging to one action (copy to avoid SettingWithCopyWarning)
    temp = df.iloc[anterior + 1 : i].copy()
    # Create the new column with the action name, taken from the first row
    temp['c2'] = temp.iloc[0, 0]
    # Remove the first row, which only holds the action name
    temp = temp.iloc[1:]
    # Collect the rows of this action (DataFrame.append was removed in pandas 2.0)
    blocks.append(temp)
    anterior = i
# Join the blocks and reset the index of the new DataFrame, dropping the old values
df_final = pd.concat(blocks).reset_index(drop=True) if blocks else pd.DataFrame(columns=['c0', 'c1', 'c2'])
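To try the code above, you need a `df` in the layout it expects. The sketch below builds a small one; the action names and values are made up for illustration, and I am assuming the action name sits in `c0` on the first row of each block.

```python
import pandas as pd

# Hypothetical sample data: each block starts with a row holding the action
# name in c0 and ends with a row where both c0 and c1 are empty strings.
df = pd.DataFrame({
    'c0': ['ACAO_A', '2020-01-01', '2020-01-02', '', 'ACAO_B', '2020-01-01', ''],
    'c1': ['', '10.0', '10.5', '', '', '20.0', ''],
})

# Rows 3 and 6 are the separators that the loop iterates over
separators = df[(df['c0'] == "") & (df['c1'] == "")].index.tolist()
```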
Notes:
- Here I used "c0", "c1" and "c2" as the column names.
- Your DataFrame is very large, so I do not know whether this will be efficient enough, but it is worth testing.
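If the loop turns out to be too slow on a large DataFrame, a vectorized variant may be worth testing. This is only a sketch of a different technique (numbering the blocks with `cumsum` and using `groupby`), not the loop above; the function name `split_actions` and the sample data are hypothetical, and it assumes the same "c0"/"c1" layout with the action name in the first row of each block.

```python
import pandas as pd

def split_actions(df: pd.DataFrame) -> pd.DataFrame:
    # Rows where both columns are empty separate one block from the next
    is_blank = (df['c0'] == "") & (df['c1'] == "")
    # Number the blocks: the counter goes up by one at every separator row
    block_id = is_blank.cumsum()
    # Keep only the non-separator rows, along with their block numbers
    data = df.loc[~is_blank].copy()
    ids = block_id[~is_blank]
    # The first c0 value of each block is the action name
    data['c2'] = data.groupby(ids)['c0'].transform('first')
    # Drop the first row of each block, which only holds the action name
    is_header = ~ids.duplicated()
    return data.loc[~is_header].reset_index(drop=True)

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'c0': ['ACAO_A', '2020-01-01', '2020-01-02', '', 'ACAO_B', '2020-01-01', ''],
    'c1': ['', '10.0', '10.5', '', '', '20.0', ''],
})
result = split_actions(sample)
```

Unlike the loop, this version also keeps a last block that is not followed by an empty row, so check which behavior you want before swapping one for the other.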