I compute the difference between two DataFrame columns (each cell holds an embedding array) and want to store the result in a third column, "diff". However, after the calculation runs, that last column is not stored in the DataFrame.
def predictions(train):
    print("cosine_sim")
    train["cosine_sim"] = train.apply(cosine_sim, axis=1)
    print("diff")
    i = 0
    for index, row in train.iterrows():
        i += 1
        row["diff"] = row["quest_emb"] - row["sent_emb"]
        if i % 10000 == 0:
            print("row ", i)
            print("row[\"diff\"] ", row["diff"])
    print("euclidean_dis")
    print(train)
The print("row[\"diff\"] ", row["diff"]) call inside the loop, which runs on every 10,000th row, gives me:
row 10000
row["diff"] [[-0.00541345 -0.00239381 0.00431296 ... -0.01337912 -0.0073709
0. ]]
row 20000
row["diff"] [[-0.03855522 -0.00136002 -0.02514186 ... -0.06655771 -0.02910786
-0.02423212]
[-0.03762216 -0.031567 -0.01083523 ... -0.01431298 -0.03401132
-0.01916602]]
But the resulting diff column in the DataFrame is filled with NaN:
sent_emb \
0 [[0.030376578, 0.044331014, 0.081356354, 0.062...
1 [[0.030376578, 0.044331014, 0.081356354, 0.062...
2 [[0.030376578, 0.044331014, 0.081356354, 0.062...
3 [[0.030376578, 0.044331014, 0.081356354, 0.062...
...
16289 [[0.035860058, 0.049851194, 0.0662197, 0.02581...
quest_emb \
0 [[0.01491953, 0.021973763, 0.021364095, 0.0393...
1 [[0.04444952, 0.028005758, 0.030357722, 0.0375...
2 [[0.03949683, 0.04509903, 0.018089347, 0.07667...
3 [[0.03284301, 0.01849968, 0.020346267, 0.03835...
...
16289 [[0.03924892, 0.04188699, 0.025356837, 0.04136...
cosine_sim diff
0 [0.1401391625404358, 0.11776834726333618, 0.09... NaN
1 [0.12254136800765991, 0.08665323257446289, 0.0... NaN
2 [0.09432470798492432, 0.06841456890106201, 0.0... NaN
3 [0.1274968981742859, 0.09279131889343262, 0.08... NaN
...
16289 [0.060139477252960205, 0.07225644588470459, 0.... NaN
I also tried doing the calculation with a function instead, but that did not even create the column.
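A minimal reproduction with hypothetical toy data (two-element embeddings with made-up values) points at a likely cause: iterrows() yields each row as a separate Series, and the pandas documentation warns that writing to it is not guaranteed to affect the DataFrame. Here row["diff"] = ... only adds an entry to that temporary Series. The sketch below assumes sent_emb and quest_emb hold NumPy arrays of matching shape, and assigns the whole column through the DataFrame instead, using a plain list comprehension.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real embeddings:
# each cell holds a NumPy array, as in the question.
train = pd.DataFrame({
    "sent_emb": [np.array([0.1, 0.2]), np.array([0.3, 0.4])],
    "quest_emb": [np.array([0.5, 0.5]), np.array([1.0, 1.0])],
})

# Pattern from the question: `row` is a Series built by iterrows(),
# so assigning a new "diff" label enlarges that Series only.
for index, row in train.iterrows():
    row["diff"] = row["quest_emb"] - row["sent_emb"]

written_back = "diff" in train.columns
print(written_back)  # False: nothing was written back to `train`

# Assigning the column on the DataFrame itself does store the result:
# one array per row, computed element by element from the two columns.
train["diff"] = [q - s for q, s in zip(train["quest_emb"], train["sent_emb"])]
print(train["diff"].iloc[1])
```

Building the full list first and assigning it once also avoids per-row Python overhead from iterrows(), which matters with tens of thousands of rows.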