From your description, it seems you are dealing with a predictive problem, more precisely the problem of filling in (imputing) the missing values of a data set using the information the data set itself contains. This is a common, well-known problem in the data science literature, and the usual suggestion is to treat it as an ordinary classification or regression problem in which the target variables are the variables whose missing values you want to fill in.
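As a rough illustration of that idea, here is a minimal sketch that treats one incomplete column as a regression target and predicts its missing entries from the other columns. The helper name `impute_column_by_regression` and the synthetic data are made up for this example, and it assumes the other columns are complete and that a linear model is a reasonable fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def impute_column_by_regression(X, target_col):
    """Fill NaNs in one column by regressing it on the remaining columns.

    Hypothetical helper for illustration; assumes the remaining columns
    contain no NaNs themselves.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target_col])
    # Nothing to fill, or nothing to learn from: return the copy unchanged
    if not missing.any() or missing.all():
        return X
    others = np.delete(X, target_col, axis=1)
    model = LinearRegression()
    # Train only on rows where the target column is known
    model.fit(others[~missing], X[~missing, target_col])
    # Predict the target column for the rows where it is missing
    X[missing, target_col] = model.predict(others[missing])
    return X


# Synthetic demo data (not from the question): column 2 = 2*col0 + 3*col1
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
data = np.column_stack([data, 2 * data[:, 0] + 3 * data[:, 1]])
truth = data[:5, 2].copy()
data[:5, 2] = np.nan  # pretend the first five values are missing

filled = impute_column_by_regression(data, target_col=2)
```

For a categorical column the same pattern applies with a classifier in place of `LinearRegression`.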
The literature also describes other ways to handle missing values, for example the summary techniques here. However, since you have already decided to try to predict the missing values by similarity, this link provides an easy example of how to implement a Linear Discriminant Analysis model for this purpose, using the machine learning library Scikit-Learn. I transcribe the relevant part of the code below:
from pandas import read_csv
import numpy
from sklearn.impute import SimpleImputer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
dataset = read_csv('pima-indians-diabetes.csv', header=None)
# mark zero values as missing or NaN
dataset[[1,2,3,4,5]] = dataset[[1,2,3,4,5]].replace(0, numpy.nan)
# split dataset into inputs and outputs
values = dataset.values
X = values[:,0:8]
y = values[:,8]
# fill missing values with mean column values (SimpleImputer's default strategy)
imputer = SimpleImputer(strategy='mean')
transformed_X = imputer.fit_transform(X)
# evaluate an LDA model on the dataset using k-fold cross validation
model = LinearDiscriminantAnalysis()
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
result = cross_val_score(model, transformed_X, y, cv=kfold, scoring='accuracy')
print(result.mean())
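One caveat with the code above: `fit_transform` runs on the full data set before cross-validation, so the test folds influence the imputed values. A possible way around this, sketched below, is to put the imputer and the model in a single pipeline so the imputer is refit on each training fold. Since you mention similarity, the sketch swaps in scikit-learn's `KNNImputer`, which fills each missing entry from the nearest rows. The synthetic data, the 10% missing rate, and `n_neighbors=5` are assumptions chosen only for illustration, not values from the linked example:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.impute import KNNImputer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a real data set (made up for this sketch)
rng = np.random.default_rng(7)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out roughly 10% of the entries to simulate missing values
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# Imputation happens inside the pipeline, so it is fit per training fold
pipeline = make_pipeline(KNNImputer(n_neighbors=5), LinearDiscriminantAnalysis())
kfold = KFold(n_splits=3, shuffle=True, random_state=7)
scores = cross_val_score(pipeline, X_missing, y, cv=kfold, scoring='accuracy')
print(scores.mean())
```

Replacing `KNNImputer` with `SimpleImputer` in the pipeline gives the mean-imputation variant with the same fold-wise behaviour.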