Keep variable values inside a class between calls to a Python function (partial_fit)


I'm writing a new algorithm to run alongside the ones provided by the sklearn package in Python. The dataset is extremely large, so I'm using the partial_fit function (as in Naive Bayes, for example) to read the dataset in blocks and run training and testing on each block. However, this function is called many times, and some variables must not lose their values when control returns to the main program, so that they can be updated with each new block.

My question is how to store these values inside the function so that they are not reset on each new call. Note that I do not want to return them to the main program; I want them to stay stored inside the function (or its class) so that I can retrieve them later, and without using global variables.

Example code:

for i, (X_train_text, y_train) in enumerate(minibatch_iterators):
    tick = time.time()
    X_train = vectorizer.transform(X_train_text)
    total_vect_time += time.time() - tick

    for cls_name, cls in partial_fit_classifiers.items():
        tick = time.time()
        # update estimator with examples in the current mini-batch
        # (this function is called several times)
        cls.partial_fit(X_train, y_train, classes=all_classes)

        # accumulate test accuracy stats
        cls_stats[cls_name]['total_fit_time'] += time.time() - tick
        cls_stats[cls_name]['n_train'] += X_train.shape[0]
        cls_stats[cls_name]['n_train_pos'] += sum(y_train)
        tick = time.time()
        cls_stats[cls_name]['accuracy'] = cls.score(X_test, y_test)
        cls_stats[cls_name]['prediction_time'] = time.time() - tick
        acc_history = (cls_stats[cls_name]['accuracy'],
                       cls_stats[cls_name]['n_train'])
        cls_stats[cls_name]['accuracy_history'].append(acc_history)
        run_history = (cls_stats[cls_name]['accuracy'],
                       total_vect_time + cls_stats[cls_name]['total_fit_time'])
        cls_stats[cls_name]['runtime_history'].append(run_history)

NOTE: cls.partial_fit is called several times, once for each classifier. At the end of each iteration a new block of the dataset is loaded and the classifiers are called again, yet the variables inside their functions keep the values they were assigned. Naive Bayes, for instance, carries on from the values of its last update call. (Examples of variables that Naive Bayes updates online: mean and standard deviation.)
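The behaviour described in the note can be sketched with instance attributes: anything assigned to self inside partial_fit stays on the object and is still there on the next call, with no globals and nothing returned to the main program. The class below is a hypothetical illustration, not sklearn's implementation; the name RunningStats and the attributes count_, mean_ and m2_ are assumptions made for this example, and the update uses Welford's online algorithm for a single feature.

```python
import math


class RunningStats:
    """Hypothetical class (not part of sklearn) that keeps state across calls."""

    def __init__(self):
        # State is stored on the instance, so it is not reset between calls.
        self.count_ = 0    # number of values seen so far
        self.mean_ = 0.0   # running mean
        self.m2_ = 0.0     # running sum of squared deviations from the mean

    def partial_fit(self, values):
        # Welford's online update: each value adjusts the stored statistics.
        for x in values:
            self.count_ += 1
            delta = x - self.mean_
            self.mean_ += delta / self.count_
            self.m2_ += delta * (x - self.mean_)
        return self  # returning self only allows chaining; state stays inside

    @property
    def std_(self):
        # Population standard deviation of everything seen so far.
        return math.sqrt(self.m2_ / self.count_) if self.count_ else 0.0


# Usage: feed the data block by block, then read the state back later.
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
stats = RunningStats()
for block in blocks:
    stats.partial_fit(block)

print(stats.count_, stats.mean_, stats.std_)
```

Because each classifier in the loop is a separate object, each one carries its own copy of this state, which is why several classifiers can be updated in turn without overwriting one another.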

See the video for an explanation: Link

    
asked by anonymous 23.04.2018 / 06:17

0 answers