Is it possible to train a model when I have labeled examples of only one of the classes?


I have a large dataset (~1,700,000 observations) that I would like to classify. I also have a fairly large sample (~8,000) labeled as one of these classes (say, condition A), but I have no labeled observations (zero) of the other classes (say, conditions B to Z). In addition, all variables are categorical.

Although there are many categories, I am only interested in one of them: condition A, the one for which I have a sample.

Can I train the model using only condition A observations? If not, how can I work around this problem?

Is it reasonable to reformulate the problem as a binary classification (condition A would be TRUE and all other conditions FALSE)? In that case, can I randomly take some of the unlabeled observations and assume that they are FALSE? I know that most of the unlabeled observations would be of types B to Z (the FALSE case in the binary setting).

Thanks in advance.

asked by anonymous 12.11.2016 / 19:05

1 answer


You can turn the problem into a binary one if your assumption is correct that most of the unlabeled observations are FALSE, as you say in your question. (Ideally there would be no positives at all among the unlabeled observations, but if there are only a few, they probably will not hurt much.)

As you put it yourself: "I know that most of the unclassified observations would be of types B to Z (in the FALSE binary case)."
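A minimal sketch of this reformulation, assuming the data fits in memory as tuples of categorical values: the known condition A rows are labeled 1, a random sample of unlabeled rows is labeled 0, and a classifier is trained on the result. The classifier here is a hand-written categorical Naive Bayes, swapped in only because all your variables are categorical; it is one possible choice, not the method the answer prescribes. The function names (`build_binary_dataset`, `fit_categorical_nb`, `predict_proba_positive`) and the toy data are hypothetical. For 1.7M rows you would more likely use a library implementation.

```python
import math
import random
from collections import Counter


def build_binary_dataset(positives, unlabeled, n_negatives, seed=0):
    """Label known condition-A rows 1 and a random sample of unlabeled rows 0.

    Assumption: most unlabeled rows are NOT condition A, so the sampled
    "negatives" are only slightly contaminated with hidden positives.
    """
    rng = random.Random(seed)
    negatives = rng.sample(unlabeled, n_negatives)
    X = list(positives) + negatives
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


def fit_categorical_nb(X, y, alpha=1.0):
    """Fit a categorical Naive Bayes with Laplace smoothing (alpha)."""
    classes = sorted(set(y))
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][v] = number of rows of class c whose feature j equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[label][j][v] += 1
            values[j].add(v)
    return classes, class_counts, counts, values, alpha, len(y)


def predict_proba_positive(model, row):
    """Return P(TRUE | row), i.e. the probability of condition A."""
    classes, class_counts, counts, values, alpha, n = model
    log_post = {}
    for c in classes:
        lp = math.log(class_counts[c] / n)
        for j, v in enumerate(row):
            # Smoothed likelihood; the "+ 1" reserves one pseudo-bucket
            # for category values never seen during training.
            lp += math.log((counts[c][j][v] + alpha) /
                           (class_counts[c] + alpha * (len(values[j]) + 1)))
        log_post[c] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    total = sum(math.exp(lp - m) for lp in log_post.values())
    return math.exp(log_post.get(1, -math.inf) - m) / total


# Toy usage with two categorical variables (hypothetical data).
positives = [("red", "small"), ("red", "small"), ("red", "large")]
unlabeled = [("blue", "large"), ("blue", "small"), ("green", "large"),
             ("red", "small"), ("blue", "large"), ("green", "small")]
X, y = build_binary_dataset(positives, unlabeled, n_negatives=4, seed=0)
model = fit_categorical_nb(X, y)
print(predict_proba_positive(model, ("red", "small")))
```

Note that the unlabeled pool in the toy data itself contains a `("red", "small")` row that may be sampled as FALSE; that is exactly the contamination the answer says you can tolerate when it is rare.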

In addition, many classifiers use this same idea in the one-vs-rest (one-vs-all) strategy, in which each class is trained against all the other classes grouped together.
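To make the analogy concrete, here is a small sketch of the relabeling step behind one-vs-rest: for each class, the multiclass target becomes binary, with that class as 1 and every other class lumped together as 0. The function name `one_vs_rest_labels` and the labels are hypothetical. The difference in your case is that you only need the binary problem for condition A, and the "rest" has to be approximated by unlabeled data because you have no labeled B to Z examples.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multiclass target as binary: 1 for positive_class, 0 otherwise."""
    return [1 if label == positive_class else 0 for label in labels]


# Hypothetical multiclass labels; one-vs-rest builds one binary target per class.
labels = ["A", "C", "B", "A", "Z"]
per_class = {c: one_vs_rest_labels(labels, c) for c in sorted(set(labels))}
print(per_class["A"])  # prints [1, 0, 0, 1, 0]
```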

Following the discussion in the comments, two caveats:

  • If your 1.7M dataset contains condition A observations and your 8,000 sample is not a subsample of that 1.7M pool, this is probably not the best approach, because some of the rows you randomly label FALSE would actually be condition A.
  • If the number of condition A observations in the 1.7M set is really small, this method, although biased, will still perform better than choosing a class at random.
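To put a rough number on the bias these caveats describe: if you draw n rows at random (without replacement) from a pool of N unlabeled rows, K of which are secretly condition A, the expected number of hidden positives among the rows you label FALSE is n·K/N. The sketch below only computes that expectation; the 1% share of condition A in the pool is a hypothetical figure, not something stated in the question.

```python
def expected_hidden_positives(n_sampled, pool_size, hidden_positives):
    """Expected number of condition-A rows among n_sampled rows drawn at random
    (without replacement) from a pool of pool_size unlabeled rows."""
    # Each drawn row is condition A with probability hidden_positives / pool_size,
    # so by linearity of expectation the mean count is n * K / N.
    return n_sampled * hidden_positives / pool_size


# Hypothetical scenario: 1% of the 1.7M pool is actually condition A,
# and we draw as many FALSE rows as there are known positives (8,000).
print(expected_hidden_positives(8_000, 1_700_000, 17_000))  # prints 80.0
```

In that scenario about 80 of the 8,000 FALSE rows (1%) would be mislabeled, which is the kind of small contamination the second caveat has in mind.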
17.11.2016 / 12:34