I have a large unlabeled dataset (~1,700,000 observations) that I would like to classify. I also have a fairly large sample (~8,000 observations) labeled as belonging to one of the classes (say, condition A), but no labeled observations at all for the other classes (say, conditions B to Z). All variables are categorical.
Although there are many classes, I am only interested in one: condition A, the class for which I have labeled data.
Can I train a model using only the condition A observations? If not, how should I overcome this problem?
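To make the first option concrete, training on condition A alone could look like the minimal sketch below. It is one possible one-class approach, not an established recipe for this data: it assumes the variables are independent given condition A, and the names fit_profile, log_score and fit_threshold, the smoothing constant alpha and the 5% quantile are hypothetical choices for illustration. Each row is scored by how typical its categories are of the condition A sample, and unlabeled rows scoring above a cutoff taken from the condition A scores themselves are flagged as likely A.

```python
import math
from collections import Counter

def fit_profile(rows_a, alpha=1.0):
    """Count each category per variable using condition A rows only."""
    n_vars = len(rows_a[0])
    counts = [Counter(row[j] for row in rows_a) for j in range(n_vars)]
    return counts, len(rows_a), alpha

def log_score(row, profile):
    """Log-likelihood of a row under the condition A profile.

    Assumption: variables are independent given condition A.
    Laplace smoothing (alpha) keeps unseen categories from scoring -inf.
    """
    counts, n, alpha = profile
    total = 0.0
    for j, value in enumerate(row):
        k = len(counts[j]) + 1  # observed categories plus one slot for unseen values
        total += math.log((counts[j][value] + alpha) / (n + alpha * k))
    return total

def fit_threshold(rows_a, profile, quantile=0.05):
    """Cutoff = low quantile of the scores of the condition A rows themselves."""
    scores = sorted(log_score(r, profile) for r in rows_a)
    return scores[int(quantile * (len(scores) - 1))]

# Usage on toy data (hypothetical values):
rows_a = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "small")]
profile = fit_profile(rows_a)
thr = fit_threshold(rows_a, profile)
looks_like_a = log_score(("red", "small"), profile) >= thr
```

Raising quantile makes the cutoff stricter: fewer unlabeled rows are flagged, but more genuine condition A cases fall below it. With no labeled B to Z data, the false positive rate cannot be estimated directly from this setup.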
Is it reasonable to recast the problem as binary classification (condition A would be TRUE and all other conditions FALSE)? In that case, can I randomly draw some of the unlabeled observations and treat them as FALSE? I know that most of the unlabeled observations belong to conditions B to Z (that is, FALSE in the binary setting), although some of them will be condition A.
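The binary reframing can be sketched as follows. The classifier here is a categorical naive Bayes, chosen only because it handles categorical variables with nothing beyond the Python standard library; any other classifier could be swapped in. The function names, the seed and the smoothing constant alpha are hypothetical. Rows drawn at random from the unlabeled data are treated as FALSE, as proposed in the question.

```python
import math
import random
from collections import Counter

def sample_pseudo_negatives(unlabeled, n, seed=0):
    """Randomly draw n unlabeled rows to be treated as FALSE (not condition A)."""
    rng = random.Random(seed)
    return rng.sample(unlabeled, n)

def fit_nb(pos, neg, alpha=1.0):
    """Fit a categorical naive Bayes: TRUE = condition A rows, FALSE = sampled rows."""
    n_vars = len(pos[0])
    model = {}
    for label, rows in ((True, pos), (False, neg)):
        counts = [Counter(r[j] for r in rows) for j in range(n_vars)]
        model[label] = (counts, len(rows))
    # Categories seen in either class, per variable
    levels = [set(model[True][0][j]) | set(model[False][0][j]) for j in range(n_vars)]
    prior = len(pos) / (len(pos) + len(neg))  # reflects the sample sizes, not prevalence
    return model, levels, prior, alpha

def prob_a(row, fitted):
    """Posterior score for TRUE under the naive Bayes model."""
    model, levels, prior, alpha = fitted
    logp = {}
    for label, p in ((True, prior), (False, 1 - prior)):
        counts, n = model[label]
        lp = math.log(p)
        for j, v in enumerate(row):
            k = len(levels[j]) + 1  # one extra slot for categories never seen
            lp += math.log((counts[j][v] + alpha) / (n + alpha * k))
        logp[label] = lp
    m = max(logp.values())  # subtract the max before exp for numerical stability
    e_true = math.exp(logp[True] - m)
    e_false = math.exp(logp[False] - m)
    return e_true / (e_true + e_false)

# Usage on toy data (hypothetical values):
pos = [("red", "small")] * 5 + [("red", "large")]
unlabeled = [("blue", "large")] * 20 + [("red", "small")] * 2
neg = sample_pseudo_negatives(unlabeled, 10, seed=1)
fitted = fit_nb(pos, neg)
```

Because some of the sampled FALSE rows are really condition A, and because the class prior reflects how many rows were sampled rather than the true prevalence of condition A, the output of prob_a is better read as a ranking score than as a calibrated probability.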
Thanks in advance.