I have a vector called istrain, holding names:
istrain = c("carri", "challeng", ...)
I want to turn them into columns of a dataframe, testSparse, which already contains word frequencies in comments, something like:
testSparse$cool = c(0,0,0,0,13,252,...)
testSparse$court = c(0,0,12,143,53,...)
After the operation, the dataframe testSparse would also have the columns:
testSparse$carri = c(0,0,0,0,0,...)
testSparse$challeng = c(0,0,0,0,0,...)
Doing this by hand is very time-consuming, since the vector of new columns has more than 100 entries. Has anyone written, or does anyone know of, a function or package that does this more quickly?
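To make the goal concrete, here is a minimal sketch of the operation in base R. The data values are made-up toy stand-ins for the real testSparse and istrain, and it assumes every new column should be filled with 0 for all rows:

```r
# Toy stand-ins for the real data (values are made up for illustration)
testSparse <- data.frame(cool = c(0, 0, 13), court = c(0, 12, 143))
istrain <- c("carri", "challeng")

# Only add names that are not already columns, so existing counts are kept
missing_cols <- setdiff(istrain, names(testSparse))

# Assigning a scalar to several new column names creates them all at once,
# recycling 0 down every row
testSparse[missing_cols] <- 0

names(testSparse)
```

Using setdiff() first is a precaution: if a name in istrain already existed in testSparse, a plain assignment would overwrite its frequencies with 0.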
Note: the dataframes are R pre-processing output for decision trees used in text mining, and the vector of new columns is the difference between the final training corpus and the test corpus. The function built around this step should be generic, so that it can be applied to new text bases and reuse the same decision tree to check whether a comment is offensive. A new text base contains words that were not seen before and lacks some words that already exist in the training data. The frequency of the added words must therefore be 0; once those columns are added, the new base can be passed to the tree to predict whether a comment is offensive, among several classes of "offensiveness."
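The generic version described in the note might look roughly like the sketch below. The helper name align_to_training and the toy data are hypothetical, and two things are assumed rather than taken from my actual pipeline: that train_cols lists only the predictor (word) columns the tree was trained on, and that words appearing only in the new text base can simply be dropped because the tree has no split for them:

```r
# Hypothetical helper: make `new_data` carry exactly the columns in `train_cols`
align_to_training <- function(new_data, train_cols) {
  # Words seen in training but absent from the new corpus get frequency 0
  missing_cols <- setdiff(train_cols, names(new_data))
  if (length(missing_cols) > 0) {
    new_data[missing_cols] <- 0
  }
  # Keep only the training columns, in training order (assumption: words
  # that exist only in the new corpus are dropped)
  new_data[, train_cols, drop = FALSE]
}

# Toy example (made-up values)
train_cols <- c("carri", "cool", "challeng")
newSparse  <- data.frame(cool = c(1, 0), brandnew = c(2, 3))
aligned    <- align_to_training(newSparse, train_cols)
names(aligned)
```

The aligned dataframe could then be passed as the new data to the prediction step for the existing tree.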