Text mining: list of word frequencies using R
I have been using the tm package to run some text analysis. My problem is creating a list of words and their associated frequencies:
library(tm)
library(RWeka)

txt <- read.csv("hw.csv", header = TRUE)
df <- do.call("rbind", lapply(txt, as.data.frame))
names(df) <- "text"

myCorpus <- Corpus(VectorSource(df$text))
myStopwords <- c(stopwords("english"), "originally", "posted")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)

# Building the TDM with a trigram tokenizer
BTM <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = BTM))
I typically use the following code for generating a list of words within a frequency range:
frq1 <- findFreqTerms(myTdm, lowfreq = 50)
Is there a way to automate this so that I get a data frame of all words and their frequencies?
The other problem I face is converting the term-document matrix into a data frame. Since I am working on large samples of data, I run into memory errors. Is there a simple solution for this?
Try this:
data("crude")
myTdm <- as.matrix(TermDocumentMatrix(crude))
FreqMat <- data.frame(ST = rownames(myTdm), Freq = rowSums(myTdm), row.names = NULL)
head(FreqMat, 10)
#            ST Freq
# 1       "(it)    1
# 2     "demand    1
# 3  "expansion    1
# 4        "for    1
# 5     "growth    1
# 6        "if     1
# 7        "is     2
# 8       "may    1
# 9      "none    2
# 10     "opec    2
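For the memory errors mentioned in the question, a possible sketch is to skip `as.matrix()` entirely and sum over the sparse representation with `slam::row_sums()` (tm stores a TermDocumentMatrix as a slam simple triplet matrix, so the dense matrix never has to be allocated; the descending sort at the end is my addition, not part of the original answer):

```r
library(tm)
library(slam)

data("crude")
myTdm <- TermDocumentMatrix(crude)  # stays a sparse simple_triplet_matrix

# row_sums() operates directly on the sparse structure,
# so the full dense term-by-document matrix is never built
FreqMat <- data.frame(ST = Terms(myTdm), Freq = row_sums(myTdm), row.names = NULL)

# Sort by descending frequency
FreqMat <- FreqMat[order(-FreqMat$Freq), ]
head(FreqMat, 10)
```

On a large corpus this keeps memory proportional to the number of nonzero entries rather than terms × documents.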