Text mining: list of word frequencies using R
I have been using the tm package to run some text analysis. My problem is creating a list of words and their associated frequencies:
library(tm)
library(RWeka)

txt <- read.csv("hw.csv", header = TRUE)
df <- do.call("rbind", lapply(txt, as.data.frame))
names(df) <- "text"

myCorpus <- Corpus(VectorSource(df$text))
myStopwords <- c(stopwords("english"), "originally", "posted")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)

# Building the TDM with a trigram tokenizer
BTM <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = BTM))
I typically use the following code for generating a list of words within a frequency range:
frq1 <- findFreqTerms(myTdm, lowfreq = 50)
Is there a way to automate this so that I get a data frame of all words and their frequencies?
The other problem I face is converting the term-document matrix into a data frame. Since I am working on large samples of data, I run into memory errors. Is there a simple solution for this?
Try this:
data("crude")
myTdm <- as.matrix(TermDocumentMatrix(crude))
FreqMat <- data.frame(ST = rownames(myTdm), Freq = rowSums(myTdm), row.names = NULL)
head(FreqMat, 10)
#            ST Freq
# 1       "(it)    1
# 2     "demand    1
# 3  "expansion    1
# 4        "for    1
# 5     "growth    1
# 6        "if     1
# 7        "is     2
# 8       "may    1
# 9      "none    2
# 10     "opec    2
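For the memory errors mentioned in the question, a possible sketch is to skip `as.matrix()` entirely and sum over the sparse representation with `slam::row_sums()` (tm stores a TermDocumentMatrix as a slam simple triplet matrix, so the dense matrix never has to be allocated; the descending sort at the end is my addition, not part of the original answer):

```r
library(tm)
library(slam)

data("crude")
myTdm <- TermDocumentMatrix(crude)  # stays a sparse simple_triplet_matrix

# row_sums() operates directly on the sparse structure,
# so the full dense term-by-document matrix is never built
FreqMat <- data.frame(ST = Terms(myTdm), Freq = row_sums(myTdm), row.names = NULL)

# Sort by descending frequency
FreqMat <- FreqMat[order(-FreqMat$Freq), ]
head(FreqMat, 10)
```

On a large corpus this keeps memory proportional to the number of nonzero entries rather than terms × documents.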