I'm fetching Twitter data with the twitteR package for R, but the tweets are arriving with encoding problems. Does anyone know how to work around this?
library(twitteR)
library(stringr)
library(ROAuth)
library(RCurl)
library(plyr)   # provides laply(), used at the end

# Use the CA certificate bundle shipped with RCurl for HTTPS requests
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
setwd("XXXXXXXXX")
# Download a CA bundle into the working directory
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

# OAuth credentials (keys redacted); all Twitter endpoints require HTTPS
cred <- OAuthFactory$new(consumerKey = 'XXXXXXXXXXXXXX',
                         consumerSecret = 'XXXXXXXXXXXX',
                         requestURL = 'https://api.twitter.com/oauth/request_token',
                         accessURL = 'https://api.twitter.com/oauth/access_token',
                         authURL = 'https://api.twitter.com/oauth/authorize')
cred$handshake(cainfo = "cacert.pem")
registerTwitterOAuth(cred)

# Search for 200 tweets and extract the text of each one
tweets <- searchTwitter("#Copa2014", n = 200, cainfo = "cacert.pem")
Tweets.text <- laply(tweets, function(t) t$getText())
The text comes back like this, with problems in accented characters and cedillas:
head(Tweets.text)
[1] "Não fui sorteado dessa vez, mas dia 12/03 começa uma nova fase de vendas... #copa2014"
[2] "RT @obsate: @RodP13 @gugakuerten A #Copa2014 virou a Geni; todo mundo bate nela. Agora a copa tem de resolver todos os problemas do BR. Pia…"
[3] "@RodP13 @gugakuerten A #Copa2014 virou a Geni; todo mundo bate nela. Agora a copa tem de resolver todos os problemas do BR. Piada!"
[4] "Nem pra saude! \"@mordomoeugenio: Bilhão de reais pra ensino público não tem né #copa2014 #JN\""
[5] "RT @soldadonofront: \"@fsouzajrJuca: Quanto mais eu leio sobre esses grupos que protestam contra a Copa, mais eu simpatizo com a #Copa2014.\""
[6] "\"@fsouzajrJuca: Quanto mais eu leio sobre esses grupos que protestam contra a Copa, mais eu simpatizo com a #Copa2014.\""
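To tell whether the bytes themselves are wrong or only the way R displays them, it can help to check the declared encoding and the raw bytes of a string. A minimal sketch using base R only; the sample string is hypothetical, built from known UTF-8 bytes rather than taken from `searchTwitter()` output:

```r
# Hypothetical sample standing in for one tweet: "Não" as UTF-8 bytes
# (ã = C3 A3). Illustrative data, not real API output.
x <- rawToChar(as.raw(c(0x4e, 0xc3, 0xa3, 0x6f)))

# Which encoding does R think the string has? rawToChar() does not mark
# its result, so this reports "unknown", meaning "use the native locale".
Encoding(x)

# The raw bytes show what was actually received. C3 A3 is the UTF-8
# encoding of "ã"; on a CP1252 (Windows) locale these bytes, read as
# native text, would display as "Ã£".
charToRaw(x)

# For a whole vector of tweets, summarise the declared encodings
table(Encoding(c(x, "plain ascii")))
```

If the tweets show UTF-8 byte sequences but are declared "unknown", the data is probably fine and only the encoding mark is missing.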
I'm using:
RStudio 0.98.501
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32 / x64 (64-bit)
PS: The problem apparently occurs only on Windows 7. Following instructions from Luis Cipriani, I ran the same code on Linux and there were no encoding problems. The question remains: how can I avoid these problems on Windows?
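One commonly suggested workaround, offered here as a sketch and not as a confirmed fix for twitteR on Windows, is to declare the returned text as UTF-8 explicitly and, only where a Latin-1 consumer needs it, convert it with base R's `iconv()`. The helper names `mark_utf8` and `to_latin1` are hypothetical, and the sketch assumes the API delivers valid UTF-8 bytes that R leaves unmarked:

```r
# Hypothetical helper: declare that the strings already hold UTF-8 bytes.
# Assumption: the bytes are valid UTF-8 but carry no encoding mark.
# This changes only the mark, never the bytes.
mark_utf8 <- function(txt) {
  Encoding(txt) <- "UTF-8"
  txt
}

# Hypothetical helper: convert UTF-8 text to Latin-1, e.g. for a file or
# tool that expects 8-bit Windows-style text. Characters with no Latin-1
# equivalent (such as emoji) have each of their bytes replaced by "?".
to_latin1 <- function(txt) {
  iconv(txt, from = "UTF-8", to = "latin1", sub = "?")
}

# Usage with a sample string built from UTF-8 bytes ("Não")
sample_txt <- rawToChar(as.raw(c(0x4e, 0xc3, 0xa3, 0x6f)))
fixed <- mark_utf8(sample_txt)
latin <- to_latin1(fixed)

Encoding(fixed)    # "UTF-8"
charToRaw(latin)   # 4e e3 6f
```

With the question's objects this would be `Tweets.text <- mark_utf8(Tweets.text)`. Because marking does not change any bytes, it only helps if the underlying bytes are correct UTF-8; if they were already mangled during download, the fix has to happen earlier in the pipeline.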