I have an algorithm that searches for old tweets on Twitter between two dates. My goal is to return all of the tweets in that range. Until a few days ago the code was the same as in the question of this post and it worked, but after some changes to the Twitter page I started having the same problems described in that post.
In an attempt to solve the problem I made some changes, but now a different type of exception is being thrown. One curious thing I noticed is that every time I run the algorithm (with the same search parameters), a different number of tweets is returned.
The current code looks like this:
public static List<Tweet> getTweets(String username, String since, String until, String querySearch) {
    List<Tweet> results = new ArrayList<Tweet>();
    try {
        String refreshCursor = null;
        while (true) {
            String response = getURLResponse(username, since, until, querySearch, refreshCursor);
            System.out.println(response);
            JSONObject json = new JSONObject(response);
            refreshCursor = json.getString("min_position");
            Document doc = Jsoup.parse((String) json.get("items_html"));
            Elements tweets = doc.select("div.js-stream-tweet");
            if (tweets.size() == 0) {
                break;
            }
            for (Element tweet : tweets) {
                String usernameTweet = tweet.select("span.username.js-action-profile-name b").text();
                String txt = tweet.select("p.js-tweet-text").text().replaceAll("[^\u0000-\uFFFF]", "");
                int retweets = Integer.valueOf(tweet.select("span.ProfileTweet-action--retweet span.ProfileTweet-actionCount").attr("data-tweet-stat-count").replaceAll(",", ""));
                int favorites = Integer.valueOf(tweet.select("span.ProfileTweet-action--favorite span.ProfileTweet-actionCount").attr("data-tweet-stat-count").replaceAll(",", ""));
                long dateMs = Long.valueOf(tweet.select("small.time span.js-short-timestamp").attr("data-time-ms"));
                Date date = new Date(dateMs);
                Tweet t = new Tweet(usernameTweet, txt, date, retweets, favorites);
                results.add(t);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return results;
}
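A side note on the varying tweet count: the try/catch above wraps the entire while loop, so any exception ends paging and the method returns whatever was collected up to that point. One place that can throw is `Integer.valueOf` on an empty string, which is what jsoup's `attr` returns when the attribute is absent. Whether that actually happens with the current Twitter markup is not known from this post. The following is only a minimal sketch of a tolerant parser; `CountParser` and `parseCount` are hypothetical names, not part of the original code.

```java
final class CountParser {
    private CountParser() {}

    /**
     * Parses counts such as "1,234". Returns the fallback for null,
     * empty, or non-numeric input instead of throwing.
     */
    static int parseCount(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        String digits = raw.replace(",", "").trim();
        if (digits.isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

It could be used as `int retweets = CountParser.parseCount(element.attr("data-tweet-stat-count"), 0);`, so that one malformed tweet does not stop the whole search.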
The method that requests the page:
private static String getURLResponse(String from, String since, String until, String querySearch, String scrollCursor) throws Exception {
    String appendQuery = "";
    if (from != null) {
        appendQuery += "from:"+from;
    }
    if (since != null) {
        appendQuery += " since:"+since;
    }
    if (until != null) {
        appendQuery += " until:"+until;
    }
    if (querySearch != null) {
        appendQuery += " "+querySearch;
    }
    String url = String.format("https://twitter.com/i/search/timeline?f=realtime&q=%s&src=typd&max_position=%s", URLEncoder.encode(appendQuery, "UTF-8"), scrollCursor);
    URL obj = new URL(url);
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();
    con.setRequestMethod("GET");
    BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream()));
    String inputLine;
    StringBuffer response = new StringBuffer();
    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
    in.close();
    return response.toString();
}
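One detail worth ruling out in the method above: `String.format` renders a null argument as the literal text `null`, so the first request (where the cursor is still null) is sent with `max_position=null`. The sketch below builds the same URL but leaves the parameter out when there is no cursor and URL-encodes it when there is one. It assumes the endpoint accepts a request without `max_position`, which this post does not confirm; `SearchUrlBuilder` and `buildSearchUrl` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SearchUrlBuilder {
    private SearchUrlBuilder() {}

    // UTF-8 is always available, so the checked exception is rethrown unchecked.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String buildSearchUrl(String from, String since, String until,
                                 String querySearch, String scrollCursor) {
        StringBuilder query = new StringBuilder();
        if (from != null) {
            query.append("from:").append(from);
        }
        if (since != null) {
            query.append(" since:").append(since);
        }
        if (until != null) {
            query.append(" until:").append(until);
        }
        if (querySearch != null) {
            query.append(' ').append(querySearch);
        }

        StringBuilder url = new StringBuilder("https://twitter.com/i/search/timeline?f=realtime&q=")
                .append(encode(query.toString().trim()))
                .append("&src=typd");
        // Assumption: the first page is requested without a cursor parameter.
        if (scrollCursor != null) {
            url.append("&max_position=").append(encode(scrollCursor));
        }
        return url.toString();
    }
}
```

Trimming the query also removes the leading space that shows up as `q=+since...` when no username is given.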
Exception:
java.io.FileNotFoundException: https://twitter.com/i/search/timeline?f=realtime&q=+since%3A2014-10-08+until%3A2014-10-10+dilma&src=typd&max_position=TWEET-520341224046469121-520363066366496768-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at Manager.TweetManager.getURLResponse(TweetManager.java:58)
at Manager.TweetManager.getTweets(TweetManager.java:121)
at Main.Main.main(Main.java:54)
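For context on the trace above: with `HttpURLConnection`, calling `getInputStream()` on a 404 or 410 response throws `FileNotFoundException`, so the exception means the server answered that request with one of those statuses rather than that a local file is missing. A minimal sketch for inspecting the status and any error body follows; `HttpFetch`, `Result` and `fetch` are hypothetical names, and what Twitter actually returns for this URL is not shown in the post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class HttpFetch {
    private HttpFetch() {}

    /** HTTP status plus whatever body the server sent, on success or error. */
    static final class Result {
        final int status;
        final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static Result fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("GET");
        try {
            int status = con.getResponseCode();
            // For 4xx/5xx the body is on the error stream, which may be null.
            InputStream stream = status >= 400 ? con.getErrorStream() : con.getInputStream();
            if (stream == null) {
                return new Result(status, "");
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return new Result(status, body.toString());
        } finally {
            con.disconnect();
        }
    }
}
```

Logging `result.status` and `result.body` for the failing URL would show whether the server rejects the cursor value itself or the request as a whole.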
JSON downloaded: JSON