This week in Data Without Borders we covered scraping the Twitter Streaming API, cleaning up the data and converting JSON to .csv in Python, and how to do some neat tricks in R to glean some info from the data.

The most exciting part of this week’s assignment was learning how to compare two lists and remove from one of them the entries that also appear in the other. The dataset is a collection of tweets containing “Libya.”
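As a rough illustration of that list comparison, here is a minimal Python sketch. It is not the code from my assignment: the function name `remove_entries`, the sample words, and the choice to ignore upper/lower case are all assumptions made for the example.

```python
def remove_entries(words, unwanted):
    """Return the items of `words` that do not appear in `unwanted`, keeping their order."""
    # Assumption: matching ignores case, so "I" and "i" count as the same word.
    # A set makes each membership check fast, even for long lists.
    unwanted_set = {w.lower() for w in unwanted}
    return [w for w in words if w.lower() not in unwanted_set]


# Made-up sample data, for illustration only.
words = ["the", "journalist", "and", "I", "love", "news"]
unwanted = ["the", "and", "i"]

kept = remove_entries(words, unwanted)
print(kept)  # ['journalist', 'love', 'news']
```

Building a new list instead of deleting items from `words` in place avoids the classic bug of skipping entries while looping over a list that is being modified.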

In this example, we have a list of words that appear in Twitter users’ descriptions. We have to remove all the stop words (e.g. “the”, “and”, “I”) so we get a better idea of how these Twitter users describe themselves. The gist from my assignment walks through the steps: