Extracting data from news articles: Australian pollution by postcode
This article is originally published at https://nsaunders.wordpress.com
The recent ABC News article “Australia’s pollution mapped by postcode reveals nation’s dirty truth” is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.
So here’s how I got them, using R.
The full details are in this GitHub repository. There you’ll find the code to generate this report.
Essentially, the procedure goes like this:
- Use rvest to create a data frame from the data table in the online article
- Clean and pre-process the data using dplyr
- Join the pollution data with geospatial data derived from a shapefile of Australian postal areas
- Filter by postcode range for the city of interest
- And finally plot maps using ggplot2
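For a rough feel of how those steps fit together, here is a minimal sketch. It is not the code from the repository. The inline HTML table, the column names Postcode and Emissions, the toy square "postal areas" and the 2000-2999 range are all made up for illustration, and using the sf package for the geospatial side is my assumption here rather than something stated above.

```r
library(rvest)
library(dplyr)
library(sf)       # assumption: sf handles the geospatial data in this sketch
library(ggplot2)

# Hypothetical stand-in for the article's table. On a live page you would
# start from read_html(url) rather than minimal_html().
page <- minimal_html('
  <table>
    <tr><th>Postcode</th><th>Emissions</th></tr>
    <tr><td>2000</td><td>1,200</td></tr>
    <tr><td>2010</td><td>350</td></tr>
    <tr><td>3000</td><td>980</td></tr>
  </table>')

# Steps 1 and 2: scrape the table into a data frame, then clean it.
# Postcodes are kept as character; thousands separators are stripped.
pollution <- page %>%
  html_element("table") %>%
  html_table() %>%
  mutate(
    Postcode  = as.character(Postcode),
    Emissions = as.numeric(gsub(",", "", Emissions))
  )

# Toy geometry: three unit squares standing in for postal areas.
# With a real shapefile this would be something like st_read(<path>).
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}
areas <- st_sf(
  Postcode = c("2000", "2010", "3000"),
  geometry = st_sfc(square(0), square(1), square(2))
)

# Steps 3 and 4: join pollution data to the areas, then filter by an
# (arbitrary, illustrative) postcode range.
city <- areas %>%
  left_join(pollution, by = "Postcode") %>%
  filter(as.integer(Postcode) >= 2000, as.integer(Postcode) <= 2999)

# Step 5: plot a map, filling each area by its emissions value.
p <- ggplot(city) +
  geom_sf(aes(fill = Emissions))
print(p)
```

Keeping postcodes as character until the filter step is a deliberate choice in this sketch: it makes the join key match exactly and avoids losing leading zeros.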
Rather than copying/pasting/formatting code here, I encourage you to look at the report.
Result: maps, like the one on the right. I sometimes think R makes this kind of thing almost too easy.