
Problem: National parks are not fun when they’re packed with people.
Solution: Collect and display monthly visitor statistics so that it’s possible to spot the ideal months to visit national parks.
Results:
Status:
Skills Used: API; Excel
Languages Used: Visual Basic for Applications; R
I am a big fan of national parks. One of the first things that got me interested in data science was my desire to understand visitor patterns at the parks I intended to visit, so I could pick a time with the fewest people but still decent weather for taking photos. While looking for a single place to find these data, I stumbled across the National Park Service’s visitor statistics portal and its API.
This was at the very beginning of my foray into data science, before I learned R. I spent about 5 days (including two full weekend days) writing Visual Basic for Applications (VBA) scripts in Excel to download data from the API and arrange it in the format required by Google Public Data Explorer. Recently, I rewrote these scripts in R. It took me about 4 hours, and the process is now entirely reproducible: I can rerun it every year to update the plots.
I created several versions of these plots, including ones that show the full time series for each park dating back to January 1979 and ones that show only averages for the last 5 years. The example below shows the full time series narrowed down to the past three years, along with predictions for the following year. Initially, I generated these predictions with Excel’s forecast function.
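As a rough illustration of the five-year averages mentioned above, the calculation comes down to grouping monthly counts by calendar month. The sketch below is in Python, not the VBA or R scripts described in this post. It assumes the counts are already available as `(year, month, visitors)` records; the function names and the sample numbers are made up for the example.

```python
from collections import defaultdict


def monthly_averages(records, last_n_years=5):
    """Average visitors per calendar month over the most recent N years.

    `records` is an iterable of (year, month, visitors) tuples. This layout
    is an assumption, not the format returned by the NPS portal.
    """
    records = list(records)
    if not records:
        return {}
    latest_year = max(year for year, _, _ in records)
    first_year = latest_year - last_n_years + 1

    totals = defaultdict(int)
    counts = defaultdict(int)
    for year, month, visitors in records:
        if year >= first_year:
            totals[month] += visitors
            counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


def quietest_month(averages):
    """Return the month number with the lowest average visitor count."""
    return min(averages, key=averages.get)


# Illustrative, made-up counts for one park (not real NPS figures).
sample = [
    (2019, 1, 100), (2019, 7, 900),
    (2020, 1, 140), (2020, 7, 700),
    (2014, 1, 5000),  # older than five years, so it is ignored
]
averages = monthly_averages(sample)
print(averages)
print(quietest_month(averages))
```

Anchoring the window to the latest year in the data, rather than to today's date, means the same records always give the same averages.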
I have since replaced my Excel code with R, drawing on my newer programming knowledge. I now have an end-to-end solution that downloads the data, processes and reshapes it, makes predictions (now using Facebook’s prophet package for R), and packages everything into a zip file, all in a single click. All I have to do is upload the file to the Public Data Explorer site.
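The reshape and packaging steps of a pipeline like this could look roughly like the following. This is a sketch in Python rather than the R code described here. The wide input layout (one row per park and year, one column per month), the column names, and the file name `visits.csv` are all assumptions, and an actual Public Data Explorer upload would also need the dataset's metadata file alongside the CSV, which is not shown.

```python
import csv
import io
import zipfile

MONTHS = ["jan", "feb", "mar"]  # shortened for the example


def wide_to_long(rows):
    """Turn one row per (park, year) into one row per (park, month)."""
    long_rows = []
    for row in rows:
        for index, name in enumerate(MONTHS, start=1):
            long_rows.append({
                "park": row["park"],
                "date": f"{row['year']}-{index:02d}",
                "visitors": row[name],
            })
    return long_rows


def build_zip(long_rows, csv_name="visits.csv"):
    """Write the long table to CSV and return the bytes of a zip holding it."""
    text = io.StringIO(newline="")
    writer = csv.DictWriter(text, fieldnames=["park", "date", "visitors"])
    writer.writeheader()
    writer.writerows(long_rows)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, text.getvalue())
    return buffer.getvalue()


# Made-up input row, not real NPS data.
wide = [{"park": "EXAMPLE", "year": 2020, "jan": 10, "feb": 20, "mar": 30}]
long_rows = wide_to_long(wide)
archive_bytes = build_zip(long_rows)
```

Building the archive in memory keeps the example free of side effects; writing `archive_bytes` to disk is a single extra step.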
If you’d like to check out different parks, you can visit the main data explorer tool.
As I mentioned above, I am also interested in the weather when I travel, so I created another Google Public Dataset from climate data I downloaded from NOAA, this time for the 59 flagship national parks. Here is an example of the average monthly low temperature for the locations from the previous graph. It is useful for seeing how much I’m going to have to bundle up for sunrise photos.