I was recently asked to develop some reservoir forecasting tools for the San Francisco Public Utilities Commission, partly in response to the severe drought conditions facing California. The brief was a simple workflow for pulling streamflow data, computing flow duration curves, and fitting probability distributions to the data. The resulting data products would be used for reservoir management optimization and risk analysis.
The first step is to actually get the data. In my case, my sources are streamflow data provided by the USGS waterData package and ensemble forecasts developed by the California Nevada River Forecast Center (CNRFC). I’m not going to go over downloading data using the waterData package as the documentation is fairly well developed; in particular, I recommend the vignette hosted on CRAN. The CNRFC data is accessible on the web, and the procedure is instructive for basic file management in R.
So, let’s download the data. CNRFC trace data is available as .zip files that can be accessed via the URL http://www.cnrfc.noaa.gov/csv/xxxxxxxx12_CentralCoast_hefs_csv_daily.zip
where xxxxxxxx is a date string in YYYYMMDD format. To download this file in R, we first have to create a placeholder file. Since I don’t want to worry about cleaning up after myself and explicitly deleting the files I create, I’ll use the built-in functions tempfile() and tempdir() to place the files in R’s default temporary directory, and then download today’s data:
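A minimal sketch of this step might look like the following. The names zip_url, td and ok are my own choices for illustration, and the tryCatch() wrapper is an addition so that a missing or not-yet-published file gives a message rather than an error; tf is the placeholder file referred to in the rest of the post.

```r
# Build the URL for today's forecast. The "12" after the date is copied
# verbatim from the URL pattern given above.
datestring <- format(Sys.Date(), "%Y%m%d")
zip_url <- paste0("http://www.cnrfc.noaa.gov/csv/", datestring,
                  "12_CentralCoast_hefs_csv_daily.zip")

# Placeholder zip file inside R's session temporary directory.
td <- tempdir()
tf <- tempfile(tmpdir = td, fileext = ".zip")

# mode = "wb" keeps the binary archive intact on Windows.
# download.file() returns 0 on success; failures are caught and reported.
ok <- tryCatch({
  status <- download.file(zip_url, tf, mode = "wb")
  status == 0
}, error = function(e) {
  message("Download failed: ", conditionMessage(e))
  FALSE
})
```

If ok is FALSE, the file for that date may not be available yet, and trying an earlier date string is a reasonable next step.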
The data has now been downloaded to a temporary file, and the full path is contained in tf. You’ll notice I’ve also explicitly created a temporary directory, which I’ll use for extracting the data from the archive. The unzip() function can be used to query the contents of a zip file and extract files to a specified location. In my case, the zip file will always contain a single file in CSV format.
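The two uses of unzip() can be sketched as a small helper. The function name extract_single_csv is hypothetical, and the sketch assumes the archive holds exactly one file, as described above.

```r
# Extract the single CSV from a downloaded archive and return its path.
extract_single_csv <- function(zipfile, exdir = tempdir()) {
  # list = TRUE only reads the archive's table of contents (a data frame
  # with columns Name, Length and Date); nothing is extracted yet.
  contents <- unzip(zipfile, list = TRUE)

  # as.character() is a harmless guard in case Name comes back as a factor.
  fname <- as.character(contents$Name[1])

  # Second call: extract just that file into exdir
  # (the directory is created if it does not exist).
  unzip(zipfile, files = fname, exdir = exdir, overwrite = TRUE)
  file.path(exdir, fname)
}

# Usage, assuming tf holds the path of the downloaded zip file:
# fpath <- extract_single_csv(tf, tempdir())
```

Returning the full path keeps the next step simple, since the reader function only needs one argument.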
A word of caution: if you are using an older version of R (i.e. older than v3.0.2), fname will actually be returned as a factor (this is a bug) and you will need to convert it to a string using as.character() before passing it to the second call to unzip().
Now I have extracted the file to the temporary directory, and I’m spoiled for choice when it comes to text reading functions in R. I’m going to use read.csv() since I know my file is in CSV format. However, my data files have two header lines and this creates a problem: every column in the dataframe produced by read.csv() will be in string format, because the second header line is treated as the first row of data. I therefore need to do a little bit of extra formatting work. In addition, I need to make sure that I pass stringsAsFactors=FALSE to read.csv(); one of R’s quirks is that converting a factor to numeric returns its internal level code rather than the number its label spells out, e.g. as.numeric(factor('6')) != as.numeric('6').
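Here is a sketch of the factor pitfall and of the extra formatting work. The sample file is synthetic: the column names, the QINE label and the values are made up to stand in for the extracted CSV, so the exact layout of the real CNRFC file is an assumption.

```r
# The factor pitfall: as.numeric() on a factor returns the level code.
as.numeric(factor("6"))  # 1
as.numeric("6")          # 6

# Synthetic two-header file standing in for the extracted CSV.
fpath <- tempfile(fileext = ".csv")
writeLines(c(
  "GMT,SITE1,SITE2",
  ",QINE,QINE",
  "2014-02-01 12:00:00,0.52,0.61",
  "2014-02-02 12:00:00,0.48,0.57"
), fpath)

# First pass: the second header line lands in row 1,
# so every column is read as character.
raw <- read.csv(fpath, stringsAsFactors = FALSE)

# Drop the stray header row, then convert the columns to proper types.
dat <- raw[-1, ]
dat[-1] <- lapply(dat[-1], as.numeric)      # all columns except the timestamp
dat$GMT <- as.POSIXct(dat$GMT, tz = "GMT")  # timestamp column
rownames(dat) <- NULL
```

Because the columns were read as strings rather than factors, as.numeric() recovers the actual flow values instead of level codes.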
Now I have a quick way of getting CNRFC data into R, and the next time I close R the files in the temporary directory will automatically be cleaned up. Easy, right?