It’s been about 8 months since I started working with the USACE Portland District, and while I’ve found myself doing a lot of Python (and Jython) scripting, I’ve still had opportunities to build some R packages. One effort I took on was to rewrite the dssrip package, which Evan Heisman and others wrote to read HEC-DSS files into R. While dssrip worked well, it required some manual configuration to set up and did not pass the various R checks required for publication on CRAN. What started out as a modest effort to clean up the source code to meet CRAN requirements ballooned into a complete redesign of the package, including adding DSS write support and automating the installation of the required Java libraries. I’m pretty happy with the result, and it gave me the opportunity to dive into the HEC-DSS Java interface (which has helped a lot with my Jython projects!).
Like the original dssrip package, my rewritten package dssrip2 uses rJava to access the HEC-DSS Java classes and methods. Dealing with DSS file handles is not trivial: you don’t want to be constantly closing and reopening DSS files, as this can significantly slow down operations, but you also don’t want to open the same file twice, as this can cause DSS to go into “multi-user access” mode (which also slows down operations). My original design was to make the user open and close DSS files explicitly. This worked fine but didn’t sit well with me for a few reasons. It felt clunky to have to create a file handle and pass it to the various read/write functions rather than letting the user simply pass a filename, and I didn’t like that the user was exposed to the underlying Java object reference. It also wasn’t very safe, since a user could easily overwrite the file reference by accident. What I really wanted was a way to store the file handles internally in the package, retrieve them when the user supplied a file path, and create a new handle when necessary.
The simplest implementation of the “file store” would be a named list of file handles: provide the file name, get back the file handle. However, anyone who writes R packages will eventually discover that package namespaces are sealed on load. This presents a conundrum: you can’t simply create a placeholder variable in the package on load, because any later attempt to modify that variable fails with an error such as “cannot change value of locked binding” (or, when creating a new variable, “cannot add bindings to a locked environment”). In the past I’ve used tricks like creating an environment in the package `.onLoad()` function, which can be accessed and modified, but I was intrigued by the way the ggplot2 package implements its `last_plot()` function:
```r
# copied from https://github.com/tidyverse/ggplot2/blob/04a5ef274e912bee76180154b25d8bca0206feb1/R/plot-last.R
.plot_store <- function() {
  .last_plot <- NULL

  list(
    get = function() .last_plot,
    set = function(value) .last_plot <<- value
  )
}
.store <- .plot_store()

set_last_plot <- function(value) .store$set(value)

last_plot <- function() .store$get()
```
The `.plot_store()` function is essentially a wrapper around a `.last_plot` variable, plus functions for accessing or overwriting its value. The package then creates a `.store` object on load by calling `.plot_store()`, which provides access to the `set()` and `get()` functions. Because the `.last_plot` variable is created by the call to `.plot_store()`, it lives in that call’s own environment (captured by the `get()` and `set()` closures) rather than in the package namespace, so it does not get locked when the namespace is sealed.
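To see why the closure survives namespace sealing, here is a small demonstration. It assumes that a plain environment locked with `lockEnvironment(bindings = TRUE)` is a fair stand-in for a sealed package namespace; `ns` and `msg` are illustrative names, and the store definition is the ggplot2 code above evaluated inside `ns`.

```r
# Stand-in for a sealed package namespace (assumption: a locked plain
# environment behaves like a namespace sealed after load).
ns <- new.env()

# Define the store inside `ns`, as the package source would.
local({
  .plot_store <- function() {
    .last_plot <- NULL
    list(
      get = function() .last_plot,
      set = function(value) .last_plot <<- value
    )
  }
  .store <- .plot_store()
}, envir = ns)

# Seal the "namespace": no new bindings, existing bindings read-only.
lockEnvironment(ns, bindings = TRUE)

# Creating a variable directly in the namespace now fails.
msg <- tryCatch(
  assign(".last_plot", "plot 1", envir = ns),
  error = conditionMessage
)
print(msg)

# The closure's enclosing environment was never locked, so set() works.
ns$.store$set("plot 1")
print(ns$.store$get())

environmentIsLocked(ns)                          # TRUE
environmentIsLocked(environment(ns$.store$get))  # FALSE
```

The value written by `set()` lives in the environment created by the `.plot_store()` call, which the namespace lock never touches.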
It took only trivial modifications to turn this into a file handle store. I also added functions to list the file handles in the store and to “drop” file handles (call the DSS file handle’s `close()` method and remove the reference from the store).
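```r
# Requires rJava (.jcall) and the HEC-DSS Java libraries on the class path.
.file_store <- function() {
  # private list of open HecDss handles, keyed by file path
  .file_list <- list()
  list(
    # return the stored handle for a path (NULL if not open)
    get = function(filepath) {
      .file_list[[filepath]]
    },
    # open the file via the static HecDss.open() method unless already stored
    set = function(filepath, ...) {
      if (!(filepath %in% names(.file_list))) {
        .file_list[[filepath]] <<- .jcall("hec/heclib/dss/HecDss",
          "Lhec/heclib/dss/HecDss;", method = "open", filepath, ...)
      }
    },
    # close the Java file handle and remove it from the store
    drop = function(filepath) {
      .file_list[[filepath]]$close()
      .file_list[[filepath]] <<- NULL
    },
    # list the paths of all stored handles
    list = function() {
      names(.file_list)
    }
  )
}
.store <- .file_store()
```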
The logic is pretty simple. The `set()` function calls the Java code needed to create a file handle and stores the handle in `.file_list`, using the supplied file path as the element name. If the store already has an entry for a given file path, `set()` does nothing. The `get()` function simply returns the file handle for the supplied path. In the package code, I use a strict implementation of `normalizePath()` to ensure file paths are formatted consistently, regardless of how the supplied path is written or whether it is absolute or relative. One hiccup I discovered is that `normalizePath(..., mustWork = FALSE)` does not always return the same letter case for the expanded directory, so my implementation additionally calls `normalizePath(..., mustWork = TRUE)` on the directory of the supplied path to ensure paths for new files are constructed consistently with those for existing files.
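The directory-normalization step can be sketched as follows. `normalize_dss_path()` is a hypothetical helper written for this illustration, not the dssrip2 source; it assumes the directory exists while the file itself may not (for example, a new DSS file about to be created).

```r
# Hypothetical helper: build a canonical store key for a DSS file path.
# The directory must exist; the file itself may not exist yet.
normalize_dss_path <- function(filepath) {
  # mustWork = TRUE on the existing directory gives a fully resolved
  # directory path, so new and existing files share the same prefix.
  dir <- normalizePath(dirname(filepath), winslash = "/", mustWork = TRUE)
  file.path(dir, basename(filepath))
}

# Three spellings of the same (not yet existing) file.
tmp <- tempdir()
key1 <- normalize_dss_path(file.path(tmp, "example.dss"))
key2 <- normalize_dss_path(file.path(tmp, ".", "example.dss"))
key3 <- normalize_dss_path(file.path(tmp, "..", basename(tmp), "example.dss"))
identical(key1, key2)  # TRUE
identical(key1, key3)  # TRUE
```

Using a key like this as the `filepath` argument to the store means differently written paths to the same file map to a single handle.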
With this approach, users only ever need to provide file paths and never have to manage file handles explicitly. I still provide user-visible functions to close one or all of the DSS files in the store, but most of the time users won’t have to think about DSS files at all. This is an incredibly simple and flexible approach to creating session stores in R packages, and I can already see other use cases, such as caching results.
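Here is a self-contained sketch of what such user-facing helpers might look like. To run without Java, it swaps the `.jcall()` to HecDss for a hypothetical `open_handle()` that returns an environment with a `close()` method, and the helper names `dss_handle()`, `dss_close()`, and `dss_close_all()` are illustrative, not the dssrip2 API. Path normalization is omitted for brevity, and `drop()` here skips paths that are not in the store.

```r
# Hypothetical stand-in for the HecDss open call, so this runs
# without rJava: a "handle" is an environment with a close() method.
open_handle <- function(filepath) {
  handle <- new.env()
  handle$path <- filepath
  handle$open <- TRUE
  handle$close <- function() assign("open", FALSE, envir = handle)
  handle
}

# Same closure-based store pattern, using the mock opener.
.file_store <- function() {
  .file_list <- list()
  list(
    get = function(filepath) .file_list[[filepath]],
    set = function(filepath) {
      # open only if this path is not already in the store
      if (!(filepath %in% names(.file_list))) {
        .file_list[[filepath]] <<- open_handle(filepath)
      }
      invisible(NULL)
    },
    drop = function(filepath) {
      # skip paths that are not in the store
      if (!is.null(.file_list[[filepath]])) {
        .file_list[[filepath]]$close()
        .file_list[[filepath]] <<- NULL
      }
      invisible(NULL)
    },
    list = function() names(.file_list)
  )
}
.store <- .file_store()

# Hypothetical user-facing helpers: callers only ever pass a path.
dss_handle <- function(filepath) {
  .store$set(filepath)
  .store$get(filepath)
}
dss_close <- function(filepath) .store$drop(filepath)
dss_close_all <- function() {
  for (fp in .store$list()) .store$drop(fp)
  invisible(NULL)
}

h1 <- dss_handle("a.dss")
h2 <- dss_handle("a.dss")
identical(h1, h2)  # TRUE: the second call reuses the stored handle
h3 <- dss_handle("b.dss")
.store$list()      # "a.dss" "b.dss"
dss_close_all()
h1$open            # FALSE
```

Because `dss_handle()` always goes through `set()` first, repeated calls with the same path never open the file twice.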