aRastoCAT

About
Installation
Demo Data
- NetCDF
- Shapefile
Spatial aggregation
- Aggregating a NetCDF
Temporal aggregation

About

aRastoCAT is a Raster Climate data Aggregation Tool that aggregates climate data provided in NetCDF or binary format. Climate data is often provided in gridded form, and NetCDF is a standard format for storing such data. Hydrological model applications often require weather data as mean values for specific spatial units at specific time intervals. aRastoCAT provides automated routines that aggregate gridded climate data by calculating weighted means for the spatial units given in a polygon shape file, and returns a table with a time series of the analyzed climate variable for each polygon unit. To obtain the required temporal resolution, the time series can be aggregated temporally in a further step. Finally, aRastoCAT provides functionality to generate weather input files for the Soil and Water Assessment Tool (SWAT) from the aggregated climate data.

Installation

aRastoCAT is installed directly from its GitHub repository, which requires the package devtools.

install.packages("devtools")
devtools::install_github('chrisschuerz/aRastoCAT', dependencies = TRUE)

Demo Data

The package provides a demo NetCDF file and a demo ESRI shape file of a small catchment to show the required input data structure and to demonstrate the functionality. To access the demo files, execute the following:

# Load required packages
library(aRastoCAT)
library(ncdf4)
library(rgdal)
library(tibble)

# Paths to the demo files
ncdf_pth <- system.file("extdata", "ncdf_demo.nc", package = "aRastoCAT")
basin_pth <- system.file("extdata", "basin_demo.shp", package = "aRastoCAT")

NetCDF

The NetCDF file requires at least:

- an array for the climate variable with the dimensions lat, lon and time
- matrices with the latitude and longitude values of the grid center points
- a vector providing the time steps of the time dimension of the climate data

This is the structure of the demo data set:

nc_demo <- nc_open(filename = ncdf_pth)
nc_demo

## File C:/Program Files/R/userLIB/aRastoCAT/extdata/ncdf_demo.nc (NC_FORMAT_CLASSIC):
## 
##      4 variables (excluding dimension variables):
##         float pr[x,y,time]   
##             units: mm
##             _FillValue: -999
##             long_name: precipitation
##         double lon[x,y]   
##             units: degrees_east
##             _FillValue: -999
##             long_name: longitude coordinate
##         double lat[x,y]   
##             units: degrees_north
##             _FillValue: -999
##             long_name: latitude coordinate
##         double time_bnds[time]   
##             units: days
##             _FillValue: -999
##             long_name: time
## 
##      3 dimensions:
##         x  Size:16
##             units: degrees_east
##             long_name: x
##         y  Size:21
##             units: degrees_north
##             long_name: y
##         time  Size:365
##             units: days since 1949-12-01 00:00:00
##             long_name: time
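
The time dimension above is stored as day offsets relative to 1949-12-01. The following base R sketch shows how such offsets can be turned into calendar dates and split into date columns. It assumes a standard Gregorian calendar, which climate model output does not always use, and the offset values are hypothetical rather than read from the demo file. With ncdf4 the actual values would be available as nc_demo$dim$time$vals.

```r
# Sketch: convert NetCDF time offsets to calendar dates (base R only).
# Assumption: the offsets follow the standard Gregorian calendar.
time_origin <- as.Date("1949-12-01")  # origin from the "units" attribute

# Hypothetical offsets in days since the origin; with ncdf4 they would
# come from nc_demo$dim$time$vals
time_offset <- 7701:7705

# Adding a number of days to a Date yields the shifted Date
time_date <- time_origin + time_offset

# Split the dates into separate integer columns
date_tbl <- data.frame(
  year = as.integer(format(time_date, "%Y")),
  mon  = as.integer(format(time_date, "%m")),
  day  = as.integer(format(time_date, "%d"))
)
date_tbl
```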

Shapefile

The provided basin shape file is a polygon shape providing the boundaries of eight subunits. To load the demo shape file execute the following code:

basin_shp <- readOGR(basin_pth, layer = "basin_demo")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:/Program Files/R/userLIB/aRastoCAT/extdata/basin_demo.shp", layer: "basin_demo"
## with 8 features
## It has 20 fields

The attribute table of the shape file is structured as follows:

as_tibble(basin_shp@data[,1:5])

## # A tibble: 8 × 5
##   OBJECTID GRIDCODE Subbasin    Area     Slo1
## *    <int>    <int>    <int>   <dbl>    <dbl>
## 1        2        2        2 1509.34 33.64209
## 2        3        3        3 2033.08 40.13984
## 3        4        4        4 1440.24 25.34164
## 4        5        5        5 1371.40 37.46342
## 5        6        6        6 2471.59 28.25830
## 6        7        7        7 1670.92 28.38715
## 7        8        8        8 1451.72 27.62589
## 8       10       10       10 2103.23 39.30143

Spatial aggregation

Independent of the file format, the workflow of the spatial aggregation is the same. The raster data is essentially a three-dimensional array with latitude, longitude, and time as its dimensions. The spatial aggregation calculates weighted mean values of the gridded climate variable, where the weight of each pixel is the area of that pixel that lies within the respective polygon subunit. The aggregation results in a tibble that provides the date as separate columns for year, month, day, hour, and minute, followed by the mean values for each subunit and time step.
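
The weighting scheme can be illustrated with a small base R sketch. It is not the implementation used by aRastoCAT. The array pr, the matrix overlap, and all values in it are hypothetical, and the overlap areas are set by hand instead of being derived from a polygon intersection.

```r
# Sketch: area-weighted mean of a gridded variable for one polygon (base R).
set.seed(1)
n_x <- 4
n_y <- 3
n_t <- 5

# Hypothetical climate variable with dimensions [x, y, time]
pr <- array(runif(n_x * n_y * n_t, min = 0, max = 10),
            dim = c(n_x, n_y, n_t))

# Hypothetical area of each pixel that lies inside the polygon;
# 0 means the pixel is completely outside
overlap <- matrix(0, nrow = n_x, ncol = n_y)
overlap[2, 2] <- 1.00  # pixel fully inside
overlap[3, 2] <- 0.50  # half of the pixel inside
overlap[2, 3] <- 0.25  # a quarter of the pixel inside

# Normalize the areas so that the weights sum to 1
w <- overlap / sum(overlap)

# Weighted mean per time step: sum over all pixels of weight * value
pr_mean <- apply(pr, 3, function(layer) sum(w * layer))
pr_mean
```

Repeating the last step with one weight matrix per subunit would give one column per polygon, which corresponds to the layout of the output table described above.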

Aggregating a NetCDF

To aggregate the NetCDF data spatially for the basin subunits, use the function aggregate_ncdf(). The following minimal example demonstrates the application and shows the resulting output.

crs_nc <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

pr_demo <- aggregate_ncdf(ncdf_pth = ncdf_pth,
                          basin_shp = basin_shp,
                          ncdf_crs = crs_nc,
                          shp_index = "Subbasin",
                          var_lbl = "pr")

As the reference system is not provided by the NetCDF file, the user has to provide it as a character string. In this example WGS84 is the reference system (most likely the standard for NetCDF data). The attribute table of the shape file can hold multiple attribute columns for each polygon. Therefore, the user has to define the column that provides the individual subunit index. In this example the name of the column is ‘Subbasin’.

The function output is a tibble with the aggregated time series for each basin subunit.

as_tibble(pr_demo)

## # A tibble: 365 × 13
##     year   mon   day  hour   min Subbasin_1 Subbasin_2 Subbasin_3
##    <dbl> <dbl> <int> <int> <int>      <dbl>      <dbl>      <dbl>
## 1   1971     1     1     0     0  3.0777803  3.1834916  3.0351961
## 2   1971     1     2     0     0  0.6410431  0.7616843  0.6009436
## 3   1971     1     3     0     0  8.0503434  8.4411669  7.9988060
## 4   1971     1     4     0     0  2.3377722  2.2709139  2.3779599
## 5   1971     1     5     0     0 13.8275426 13.3580502 14.3865211
## 6   1971     1     6     0     0 23.8227509 24.0437188 24.2452624
## 7   1971     1     7     0     0 11.9045988 12.1182448 12.0813359
## 8   1971     1     8     0     0  0.1375243  0.1543433  0.1800734
## 9   1971     1     9     0     0  3.4654535  3.3840917  3.6226699
## 10  1971     1    10     0     0  0.0000000  0.0000000  0.0000000
## # ... with 355 more rows, and 5 more variables: Subbasin_4 <dbl>,
## #   Subbasin_5 <dbl>, Subbasin_6 <dbl>, Subbasin_7 <dbl>, Subbasin_8 <dbl>

Temporal aggregation

The temporal resolution of the output of aggregate_ncdf() depends on the time steps provided by the NetCDF file. Often a specific time interval is required for the weather input of a model, or the model should be operated with specific time steps. To aggregate the tibble above temporally, use the function aggregate_time() and define the required time interval with the argument time_int. Different types of variables need different temporal aggregations (e.g. accumulated precipitation, or mean/min/max daily temperature). Therefore, the user has to define the aggregation function with the argument aggr_fun. Further arguments for that function are given after aggr_fun (e.g. na.rm = TRUE to exclude NA values from the calculation).

pr_mon <- aggregate_time(pr_demo, time_int = "mon", drop_col = FALSE, aggr_fun = sum, na.rm = TRUE)

pr_mon

## # A tibble: 12 × 13
##     year   mon   day  hour   min Subbasin_1 Subbasin_2 Subbasin_3
##    <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
## 1   1971     1     0     0     0  140.79863  143.71394  139.65610
## 2   1971     2     0     0     0   37.70157   38.28138   37.71856
## 3   1971     3     0     0     0   80.20148   82.22621   78.74637
## 4   1971     4     0     0     0   17.81484   17.37269   17.87067
## 5   1971     5     0     0     0   15.98037   18.71793   13.68930
## 6   1971     6     0     0     0  160.02498  159.12320  160.96138
## 7   1971     7     0     0     0  115.80406  115.49595  117.90692
## 8   1971     8     0     0     0   29.49210   27.35580   32.45428
## 9   1971     9     0     0     0   40.26003   40.78696   41.86554
## 10  1971    10     0     0     0   58.35671   57.56754   60.39207
## 11  1971    11     0     0     0   59.99861   59.95148   59.62219
## 12  1971    12     0     0     0   14.23836   15.32084   13.40524
## # ... with 5 more variables: Subbasin_4 <dbl>, Subbasin_5 <dbl>,
## #   Subbasin_6 <dbl>, Subbasin_7 <dbl>, Subbasin_8 <dbl>

pr_max <- aggregate_time(pr_demo, time_int = "year", drop_col = TRUE, aggr_fun = max, na.rm = TRUE)

pr_max

## # A tibble: 1 × 9
##    year Subbasin_1 Subbasin_2 Subbasin_3 Subbasin_4 Subbasin_5 Subbasin_6
##   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1  1971   33.70678   40.39706   32.21597   37.55593   38.16956   34.48719
## # ... with 2 more variables: Subbasin_7 <dbl>, Subbasin_8 <dbl>
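
The grouping logic behind such a temporal aggregation can be reproduced with base R. The sketch below is not how aggregate_time() is implemented. It uses the base function aggregate() on a hypothetical daily table named daily with constant values, so that the monthly sums are easy to check by hand.

```r
# Sketch: monthly sums of a daily series with base R.
# 'daily' is a hypothetical stand-in for a spatially aggregated table.
dates <- seq(as.Date("1971-01-01"), as.Date("1971-03-31"), by = "day")

daily <- data.frame(
  year = as.integer(format(dates, "%Y")),
  mon  = as.integer(format(dates, "%m")),
  Subbasin_1 = 1,  # constant 1 mm per day: monthly sum = days in month
  Subbasin_2 = 2   # constant 2 mm per day
)

# Group by year and month and apply the aggregation function;
# na.rm = TRUE is passed on to sum()
monthly <- aggregate(cbind(Subbasin_1, Subbasin_2) ~ year + mon,
                     data = daily, FUN = sum, na.rm = TRUE)
monthly
```

Replacing sum with max or mean in the call changes the type of aggregation in the same way as the argument aggr_fun does in the examples above.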

Christoph Schürz