Diurnal, day of the week and monthly variation
trend_variation.Rd
Plots the diurnal, day of the week and monthly variation for different variables, typically pollutant concentrations. Four separate plots are produced.
Usage
trend_variation(
  mydata,
  pollutant = "nox",
  local_tz = NULL,
  normalise = FALSE,
  type = "default",
  group = NULL,
  difference = FALSE,
  statistic = "mean",
  conf_int = 0.95,
  b = 100,
  ci = TRUE,
  alpha = 0.3,
  return = "ensemble"
)
Arguments
- mydata
A data frame of hourly (or higher temporal resolution) data. Must include a date field and at least one variable to plot.
- pollutant
Name of variable to plot. Two or more pollutants can be plotted, in which case a form like pollutant = c("nox", "co") should be used.
- local_tz
Should the results be calculated in local time, including a treatment of daylight saving time (DST)? The default is not to consider DST issues, provided the data were imported without a DST offset. Emissions activity tends to follow local time, e.g. rush hour is at 8 am every day. When the clocks go forward in spring, the emissions are effectively released into the atmosphere typically 1 hour earlier during the summertime, i.e. when DST applies. When plotting diurnal profiles, this has the effect of “smearing out” the concentrations. Sometimes, a useful approach is to express time as local time. This correction tends to produce better-defined diurnal profiles of concentration (or other variables) and allows a better comparison to be made with emissions/activity data. If set to FALSE then GMT is used. Examples of usage include local_tz = "Europe/London" and local_tz = "America/New_York". See cutData and import for more details.
- normalise
Should variables be normalised? The default is FALSE. If TRUE then the variable(s) are divided by their mean values. This helps to compare the shape of the diurnal trends for variables on very different scales.
- type
type determines how the data are split, i.e. conditioned, and then plotted. The default will produce a single plot using the entire data. Type can be one of the built-in types as detailed in cutData, e.g. “season”, “year”, “weekday” and so on. For example, type = "season" will produce four plots, one for each season.
It is also possible to choose type as another variable in the data frame. If that variable is numeric, then the data will be split into four quantiles (if possible) and labelled accordingly. If type is an existing character or factor variable, then those categories/levels will be used directly. This offers great flexibility for understanding the variation of different variables and how they depend on one another.
Only one type is allowed in trend_variation().
- group
This sets the grouping variable to be used. For example, if a data frame had a column site, setting group = "site" will plot all sites together in each panel. See examples below.
- difference
If two pollutants are chosen then setting difference = TRUE will also plot the difference in means between the two variables as pollutant[2] - pollutant[1]. Bootstrap 95% confidence intervals of the difference in means are also calculated. A horizontal dashed line is shown at y = 0. The difference can also be calculated if there is a column that identifies two groups, e.g. having used splitByDate. In this case it is possible to call trend_variation() with the options group = "split.by" and difference = TRUE.
- statistic
Can be “mean” (default) or “median”. If the statistic is “mean” then the mean line and the 95% confidence interval in the mean are plotted by default. If the statistic is “median” then the median line is plotted together with the 5/95th and 25/75th quantiles. Users can control the confidence intervals with conf_int.
- conf_int
The confidence intervals to be plotted. If statistic = "mean" then the confidence intervals in the mean are plotted. If statistic = "median" then the conf_int and 1 - conf_int quantiles are plotted. conf_int can be of length 2, which is most useful for showing quantiles. For example, conf_int = c(0.75, 0.99) will yield a plot showing the median, 25/75th and 1/99th quantiles.
- b
Number of bootstrap replicates to use. Reducing this value can be useful when a large number of observations is available, as it speeds up the calculations without affecting the 95% confidence interval calculations by much.
- ci
Should confidence intervals be shown? The default is TRUE. Setting this to FALSE can be useful if multiple pollutants are chosen, where overlapping confidence intervals can overcomplicate plots.
- alpha
The alpha transparency used for plotting confidence intervals. 0 is fully transparent and 1 is opaque. The default is 0.3.
- return
What should the function return? One of:
* "ensemble": all four time variation panels assembled as a patchwork object (default).
* "day_hour", "day", "hour", "month": a single time variation panel.
* "list": a list of the four time variation panels, which may be useful if users wish to assemble them in a different way or with other plots entirely.
* "data": the raw data used to create the time variation panels.
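The DST effect that local_tz addresses can be reproduced in base R without the function itself. The sketch below is illustrative only: the dates are made up, and it does not show how trend_variation() handles time zones internally.

```r
# Two 08:00 local-time rush hours in London: one in winter (GMT) and one
# in summer (BST, one hour ahead of GMT). The dates are illustrative.
rush_local <- as.POSIXct(
  c("2023-01-16 08:00", "2023-07-17 08:00"),
  tz = "Europe/London"
)

# Hour of day for the same instants when expressed in GMT/UTC
hour_utc <- as.integer(format(rush_local, "%H", tz = "UTC"))

# Hour of day when expressed in local time
hour_local <- as.integer(format(rush_local, "%H", tz = "Europe/London"))

print(data.frame(hour_utc = hour_utc, hour_local = hour_local))
```

In GMT the summer rush hour falls in the 07:00 bin while the winter one falls in the 08:00 bin, which is the smearing described under local_tz. In local time both fall in the same bin.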
Details
The variation of pollutant concentrations by hour of the day and day of the week, etc., can reveal many interesting features that relate to source types and meteorology. For traffic sources, there are often important differences in the way traffic varies by vehicle type, e.g. fewer heavy vehicles at weekends.
The plots also show the 95% confidence intervals in the mean. These are calculated through bootstrap simulations, which provide more robust estimates of the confidence intervals (particularly when there are relatively few data).
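As a rough illustration of a bootstrap confidence interval in the mean, the base R sketch below resamples a small set of made-up concentrations. It assumes a simple percentile bootstrap; the method used inside the function may differ, and the variables b and conf_int only mirror the argument names for readability.

```r
set.seed(42)  # reproducible resampling

# Hypothetical concentrations falling in one hour-of-day bin
x <- c(31, 45, 28, 52, 39, 41, 36, 60, 33, 47)

b <- 100          # number of bootstrap replicates
conf_int <- 0.95  # width of the confidence interval

# Resample with replacement and record the mean of each replicate
boot_means <- replicate(b, mean(sample(x, replace = TRUE)))

# Percentile interval: cut off (1 - conf_int) / 2 in each tail
tail_prob <- (1 - conf_int) / 2
ci <- quantile(boot_means, probs = c(tail_prob, 1 - tail_prob), names = FALSE)

print(c(lower = ci[1], mean = mean(x), upper = ci[2]))
```

Lowering b shortens the run time at the cost of a noisier interval, which is the trade-off noted for the b argument.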
The function can handle multiple pollutants and uses the flexible type option to provide separate panels for each 'type'; see cutData for more details. It can also accept a group option, which is useful if data are stacked. This will work in a similar way to having multiple pollutants in separate columns.
The option difference will calculate the difference in means of two pollutants together with bootstrap estimates of the 95% confidence intervals in the difference in the mean. This works in two ways: either two pollutants are supplied in separate columns, e.g. pollutant = c("no2", "o3"), or there are two unique values of group. The difference is calculated as the second pollutant minus the first and is labelled as such. Considering differences in this way can provide many useful insights and is particularly useful for model evaluation when information is needed about where a model differs from observations across many different time scales. The manual contains various examples of using difference = TRUE.
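The "second minus first" convention and its bootstrap interval can be sketched in base R as follows. The no2 and o3 values are invented, and independent percentile resampling of each series is an assumption made for illustration, not a description of the function's internals.

```r
set.seed(1)  # reproducible resampling

# Hypothetical concentrations for two pollutants in the same time bin
no2 <- c(40, 38, 45, 50, 42, 39, 47, 44)
o3  <- c(55, 60, 52, 48, 58, 61, 50, 56)

# Difference in means: second pollutant minus the first
observed_diff <- mean(o3) - mean(no2)

# Bootstrap the difference by resampling each series with replacement
boot_diff <- replicate(200, {
  mean(sample(o3, replace = TRUE)) - mean(sample(no2, replace = TRUE))
})

# 95% percentile interval for the difference in means
ci <- quantile(boot_diff, probs = c(0.025, 0.975), names = FALSE)

print(c(lower = ci[1], difference = observed_diff, upper = ci[2]))
```

An interval that excludes zero suggests the two means differ in that time bin, which is why the plot draws a dashed reference line at y = 0.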
Note also that the trend_variation() function works well on a subset of data and in conjunction with other plots. For example, a polarPlot may highlight an interesting feature for a particular wind speed/direction range. By filtering for those conditions, trend_variation() can help determine whether the temporal variation of that feature differs from other features, and so help with source identification.
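A minimal base R sketch of this filtering step is shown below. The data are synthetic, and the column names ws (wind speed) and wd (wind direction) as well as the thresholds are assumptions chosen for illustration.

```r
set.seed(7)  # reproducible synthetic data

# Two weeks of hypothetical hourly data with wind speed and direction
dates <- seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 14)
mydata <- data.frame(
  date = dates,
  ws   = runif(length(dates), 0, 10),
  wd   = runif(length(dates), 0, 360),
  nox  = rlnorm(length(dates), meanlog = 3)
)

# Keep only the wind speed/direction range of interest
feature <- subset(mydata, ws > 3 & wd >= 180 & wd <= 270)

# Mean concentration by hour of day within the filtered subset
feature$hour <- as.integer(format(feature$date, "%H"))
hourly_mean <- aggregate(nox ~ hour, data = feature, FUN = mean)

print(head(hourly_mean))
```

The filtered data frame (feature in this sketch) keeps its date column, so it could be supplied as mydata in place of the full data set.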
In addition, trend_variation() will work well with other variables if available. Examples include meteorological and traffic flow data.
Depending on the choice of statistic, a subheading is added. Users can control the text in the subheading through the use of sub, e.g. sub = "" will remove any subheading.
See also
Other time series and trend functions:
trend_calendar(), trend_level(), trend_prop()