Title: Density Surface Modelling of Distance Sampling Data
Description: Density surface modelling of line transect data. A Generalized Additive Model-based approach is used to calculate spatially-explicit estimates of animal abundance from distance sampling (also presence/absence and strip transect) data. Several utility functions are provided for model checking, plotting and variance estimation.
Authors: David L. Miller, Eric Rexstad, Louise Burt, Mark V. Bravington, Sharon Hedley, Megan Ferguson, Natalie Kelly
Maintainer: Laura Marshall <[email protected]>
License: GPL (>= 2)
Version: 2.3.3.9001
Built: 2024-11-20 04:35:32 UTC
Source: https://github.com/distanceDevelopment/dsm
dsm implements spatial models for distance sampling data. Models for detectability can be fitted using the packages mrds or Distance; dsm then fits generalized additive models to the spatially-referenced data. See Miller et al. (2013) for an introduction.
Further information on distance sampling methods and example code is available at http://distancesampling.org/R/.
For help with distance sampling and this package, there is a Google Group https://groups.google.com/forum/#!forum/distance-sampling.
Example analyses are available at http://examples.distancesampling.org.
Hedley, S. and S. T. Buckland. 2004. Spatial models for line transect sampling. JABES 9:181-199.
Miller, D. L., Burt, M. L., Rexstad, E. A., Thomas, L. (2013), Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4: 1001-1010. doi: 10.1111/2041-210X.12105 (Open Access)
Wood, S.N. 2006. Generalized Additive Models: An Introduction with R. CRC/Chapman & Hall.
Takes the transect data and works out how many blocks of a given size (in segment terms) fit into each sampling unit (transect).
block.info.per.su(block.size, data, name.su)
block.size: number of segments per block
data: data used to build the model
name.su: names of the sampling units (i.e., transects)
a data.frame with the following columns:

name: the sample unit name (e.g., transect label)
num.seg: number of segments in that transect
num.block: number of blocks available
start.block: block number for the first block
end.block: block number for the last block
num.req: number of blocks needed for the unit
Internal function to check that supplied data.frames have the correct columns and that sample labels are all unique.
check.cols(ddf.obj, segment.data, observation.data, segment.area)
ddf.obj: a fitted detection function (ddf) object
segment.data: segment data, as defined in the "Data format" section (dsm-data)
observation.data: observation data, as defined in dsm-data
segment.area: area of the segments
nothing, but throws an error if something went wrong
David Lawrence Miller
Fits a density surface model (DSM) to detection-adjusted counts from a spatially-referenced distance sampling analysis. dsm takes observations of animals, allocates them to segments of line (or strip transects) and optionally adjusts the counts based on detectability using a supplied detection function model. A generalized additive model, generalized additive mixed model or generalized linear model is then used to model these adjusted counts based on a formula involving environmental covariates.
dsm(
  formula,
  ddf.obj,
  segment.data,
  observation.data,
  engine = "gam",
  convert.units = 1,
  family = quasipoisson(link = "log"),
  group = FALSE,
  control = list(keepData = TRUE),
  availability = 1,
  segment.area = NULL,
  weights = NULL,
  method = "REML",
  ...
)
formula: formula for the surface. This should be a valid formula. See "Details", below, for how to define the response.
ddf.obj: result from a call to ds (package Distance) or ddf (package mrds)
segment.data: segment data, see "Data format" (dsm-data)
observation.data: observation data, see "Data format" (dsm-data)
engine: which fitting engine should be used for the DSM ("glm", "gam", "gamm" or "bam")
convert.units: conversion factor to multiply the area of the segments by. See "Units" below.
family: response distribution (popular choices include quasipoisson, Tweedie (tw) and negative binomial (nb)); default quasipoisson(link="log")
group: if TRUE, group abundance is modelled rather than individual abundance
control: the usual control argument for a gam; keepData must be TRUE (the default here) for variance estimation to work
availability: an estimate of availability bias. For count models, used to multiply the effective strip width (must be a vector of length 1, or of length equal to the number of rows in segment.data).
segment.area: if NULL (the default), segment areas are calculated from the detection function object metadata and the Effort data; alternatively a vector of segment areas can be supplied
weights: weights for each observation used in model fitting. The default, weights=NULL, behaves as described in "Details".
method: the smoothing parameter estimation method. Default is "REML".
...: anything else to be passed straight to the fitting engine (e.g., gam)
The response (LHS of formula) can be one of the following (with restrictions outlined below):

count: count in each segment
abundance.est: estimated abundance per segment, estimation via a Horvitz-Thompson estimator
density.est: density per segment

The offset used in the model depends on the response:

count: area of the segment multiplied by the average probability of detection in the segment
abundance.est: area of the segment
density.est: zero
The count response can only be used when detection function covariates only vary between segments/points (not within). For example, weather conditions (like visibility or sea state) or foliage cover are usually acceptable, as they do not change within the segment, but animal sex or behaviour will not work. The abundance.est response can be used with any covariates in the detection function.
In the density case, observations can be weighted by segment areas via the weights= argument. By default (weights=NULL), when density is estimated the weights are set to the segment areas (using segment.area, or calculated from the detection function object metadata and Effort data). Alternatively, weights=1 will set all weights equal. A third alternative is to pass in a vector of length equal to the number of segments, containing appropriate weights.
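As a hedged sketch of the density case (names reuse the mexdolphins examples below; this is an illustration, not a definitive recipe):

# density response with the default segment-area weights, Tweedie errors
mod_dens <- dsm(density.est ~ s(x, y), hr.model, segdata, obsdata,
                family = tw())

# the same model with equal weights instead of segment-area weights
mod_dens_eq <- dsm(density.est ~ s(x, y), hr.model, segdata, obsdata,
                   family = tw(), weights = 1)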
Example analyses are available at http://examples.distancesampling.org.
a glm, gam, gamm or bam object, with an additional element, $ddf, which holds the detection function object.
It is often the case that distances are collected in metres and segment lengths are recorded in kilometres. dsm allows you to provide a conversion factor (convert.units) to multiply the areas by. For example: if distances are in metres and segment lengths are in kilometres, setting convert.units=1000 will lead to the analysis being in metres. Setting convert.units=1/1000 will lead to the analysis being in kilometres. The conversion factor will be applied to segment.area if that is specified.
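For instance, a minimal sketch (model and data names follow the examples below; the units are an assumption for illustration):

# distances recorded in metres, Effort recorded in kilometres:
# convert.units=1000 puts the whole analysis in metres
mod_m <- dsm(count ~ s(x, y), hr.model, segdata, obsdata,
             convert.units = 1000)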
For large models, engine="bam" with method="fREML" may be useful. Models specified for bam should be written as for gam. Read the bam documentation before using this option; it is considered EXPERIMENTAL at the moment. In particular, note that the default basis choice (thin plate regression splines) will be slow and that, in general, fitting is less stable than when using gam. For a negative binomial response, theta must be specified when using bam.
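A hedged sketch of such a fit (the basis size and theta value are illustrative assumptions, not recommendations):

# large-data fit; theta must be fixed for negative binomial with bam
big_mod <- dsm(count ~ s(x, y, k = 100), hr.model, segdata, obsdata,
               engine = "bam", method = "fREML",
               family = negbin(theta = 1.5))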
David L. Miller
Hedley, S. and S. T. Buckland. 2004. Spatial models for line transect sampling. JABES 9:181-199.
Miller, D. L., Burt, M. L., Rexstad, E. A., Thomas, L. (2013), Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4: 1001-1010. doi: 10.1111/2041-210X.12105 (Open Access)
Wood, S.N. 2006. Generalized Additive Models: An Introduction with R. CRC/Chapman & Hall.
## Not run:
library(Distance)
library(dsm)

# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)

# fit a detection function and look at the summary
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
summary(hr.model)

# fit a simple smooth of x and y to counts
mod1 <- dsm(count~s(x,y), hr.model, segdata, obsdata)
summary(mod1)

# predict over a grid
mod1.pred <- predict(mod1, preddata, preddata$area)

# calculate the predicted abundance over the grid
sum(mod1.pred)

# plot the smooth
plot(mod1)
## End(Not run)
Once a DSM has been fitted to data, this function can be used to check for autocorrelation in the residuals.
dsm_cor(
  dsm.obj,
  Transect.Label = "Transect.Label",
  Segment.Label = "Segment.Label",
  max.lag = 10,
  resid.type = "scaled.pearson",
  fun = cor,
  ylim = c(0, 1),
  subset = "all",
  ...
)
dsm.obj: a fitted dsm object
Transect.Label: label for the transects (default: "Transect.Label")
Segment.Label: label for the segments (default: "Segment.Label")
max.lag: maximum lag to calculate at
resid.type: the type of residuals used, see residuals.gam (default: "scaled.pearson")
fun: the function to use, by default cor
ylim: user-defined limits in the y direction
subset: which subset of the data should the correlation function be calculated on?
...: other options to pass to plot
a plot, or a vector of fun applied at the lags.
Within each Transect.Label, segments will be sorted according to their Segment.Labels. This may require some time to get right for your particular data. If one has multiple surveys where transects are revisited, for example, one may want to make Transect.Label a unique transect-survey identifier, as sketched below. Neither label needs to be included in the model; they must simply be present in the $data field of the model. This usually means that they have to be in the segment data passed to dsm.
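A minimal sketch of building such an identifier (the Survey column is hypothetical; other names follow the package examples):

# make revisited transects distinct across surveys
segdata$Transect.Label <- paste(segdata$Transect.Label,
                                segdata$Survey,  # hypothetical survey ID column
                                sep = "-")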
The current iteration of this function will only plot correlations nicely; other things are up to you, but you can get the function to return the data by assigning the result to an object.
If there are NA values in the residuals then the correlogram will not be calculated. This usually occurs due to NA values in the covariates (so the smoother will not have fitted values there). Code like any(is.na(dsm.obj$data)) might be helpful.
David L. Miller
library(Distance)
library(dsm)

# load the data, see ?mexdolphins
data(mexdolphins)

# fit a model
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
mod1 <- dsm(count~s(x,y), hr.model, segdata, obsdata)

# look at lag 1 differences up to a maximum of lag 9, using deviance
# residuals
dsm_cor(mod1, resid.type="deviance", max.lag=9,
        Segment.Label="Sample.Label")
If one is willing to assume that the detection function and spatial model are independent, this function will produce estimates of the variance of predictions of abundance, using the result that squared coefficients of variation add.
dsm_var_gam(
  dsm.obj,
  pred.data,
  off.set,
  seglen.varname = "Effort",
  type.pred = "response"
)
dsm.obj: a model object fitted by dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
seglen.varname: name of the column which holds the segment length (default "Effort")
type.pred: should the predictions be on the "response" or "link" scale? (default "response")
a list with elements:

model: the fitted model object
pred.var: variance of the regions given in pred.data
bootstrap: logical, always FALSE
model: the fitted model with the extra term
dsm.object: the original model (dsm.obj above)
David L. Miller
## Not run:
library(Distance)
library(dsm)

# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)

# fit a detection function and look at the summary
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
summary(hr.model)

# fit a simple smooth of x and y
mod1 <- dsm(count~s(x, y), hr.model, segdata, obsdata)

# Calculate the variance
# this will give a summary over the whole area in mexdolphins$preddata
mod1.var <- dsm_var_gam(mod1, preddata, off.set=preddata$area)
## End(Not run)
Estimate the variance in abundance over an area using a moving block bootstrap. Two procedures are implemented, one incorporating detection function uncertainty, one not.
dsm_var_movblk(
  dsm.object,
  pred.data,
  n.boot,
  block.size,
  off.set,
  ds.uncertainty = FALSE,
  samp.unit.name = "Transect.Label",
  progress.file = NULL,
  bs.file = NULL,
  bar = TRUE
)
dsm.object: object returned from dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
n.boot: number of bootstrap resamples
block.size: number of segments in each block
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
ds.uncertainty: incorporate uncertainty in the detection function? See "Details", below. Note that this feature is EXPERIMENTAL at the moment.
samp.unit.name: name of the sampling unit to resample (default "Transect.Label")
progress.file: path to a file to be used (usually by Distance) to generate a progress bar (default NULL)
bs.file: path to a file to store each bootstrap round. This stores all of the bootstrap results rather than just the summaries, enabling outliers to be detected and removed. (default NULL)
bar: should a progress bar be printed to screen? (default TRUE)
Setting ds.uncertainty=TRUE will incorporate detection function uncertainty directly into the bootstrap. This is done by generating observations from the fitted detection function, re-fitting a new detection function (of the same form), then calculating a new effective strip width. Rejection sampling is used to generate the observations (except in the half-normal case), so the procedure can be rather slow. Note that this is currently not supported with covariates in the detection function.

Setting ds.uncertainty=FALSE will incorporate detection function uncertainty using the delta method. This assumes that the detection function and the spatial model are INDEPENDENT. This is probably not reasonable.
## Not run:
library(Distance)
library(dsm)

# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)

# fit a detection function and look at the summary
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
summary(hr.model)

# fit a simple smooth of x and y
mod1 <- dsm(count~s(x, y), hr.model, segdata, obsdata)
summary(mod1)

# calculate the variance by 500 moving block bootstraps
mod1.movblk <- dsm_var_movblk(mod1, preddata, n.boot = 500,
   block.size = 3, samp.unit.name = "Transect.Label",
   off.set = preddata$area,
   bar = TRUE, bs.file = "mexico-bs.csv", ds.uncertainty = TRUE)
## End(Not run)
To ensure that uncertainty from the detection function is correctly propagated to the final variance estimate of abundance, this function uses a method first detailed in Williams et al. (2011); further explanation is given in Bravington et al. (2021).
dsm_var_prop(
  dsm.obj,
  pred.data,
  off.set,
  seglen.varname = "Effort",
  type.pred = "response"
)
dsm.obj: a model object fitted by dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
seglen.varname: name of the column which holds the segment length (default "Effort")
type.pred: should the predictions be on the "response" or "link" scale? (default "response")
The idea is to refit the spatial model, including an extra random effect. This random effect has zero mean and hence no effect on point estimates. Its variance is the Hessian of the detection function. Variance estimates then incorporate detection function uncertainty. Further mathematical details are given in the paper in the references below.
Many prediction grids can be supplied to the function as a list of data.frames, as sketched below.
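A hedged sketch of supplying several grids at once (grid_north and grid_south are hypothetical subsets of a prediction grid):

# one variance summary per prediction grid; grids and their per-cell
# areas are matched by position in the two lists
var_regions <- dsm_var_prop(mod1,
                            pred.data = list(grid_north, grid_south),
                            off.set = list(grid_north$area, grid_south$area))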
Note that this routine simply calls dsm_varprop. If you don't require multiple prediction grids, the other routine will probably be faster.
This routine is only useful if a detection function with covariates has been used in the DSM.
a list with elements:

model: the fitted model object
pred.var: variance of each region given in pred.data
bootstrap: logical, always FALSE
pred.data: as above
off.set: as above
model: the fitted model with the extra term
dsm.object: the original model, as above
model.check: a simple check, subtracting the coefficients of the two models to see if there is a large difference
deriv: numerically calculated Hessian of the offset
The summary output from the function includes a simple diagnostic that shows the average probability of detection from the "original" fitted model (the model supplied to this function; column Fitted.model) and the probability of detection from the refitted model (used for variance propagation; column Refitted.model), along with the standard error of the probability of detection from the fitted model (Fitted.model.se), at the unique values of any factor covariates used in the detection function (for continuous covariates the 5%, 50% and 95% quantiles are shown). If there are large differences between the probabilities of detection then there are potentially problems with the fitted model, the variance propagation or both. This can be because the fitted model does not account for enough of the variability in the data and, in refitting, the variance model accounts for this in the random effect.
Note that this routine is only useful if a detection function has been used in the DSM. It cannot be used when the abundance.est or density.est responses are used. Importantly, this requires that if the detection function has covariates, then these do not vary within a segment (so, for example, covariates like sex cannot be used).
Mark V. Bravington, Sharon L. Hedley. Bugs added by David L. Miller.
Bravington, M. V., Miller, D. L., & Hedley, S. L. (2021). Variance Propagation for Density Surface Models. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-021-00438-2
Williams, R., Hedley, S.L., Branch, T.A., Bravington, M.V., Zerbini, A.N. and Findlay, K.P. (2011). Chilean Blue Whales as a Case Study to Illustrate Methods to Estimate Abundance and Evaluate Conservation Status of Rare Species. Conservation Biology 25(3), 526-535.
Calculate the uncertainty in predictions from a fitted DSM, including uncertainty from the detection function.
dsm_varprop(
  model,
  newdata = NULL,
  trace = FALSE,
  var.type = "Vp",
  var_type = NULL
)
model: a fitted dsm model
newdata: the prediction grid; set to NULL (the default) to skip making predictions
trace: for debugging, see how the scale parameter estimation is going
var.type: which variance-covariance matrix should be used ("Vp", the default, or "Vc"; see "Details")
var_type: deprecated, use var.type
When we make predictions from a spatial model, we also want to know the uncertainty about that abundance estimate. Since density surface models are 2 (or more) stage models, we need to incorporate the uncertainty from the earlier stages (i.e. the detection function) into our "final" uncertainty estimate.
This function will refit the spatial model but include the Hessian of the offset as an extra term. Variance estimates using this new model can then be used to calculate the variance of predicted abundance estimates which incorporate detection function uncertainty. Importantly this requires that if the detection function has covariates, then these do not vary within a segment (so, for example covariates like sex cannot be used).
For more information on how to construct the prediction grid data.frame, newdata, see predict.dsm.
This routine is only useful if a detection function with covariates has been used in the DSM.
Note that we can use var.type="Vc" here (see gamObject), which is the variance-covariance matrix for the spatial model, corrected for smoothing parameter uncertainty. See Wood, Pya & Säfken (2016) for more information.
Models with fixed scale parameters (e.g., negative binomial) do not require an extra round of optimisation.
a list with elements:

old_model: fitted model supplied to the function as model
refit: refitted model object, with extra term
pred: point estimates of predictions at newdata
var: total variance calculated over all of newdata
ses: standard error for each prediction cell in newdata

If newdata=NULL then the last three entries are NA.
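A hedged usage sketch (mod1 and preddata as in the package examples; taking per-cell areas from an off.set column is an assumption, see predict.dsm for the prediction grid format):

preddata$off.set <- preddata$area  # per-cell areas for prediction
vp <- dsm_varprop(mod1, newdata = preddata)
summary(vp)   # includes the refit diagnostic described below
vp$var        # total variance over all of preddata
head(vp$ses)  # per-cell standard errors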
The summary output from the function includes a simple diagnostic that shows the average probability of detection from the "original" fitted model (the model supplied to this function; column Fitted.model) and the probability of detection from the refitted model (used for variance propagation; column Refitted.model), along with the standard error of the probability of detection from the fitted model (Fitted.model.se), at the unique values of any factor covariates used in the detection function (for continuous covariates the 5%, 50% and 95% quantiles are shown). If there are large differences between the probabilities of detection then there are potentially problems with the fitted model, the variance propagation or both. This can be because the fitted model does not account for enough of the variability in the data and, in refitting, the variance model accounts for this in the random effect.
David L. Miller, based on code from Mark V. Bravington and Sharon L. Hedley.
Bravington, M. V., Miller, D. L., & Hedley, S. L. (2021). Variance Propagation for Density Surface Models. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-021-00438-2
Williams, R., Hedley, S.L., Branch, T.A., Bravington, M.V., Zerbini, A.N. and Findlay, K.P. (2011). Chilean Blue Whales as a Case Study to Illustrate Methods to Estimate Abundance and Evaluate Conservation Status of Rare Species. Conservation Biology 25(3), 526-535.
Wood, S.N., Pya, N. and S\"afken, B. (2016) Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 1-45.
Two data.frames must be provided to dsm. They are referred to as observation.data and segment.data.
The segment.data table has the sample identifiers which define the segments, the corresponding effort (line length) expended and the environmental covariates that will be used to model abundance/density. observation.data provides a link table between the observations used in the detection function and the samples (segments), so that we can aggregate the observations to the segments (i.e., observation.data is a "look-up table" between the observations and the segments).
observation.data: the observation data.frame must have (at least) the following columns:

object: unique object identifier
Sample.Label: the identifier for the segment where the observation occurred
size: the size of each observed group (e.g., 1 if all animals occurred individually)
distance: distance to the observation

One can often also use observation.data to fit a detection function (so additional columns for detection function covariates are allowed in this table).
segment.data: the segment data.frame must have (at least) the following columns:

Effort: the effort (in terms of length of the segment)
Sample.Label: identifier for the segment (unique!)
???: environmental covariates, for example location (projected latitude and longitude), and other relevant covariates (sea surface temperature, foliage type, altitude, bathymetry etc.)

A minimal sketch of both tables is given below.
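A hedged sketch of the two tables (column names as required above; the values are made up for illustration):

# segment table: one row per segment
segment.data <- data.frame(
  Sample.Label = c("1-1", "1-2", "2-1"),  # unique segment identifiers
  Effort = c(10.5, 10.5, 12.0),           # segment lengths
  x = c(477, 478, 480),                   # example environmental covariates
  y = c(3086, 3087, 3090)
)

# observation table: one row per sighting, linked by Sample.Label
observation.data <- data.frame(
  object = 1:3,                           # unique observation identifiers
  Sample.Label = c("1-1", "1-1", "2-1"),  # segment each sighting fell in
  size = c(1, 3, 2),                      # group sizes
  distance = c(120, 540, 90)              # distances to observations
)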
If multiple detection functions are to be used, then a column named ddfobj must be included in observation.data and segment.data. This lets the model know which detection function each observation is from. These are numeric and ordered as the ddf.obj argument to dsm, e.g., ddf.obj=list(ship_ddf, aerial_ddf) means ship detections have ddfobj=1 and aerial detections have ddfobj=2 in the observation data, as sketched below.
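For example, a hedged sketch (the platform column is hypothetical; ship_ddf and aerial_ddf follow the naming above):

# number detections to match their position in ddf.obj=list(ship_ddf, aerial_ddf)
observation.data$ddfobj <- ifelse(observation.data$platform == "ship", 1, 2)
segment.data$ddfobj <- ifelse(segment.data$platform == "ship", 1, 2)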
When using mrds models that include mark-recapture components (currently independent observer and trial modes are supported), the format of the observation data needs to be checked to ensure that observations are not duplicated. The observer column is also required in observation.data.

In independent observer mode, only unique observations (unique object IDs) are required.

In trial mode, only observations made by observer 1 are required.
This function is deprecated; use dsm_cor instead.
dsm.cor(
  dsm.obj,
  Transect.Label = "Transect.Label",
  Segment.Label = "Segment.Label",
  max.lag = 10,
  resid.type = "scaled.pearson",
  fun = cor,
  ylim = c(0, 1),
  subset = "all",
  ...
)
dsm.obj: a fitted dsm object
Transect.Label: label for the transects (default: "Transect.Label")
Segment.Label: label for the segments (default: "Segment.Label")
max.lag: maximum lag to calculate at
resid.type: the type of residuals used, see residuals.gam (default: "scaled.pearson")
fun: the function to use, by default cor
ylim: user-defined limits in the y direction
subset: which subset of the data should the correlation function be calculated on?
...: other options to pass to plot
This function is deprecated; use dsm_var_gam instead.
dsm.var.gam(
  dsm.obj,
  pred.data,
  off.set,
  seglen.varname = "Effort",
  type.pred = "response"
)
dsm.obj: a model object fitted by dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
seglen.varname: name of the column which holds the segment length (default "Effort")
type.pred: should the predictions be on the "response" or "link" scale? (default "response")
This function is deprecated; use dsm_var_movblk instead.
dsm.var.movblk(
  dsm.object,
  pred.data,
  n.boot,
  block.size,
  off.set,
  ds.uncertainty = FALSE,
  samp.unit.name = "Transect.Label",
  progress.file = NULL,
  bs.file = NULL,
  bar = TRUE
)
dsm.object: object returned from dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
n.boot: number of bootstrap resamples
block.size: number of segments in each block
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
ds.uncertainty: incorporate uncertainty in the detection function? See "Details" of dsm_var_movblk. Note that this feature is EXPERIMENTAL at the moment.
samp.unit.name: name of the sampling unit to resample (default "Transect.Label")
progress.file: path to a file to be used (usually by Distance) to generate a progress bar (default NULL)
bs.file: path to a file to store each bootstrap round. This stores all of the bootstrap results rather than just the summaries, enabling outliers to be detected and removed. (default NULL)
bar: should a progress bar be printed to screen? (default TRUE)
This function is deprecated; use dsm_var_prop instead.
dsm.var.prop(
  dsm.obj,
  pred.data,
  off.set,
  seglen.varname = "Effort",
  type.pred = "response"
)
dsm.obj: a model object fitted by dsm
pred.data: either a single prediction grid or a list of prediction grids. Each grid should be a data.frame with the same columns as the original data.
off.set: a vector, or a list of vectors with as many elements as there are in pred.data, each giving the areas associated with the prediction cells
seglen.varname: name of the column which holds the segment length (default "Effort")
type.pred: should the predictions be on the "response" or "link" scale? (default "response")
Create a detection function object for strip/plot surveys for use in density surface models.
dummy_ddf(object, size = 1, width, left = 0, transect = "line")
object: numeric vector of object identifiers, relating to the object column in the observation data
size: group size for each observation (default: all groups size 1)
width: right truncation distance
left: left truncation distance (default 0, no left truncation)
transect: type of transect (default "line")
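A hedged sketch of its use for strip transect data (the 1000 m half-width is illustrative; other names follow the dsm-data format and the package examples):

# "detection function" with certain detection out to the strip half-width
strip_df <- dummy_ddf(object = obsdata$object, size = obsdata$size,
                      width = 1000)

# use it where a fitted detection function would normally go
mod_strip <- dsm(count ~ s(x, y), strip_df, segdata, obsdata)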
David L Miller
When using dsm.var.movblk with ds.uncertainty=TRUE, this procedure generates data from the fitted detection function (assuming that it is correct).
generate.ds.uncertainty(ds.object)
ds.object: a fitted detection function object (as returned by a call to ds or ddf)
This function changes the random number generator seed. To avoid any potential side effects, save the seed before running code and restore it afterwards, as sketched below.
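That is, restating the advice above as runnable code:

# save the random number generator state, run, then restore it
seed <- get(".Random.seed", envir = .GlobalEnv)
# ... code that (indirectly) calls generate.ds.uncertainty() ...
assign(".Random.seed", seed, envir = .GlobalEnv)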
David L. Miller
Not usually used on its own; called from within dsm.var.movblk.
generate.mb.sample(
  num.blocks.required,
  block.size,
  which.blocks,
  dsm.data,
  unit.info,
  n.units
)
num.blocks.required: number of blocks that we need
block.size: number of segments per block
which.blocks: which blocks should be sampled
dsm.data: the $data element of the fitted dsm object
unit.info: result of calling block.info.per.su
n.units: number of sampling units
vector of log-residuals
Convert longitude and latitude co-ordinates to kilometres west-east and south-north from axes through (lon0, lat0), using the "spherical law of cosines".
latlong2km(lon, lat, lon0 = sum(range(lon))/2, lat0 = sum(range(lat))/2)
lon: longitude
lat: latitude
lon0: longitude reference point (defaults to the mean longitude)
lat0: latitude reference point (defaults to the mean latitude)
WARNING: This is an approximate procedure for converting between latitude/longitude and Northing/Easting. Consider using the projection conversions available in packages sp, sf and rgdal for better results.
a list with elements km.e and km.n.
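A small usage sketch (the coordinates are made up for illustration):

# convert two positions to km east/north of the (default) mean position
xy <- latlong2km(lon = c(-88.5, -88.0), lat = c(27.5, 28.0))
xy$km.e  # kilometres west-east
xy$km.n  # kilometres south-north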
Simon N. Wood
This routine simply creates a grid of knots (in the correct format) to be used as the "internal" knots of a soap film smoother.
make.soapgrid(bnd, n.grid)
bnd: the boundary of the region (a list with x and y elements)
n.grid: either one number giving the number of points along the x and y axes, or a vector of length 2 giving the number for each axis separately
a list with elements x and y, containing the knot locations.
David L Miller
Data from a combination of several NOAA shipboard surveys conducted on pan-tropical spotted dolphins in the Gulf of Mexico. The data consist of 47 observations of groups of dolphins. The group size was recorded, as well as the Beaufort sea state at the time of the observation. Coordinates for each observation and bathymetry data were also available as covariates for the analysis. A complete example analysis (and description of the data) is provided at http://distancesampling.org/R/vignettes/mexico-analysis.html.
Halpin, P.N., A.J. Read, E. Fujioka, B.D. Best, B. Donnelly, L.J. Hazen, C. Kot, K. Urian, E. LaBrecque, A. Dimatteo, J. Cleary, C. Good, L.B. Crowder, and K.D. Hyrenbach. 2009. OBIS-SEAMAP: The world data center for marine mammal, sea bird, and sea turtle distributions. Oceanography 22(2):104-115
NOAA Southeast Fisheries Science Center. 1996. Report of a Cetacean Survey of Oceanic and Selected Continental Shelf Waters of the Northern Gulf of Mexico aboard NOAA Ship Oregon II (Cruise 220)
Given a covariate, calculate the observed and expected counts for each unique value of the covariate. This can be a useful goodness of fit check for DSMs.
obs_exp(model, covar, cut = NULL)
model: a fitted dsm model object
covar: covariate to aggregate by (a character string naming a column in the model data)
cut: vector of cut points to aggregate at. If not supplied, the unique values of covar are used.
One strategy for model checking is to calculate observed and expected counts at different aggregations of the variable. If these match well then the model fit is good.
a data.frame with values of observed and expected counts.
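A hedged sketch (mod1 as in the example below; the cut points are illustrative and assume depth is a column in the segment data):

# observed vs expected counts aggregated into depth bands
obs_exp(mod1, "depth", cut = c(0, 100, 200, 500, 1000))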
David L Miller, on the suggestion of Mark Bravington.
## Not run:
library(Distance)
library(dsm)

# example with the Gulf of Mexico dolphin data
data(mexdolphins)
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
mod1 <- dsm(count~s(x,y), hr.model, segdata, obsdata)
## End(Not run)
Plot the effect of each smooth in the model spatially. For each term in the model, plot its effect in space. Plots are made on the same scale, so that the relative influence of each smooth can be seen.
plot_pred_by_term(dsm.obj, data, location.cov = c("x", "y"))
dsm.obj: fitted dsm object
data: data to use for the plot (often the same as the prediction grid); the data should also include width and height columns for plotting (see the example below)
location.cov: names of the location covariates (default c("x", "y"))
a ggplot2 plot
David L Miller (idea taken from inlabru)
## Not run:
library(Distance)
library(dsm)

# load the Gulf of Mexico dolphin data and fit a model
data(mexdolphins)
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)
mod1 <- dsm(count~s(x,y) + s(depth), hr.model, segdata, obsdata)
preddata$width <- preddata$height <- sqrt(preddata$area)

# make the plot
plot_pred_by_term(mod1, preddata, c("x","y"))

# better plot would be
# library(viridis)
# plot_pred_by_term(mod1, preddata, c("x","y")) + scale_fill_viridis()
## End(Not run)
Note that the prediction data set must have x and y columns even if these were not used in the model.
## S3 method for class 'dsm.var'
plot(
  x,
  poly = NULL,
  limits = NULL,
  breaks = NULL,
  legend.breaks = NULL,
  xlab = "x",
  ylab = "y",
  observations = TRUE,
  plot = TRUE,
  boxplot.coef = 1.5,
  x.name = "x",
  y.name = "y",
  gg.grad = NULL,
  ...
)
x: a dsm.var object
poly: a polygon to plot (e.g., the survey area boundary)
limits: limits for the fill colours
breaks: breaks for the colour fill
legend.breaks: breaks as they should be displayed
xlab: label for the x axis
ylab: label for the y axis
observations: should observations be plotted?
plot: actually plot the map, or just return a ggplot2 object?
boxplot.coef: control trimming (as in summary.dsm.var)
x.name: name of the variable to plot as the x coordinate
y.name: name of the variable to plot as the y coordinate
gg.grad: optional ggplot2 gradient object
...: any other arguments
a plot
In order to get plotting to work with dsm_var_prop and dsm_var_gam, one must first format the data correctly, since these functions are designed to compute very general summaries. One summary is calculated for each element of the list pred supplied to dsm_var_prop and dsm_var_gam.

For a plot of uncertainty over a prediction grid pred (a data.frame), say, we can create the correct format by simply using pred.new <- split(pred, 1:nrow(pred)), as sketched below.
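A hedged sketch of that per-cell split (splitting off.set the same way is an assumption that mirrors the argument description above):

# one variance summary per prediction cell, so plot() can map uncertainty
pred.new <- split(preddata, 1:nrow(preddata))
off.new <- split(preddata$area, 1:nrow(preddata))
mod1.var.map <- dsm_var_gam(mod1, pred.data = pred.new, off.set = off.new)
plot(mod1.var.map)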
David L. Miller
dsm_var_prop, dsm_var_gam, dsm_var_movblk
Make predictions of density or abundance outside (or inside) the covered area.
## S3 method for class 'dsm'
predict(object, newdata = NULL, off.set = NULL, type = "response", ...)
object: a fitted dsm object
newdata: spatially referenced covariates, e.g., altitude, depth, distance to shore, etc. Covariates in the data.frame must have names matching those used to build the model.
off.set: area of each of the cells in the prediction grid. Should be in the same units as the segments/distances given to dsm.
type: what scale should the results be on? The default is "response"; see predict.gam for other options.
...: any other arguments passed to predict.gam
If newdata is not supplied, predictions are made for the data that built the model. Note that the order of the results will not necessarily be the same as the segdata (segment data) data.frame that was supplied to dsm.
The area off.set is used if that argument is supplied; otherwise the function will look for the areas in the column named off.set in newdata. Either way, the link function (usually log) will be applied to the offsets, so there is no need to log them before passing them to this function.
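A hedged sketch of the two equivalent ways to supply the areas (preddata as in the package examples):

# areas via the off.set argument
p1 <- predict(mod1, preddata, off.set = preddata$area)

# or via an off.set column in the prediction grid
preddata$off.set <- preddata$area
p2 <- predict(mod1, preddata)

sum(p1)  # predicted abundance over the grid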
predicted values on the response scale by default (unless type is specified, in which case see predict.gam).
David L. Miller
predict.gam, dsm_var_gam, dsm_var_prop
Prediction function for dummy detection functions. The function returns as many 1s as there are rows in newdata. If esw=TRUE then the strip width is returned.
## S3 method for class 'fake_ddf'
predict(
  object,
  newdata = NULL,
  compute = FALSE,
  int.range = NULL,
  esw = FALSE,
  ...
)
object: model object
newdata: how many 1s should we return?
compute: unused, here for compatibility with other predict methods
int.range: unused, here for compatibility with other predict methods
esw: should the strip width be returned?
...: for S3 consistency
David L Miller
This method just gives a short description of the fitted model. Use the summary.dsm method for more information.
## S3 method for class 'dsm'
print(x, ...)
x: a model fitted by dsm
...: unspecified and unused arguments for S3 consistency
NULL
David L. Miller
This method only provides a short summary; see summary.dsm_varprop for more information.
## S3 method for class 'dsm_varprop'
print(x, ...)
x: a dsm_varprop object
...: unspecified and unused arguments for S3 consistency
David L. Miller
This method only provides a short summary; use the summary.dsm.var method for more information.
## S3 method for class 'dsm.var'
print(x, ...)
x: a dsm.var object
...: unspecified and unused arguments for S3 consistency
NULL
David L. Miller
See summary.dsm_varprop for information.
## S3 method for class 'summary.dsm_varprop'
print(x, ...)
x: a summary of a dsm_varprop object
...: unspecified and unused arguments for S3 consistency
NULL
David L. Miller
See summary.dsm.var for information.
## S3 method for class 'summary.dsm.var'
print(x, ...)
x: a summary of a dsm.var object
...: unspecified and unused arguments for S3 consistency
NULL
David L. Miller
Reproduces the "Resids vs. linear pred" plot from gam.check, but using randomised quantile residuals, a la Dunn and Smyth (1996). Checks for heteroskedasticity proceed as usual: look for "funnel"-type structures in the points, which is much easier with randomised quantile residuals than with deviance residuals when your model uses a count distribution as the response.
rqgam_check(gam.obj, ...)
gam.obj: a fitted dsm (or gam) object
...: arguments passed on to all plotting functions
Note that this function only works with negative binomial and Tweedie response distributions.
Earlier versions of this function produced the full gam.check output, but this was confusing as only one of the plots was really useful. Checks of k are not computed; these need to be done using gam.check.
just plots!
Based on code by Natalie Kelly, bugs added by Dave Miller
library(Distance)
library(dsm)
library(tweedie)

# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)

# fit a detection function and look at the summary
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)

# fit a simple smooth of x and y with a Tweedie response with estimated
# p parameter
mod1 <- dsm(count~s(x, y), hr.model, segdata, obsdata, family=tw())
rqgam_check(mod1)
This function is deprecated; use rqgam_check instead.
rqgam.check(gam.obj, ...)
gam.obj: a fitted dsm (or gam) object
...: arguments passed on to all plotting functions
Gives a brief summary of a fitted dsm object.
## S3 method for class 'dsm'
summary(object, ...)
object: a model fitted by dsm
...: other arguments passed to summary.gam
a summary object
David L. Miller
Gives a brief summary of a fitted dsm_varprop variance object.
## S3 method for class 'dsm_varprop'
summary(object, alpha = 0.05, ...)
object: a dsm_varprop object
alpha: alpha level for confidence intervals (default 0.05 to give a 95% confidence interval)
...: unused arguments for S3 compatibility
a summary object
David L. Miller
Gives a brief summary of a fitted dsm variance object.
## S3 method for class 'dsm.var'
summary(
  object,
  alpha = 0.05,
  boxplot.coef = 1.5,
  bootstrap.subregions = NULL,
  ...
)
object: a dsm.var object
alpha: alpha level for confidence intervals (default 0.05 to give 95% confidence intervals)
boxplot.coef: the value of coef used to calculate outliers, see boxplot.stats
bootstrap.subregions: list of vectors of logicals or indices for subregions for which variances need to be calculated (only for bootstraps; see dsm_var_movblk)
...: unused arguments for S3 compatibility
a summary object
David L. Miller
Trim the variance estimates from the bootstrap. The trimming percentage is defined as the amount necessary to bring the median and trimmed mean within 8% of each other; the trimmed replicates are treated as 'outliers'.
trim.var(untrimmed.bootstraps, boxplot.coef = 1.5)
untrimmed.bootstraps: the bootstrap abundance estimates (usually the bootstrap results stored in a dsm bootstrap variance object)
boxplot.coef: the value of coef used to calculate outliers, see boxplot.stats
trimmed variance
Louise Burt
Plot measures of how much one term in the model could be explained by another. When values are high, one should consider re-running variable selection with one of the offending variables removed to check for stability in term selection.
vis_concurvity(model, type = "estimate")
model: fitted model
type: concurvity measure to plot (default "estimate"), see concurvity
These methods are considered somewhat experimental at this time. Consult concurvity for more information on how concurvity measures are calculated.
David L Miller
## Not run:
library(Distance)
library(dsm)

# load the Gulf of Mexico dolphin data (see ?mexdolphins)
data(mexdolphins)

# fit a detection function and look at the summary
hr.model <- ds(distdata, truncation=6000, key = "hr", adjustment = NULL)

# fit a simple smooth of x and y to counts
mod1 <- dsm(count~s(x,y)+s(depth), hr.model, segdata, obsdata)

# visualise concurvity using the "estimate" metric
vis_concurvity(mod1)
## End(Not run)
This function is deprecated; use vis_concurvity instead.
vis.concurvity(model, type = "estimate")
model: fitted model
type: concurvity measure to plot (default "estimate"), see concurvity