Title: | Convert Distance for Windows projects to R analyses |
---|---|
Description: | Take projects built using Distance for Windows and create R scripts which duplicate the analysis. Optionally build a test suite that checks analysis results from Distance with the equivalent R results. |
Authors: | David Miller [aut, cre] |
Maintainer: | David Miller <[email protected]> |
License: | GPL |
Version: | 0.0.6.9003 |
Built: | 2024-11-21 03:20:19 UTC |
Source: | https://github.com/distanceDevelopment/readdst |
This package reads data and model definitions from a Distance for Windows project (.dst and .dat files) and converts the models to run in the R package mrds.
A typical workflow looks like the example below and centres on the functions convert_project and run_analysis. See also the vignette shipped with the package for example output.
## Not run:
library(readdst)
# load the golftees sample project and convert it
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project, "/Golftees")
converted <- convert_project(project)
# run the first analysis in the project and look at the model summary
analysis_1 <- run_analysis(converted[[1]], debug=TRUE)
summary(analysis_1)
## End(Not run)
Use the DataLayers table to work out the hierarchy of the tables and layers in the database held by Distance.
build_layer_hierarchy(data_file)
data_file |
a data file to load the database from |
David L Miller
Take each analysis in a Distance for Windows project and convert its model definition to an mrds model. Data and data filters are also extracted and associated with the relevant models.
convert_project(project)
project |
a path to a project (path to the |
an object of class converted_distance_analyses (if there are analyses defined) or an object of class converted_distance_data (if no analyses are present in the project). Either way, an attribute called "flatfile", containing a flat version of the data, is also returned.
Only CDS/MCDS/MRDS analyses are supported.
Model names are as they are in Distance for Windows (so if you have nonsensical names in Distance for Windows they will be the same in R).
David L Miller
See also: converted_distance_analyses, readdst-package
Once convert_project has been run on a project, two types of object are created: first, an object of class converted_distance_analyses, which is simply a list of converted_distance_analysis objects.
converted_distance_analysis objects contain all the information necessary to run a Distance for Windows model in R. Each object has the following elements:
call
string with the call to ddf to build and run the model
aic.select
maximum number of terms to select by AIC if AIC term selection has been enabled (for key plus adjustment terms models only)
status
what the status of this model was in Distance for Windows (see "Status" below)
env
an environment that contains the data needed to run the model: data (the entire dataset in flatfile form), obs.table (the observation table), sample.table (the sample table), reg.table (the region table) and units (a matrix of factors for converting the distance measures, effort and detection distance, to areal measurements for density)
filter
string used to subset the data to get the same filter as in Distance for Windows
group_size
describes how size bias adjustment is conducted, and the level of hierarchy at which E(s) is computed
detection_by
level of design hierarchy at which detection function is computed (e.g. pooled across strata)
gof_intervals
if binning is done for GOF testing, cutpoints are provided here
estimation
what sort of weighted average is used to compute region-level density estimate
name
the name for this analysis, as used in Distance for Windows
ID
the ID number for this analysis, as used in Distance for Windows
The status code is taken from Distance for Windows to indicate whether the analysis has been run yet and what the outcome was. Status codes are as follows:
0
analysis has not been run in Distance for Windows yet
1
analysis ran without errors or warnings
2
analysis ran with warnings
3
analysis ran with errors
Note that an analysis that runs with errors in Distance for Windows may run fine in R, and an analysis that runs fine in Distance for Windows may not work in R. In the latter case, please consider submitting this as a bug to github.com/distancedevelopment/distance-bugs.
If convert_project has been run on a project but there are no analyses present in the project, then a list of the data will be returned. The list has one element for each data filter that was present in the project.
Each element of the list has the following tables in it:
This is an internal routine and shouldn't be necessary in normal analyses.
create_bins(data, cutpoints)
data |
|
cutpoints |
vector of cutpoints for the bins |
data with two extra columns, distbegin and distend.
David L. Miller
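As an illustration of what binning adds, the sketch below assigns each distance to the interval between consecutive cutpoints and records that interval's bounds. It is a hypothetical helper (create_bins_sketch), not the package's code, and it assumes the distances are in a column named distance and lie within the range of the cutpoints.

```r
# Hypothetical re-implementation for illustration only; assumes a numeric
# `distance` column and cutpoints spanning every observed distance.
create_bins_sketch <- function(data, cutpoints) {
  cutpoints <- sort(cutpoints)
  # findInterval() returns i such that cutpoints[i] <= x < cutpoints[i + 1];
  # rightmost.closed = TRUE puts x == max(cutpoints) into the last bin
  idx <- findInterval(data$distance, cutpoints, rightmost.closed = TRUE)
  if (any(idx < 1 | idx >= length(cutpoints), na.rm = TRUE)) {
    stop("some distances fall outside the range of the cutpoints")
  }
  # NA distances give NA bin bounds
  data$distbegin <- cutpoints[idx]
  data$distend <- cutpoints[idx + 1]
  data
}

obs <- data.frame(distance = c(0.5, 1.2, 3))
binned <- create_bins_sketch(obs, cutpoints = c(0, 1, 2, 3))
binned
```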
This function is a wrapper around calls to either RODBC (on Windows) or mdb.get (on Unix-alike systems). Given a database file name, it returns the contents of the requested table (as a data.frame); if table=NULL it returns all tables, and if table=TRUE it returns a character vector of table names.
db_get(file, table = NULL)
file |
the path to the database file to access |
table |
the table to extract (if |
a data.frame with the contents of a database table
Currently not implemented on Windows systems.
David L Miller
Take the "data filters" applied by Distance for Windows to the data and use them to subset the data.
filter_data(data, data_filter)
data |
the data to be filtered |
data_filter |
a data filter to be parsed (output from |
a list with two elements, the data and the filter string
David L Miller
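One way to turn a filter string into a subset is to evaluate it as an R expression with the data's columns in scope. The sketch below does that; the helper name apply_filter_string and the example filter "distance <= 2" are hypothetical, and the real filter strings produced from Distance for Windows projects may use a different form.

```r
# Hypothetical helper: subset `data` with an R expression stored as a string
# and return the subset together with the filter string.
apply_filter_string <- function(data, filter) {
  # evaluate the expression with the data's columns visible as variables
  keep <- eval(parse(text = filter), envir = data, enclos = baseenv())
  # treat NA comparisons (e.g. missing distances) as "drop"
  keep <- keep & !is.na(keep)
  list(data = data[keep, , drop = FALSE], filter = filter)
}

obs <- data.frame(distance = c(0.5, 2.5, NA, 1.0), size = c(1, 3, 2, 1))
res <- apply_filter_string(obs, "distance <= 2")
res$data
```

Because eval(parse()) runs arbitrary code, this approach is only appropriate for filter strings from a trusted project file.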
Extracts the relevant tables from the Distance for Windows database to build data that can be used with mrds or Distance.
get_data(data_file)
data_file |
the path to a |
a "flatfile" compatible data.frame containing all of the information necessary to make a stratified abundance/density estimate.
David L Miller
Takes the "Definition" column from a table, converts it to character and puts each row in a list, with each element named by the ID of the corresponding row.
get_definitions(file, table)
file |
the name of the |
table |
which table to extract the "Definition" column from |
a list of definitions, each element of which is a character vector.
David L Miller
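The list-building step described above can be sketched in a couple of lines of base R. The defs_table data frame is a hypothetical stand-in for a table read from the project database, and its definition strings are invented for illustration.

```r
# Hypothetical stand-in for a database table: one row per definition,
# with an ID column and a "Definition" column stored as a factor.
defs_table <- data.frame(ID = c(1, 3),
                         Definition = factor(c("Width=20;", "Key=HN;")))

# convert to character and name each list element by its row ID
definitions <- setNames(as.list(as.character(defs_table$Definition)),
                        defs_table$ID)
str(definitions)
```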
Currently extracts only the AIC and likelihood.
get_stats(project_file, stats_table)
project_file |
path to project file |
stats_table |
a |
Codes used to determine the meanings of statistics are given at https://github.com/DistanceDevelopment/readdst/wiki/distance-results-codes.
David L Miller
Obtain the factors for converting measurements in a Distance for Windows project from the units in which they were recorded to SI units.
get_unit_conversion(data_file)
data_file |
Distance for Windows project data file |
a data.frame with columns Variable, Units and Conversion, giving the variable name, the units it is measured in and the conversion factor to SI units.
David L Miller
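A table in this Variable/Units/Conversion layout can be used by multiplying each measurement by its variable's factor. The sketch below uses hypothetical unit labels and factors (kilometres of effort, metres of distance) and shows why the conversion matters: effort and detection distance must share units before the covered area of a line transect, 2 × width × length, is computed.

```r
# Hypothetical conversion table in the documented layout; labels and
# factors are illustrative, not read from a real project.
unit_tab <- data.frame(Variable = c("Effort", "Distance"),
                       Units = c("Kilometer", "Meter"),
                       Conversion = c(1000, 1))

# multiply a measurement by its variable's factor to express it in metres
to_si <- function(x, variable, unit_tab) {
  f <- unit_tab$Conversion[unit_tab$Variable == variable]
  stopifnot(length(f) == 1)
  x * f
}

effort_m <- to_si(2.5, "Effort", unit_tab)    # 2.5 km of transect -> 2500
width_m <- to_si(20, "Distance", unit_tab)    # 20 m truncation distance
covered_area_m2 <- 2 * width_m * effort_m     # both sides of the line: 100000
```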
Distance for Windows includes a few different methods for accounting for group (or cluster) size at the abundance/density estimation stage. These include using the mean group size for all observations or using a regression of size against distance or of log size against distance.
group_size_est(data, group_size, model)
data |
the data for this analysis |
group_size |
the |
model |
a fitted model |
estimated cluster sizes (a numeric vector of length nrow(data)), or NULL if there were no instructions on how to estimate group/cluster size
David L Miller
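To make the two approaches above concrete, here is a hypothetical sketch (group_size_sketch) of the mean method and the log-size regression. It assumes columns named size and distance and takes the fitted log size at distance zero as the expected group size; it applies no retransformation bias correction, and the exact procedure used by Distance for Windows may differ.

```r
# Illustrative sketch only; not the package's implementation.
group_size_sketch <- function(data, method = c("mean", "log_regression")) {
  method <- match.arg(method)
  if (method == "mean") {
    # mean observed group size, used for every observation
    est <- mean(data$size)
  } else {
    # regress log(size) on distance; the intercept is the predicted
    # log size at distance zero (no bias correction applied)
    fit <- lm(log(size) ~ distance, data = data)
    est <- exp(coef(fit)[["(Intercept)"]])
  }
  # one estimate per observation, matching the documented return length
  rep(est, nrow(data))
}

obs <- data.frame(distance = c(0, 1, 2, 3), size = c(4, 3, 3, 2))
group_size_sketch(obs, "mean")            # 3 for each of the 4 rows
group_size_sketch(obs, "log_regression")
```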
This function calls make_model to create the call to ddf; it also creates an environment with the data necessary to perform the call.
make_analysis(this_analysis, model_definitions, data_filters, data, transect)
this_analysis |
an analysis from Distance |
model_definitions |
a list of model definitions |
data_filters |
a list of data filters |
data |
the data to use with the model (see |
transect |
the transect type |
a list with the following elements: a character string specifying a call to ddf, an environment to run it in, the name of the analysis and its ID.
David L Miller
See also: ddf
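Pairing a call stored as a string with an environment holding its data means the call can be run later with eval(parse()). The sketch below shows that pattern with lm() and the built-in cars data standing in for ddf() and survey data; it illustrates the idea only and is not how run_analysis is implemented.

```r
# Sketch of evaluating a (call string, environment) pair.
# lm() and cars are stand-ins for ddf() and the survey data.
analysis_env <- new.env()
assign("data", cars, envir = analysis_env)

call_string <- "lm(dist ~ speed, data = data)"
# `data` in the call is looked up in analysis_env
fit <- eval(parse(text = call_string), envir = analysis_env)
coef(fit)
```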
Build the control options for a ddf call.
make_control(md)
md |
model definition data to parse |
character string describing the control list
From a model definition, build the dsmodel part of the model.
make_dsmodel(md)
md |
a model definition |
a character string starting with "dsmodel=", or NULL if there is no dsmodel component in this model
David L Miller
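For orientation, a dsmodel string of the kind described above could be assembled as follows. The helper make_dsmodel_sketch is hypothetical, as is its rule of choosing mrds's cds() for intercept-only formulas and mcds() otherwise; the key, formula, adj.series and adj.order argument names follow mrds's cds()/mcds() specifications.

```r
# Hypothetical helper; the package's own string-building logic may differ.
make_dsmodel_sketch <- function(key, formula = "~1",
                                adj_series = NULL, adj_order = NULL) {
  args <- c(sprintf('key="%s"', key), paste0("formula=", formula))
  if (!is.null(adj_series)) {
    # adjustment series and orders for key plus adjustment models
    args <- c(args, sprintf('adj.series="%s"', adj_series),
              paste0("adj.order=c(", paste(adj_order, collapse = ","), ")"))
  }
  # assumption: intercept-only models use cds(), covariate models mcds()
  engine <- if (formula == "~1") "cds" else "mcds"
  paste0("dsmodel=~", engine, "(", paste(args, collapse = ", "), ")")
}

make_dsmodel_sketch("hn")
make_dsmodel_sketch("unif", adj_series = "cos", adj_order = c(1, 2))
```

Evaluating the part after "dsmodel=" yields a one-sided formula object, which is the form ddf expects for dsmodel.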
Build a formula, ensuring that the correct terms are factors
make_formula(md_formula, md_factors)
md_formula |
"Formula" data from a model definition |
md_factors |
"Factors" data from a model definition |
a character string specifying a formula, starting with "formula=~"
David L Miller
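One way to make sure the right terms are treated as factors is to wrap them in as.factor() when writing the formula string. The sketch below does this; make_formula_sketch and its vector-of-terms inputs are hypothetical simplifications of the "Formula" and "Factors" model definition data, and the package may mark factors differently (for example by converting data columns instead).

```r
# Hypothetical helper; inputs are simplified stand-ins for the
# "Formula" and "Factors" parts of a model definition.
make_formula_sketch <- function(terms, factors = character(0)) {
  if (length(terms) == 0) return("formula=~1")
  # wrap factor terms in as.factor() so they are treated as categorical
  wrapped <- ifelse(terms %in% factors, paste0("as.factor(", terms, ")"), terms)
  paste0("formula=~", paste(wrapped, collapse = "+"))
}

f_string <- make_formula_sketch(c("sex", "size"), factors = "sex")
f_string                                    # "formula=~as.factor(sex)+size"
as.formula(sub("^formula=", "", f_string))  # ~as.factor(sex) + size
```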
From a data filter, build the meta.data part of the model.
make_meta.data(df, transect, data)
df |
a data filter object |
transect |
type of transect |
data |
the data used in the model |
a character string starting with meta.data=
David L Miller
Build the call to ddf that reproduces an analysis from Distance for Windows.
make_model(this_analysis, model_definitions, data_filters, transect, data)
this_analysis |
an analysis from Distance |
model_definitions |
a list of model definitions |
data_filters |
a list of data filters |
transect |
the transect type |
data |
the data |
a character string specifying a call to ddf
David L Miller
From a model definition, build the mrmodel part of the model.
make_mrmodel(md)
md |
a model definition |
a character string starting with "mrmodel=", or NULL if there is no mrmodel component in this model.
David L Miller
In Distance for Windows, one can choose to estimate the detection function by stratum. In this case, more than one detection function is returned when run_analysis is used to run the analysis. To test the statistics stored in the Distance for Windows project, one must first combine the resulting models (and their corresponding abundance and density estimates). This function performs these operations.
merge_results(models, analysis)
models |
a |
analysis |
an analysis specification (used to determine the stratification) |
a list including the "combined" model, summary and density/abundance estimates (dht output). Note that these are almost certainly not valid objects of their respective classes; they are only to be used to test statistics.
David L Miller
This takes a fitted mrds model object and returns a string describing the fitted detection function and the model's AIC.
model_description(model)
model |
a fitted model |
a string describing the model
David L Miller
Run model selection for a given analysis. The returned object is exactly as if the model had been run using ddf, so anything that can normally be done with a ddf object can be done with the returned object.
model_selection(analysis, debug = FALSE)
analysis |
a converted analysis |
debug |
display the call and name of the model before it is run, print AIC selection details |
Model selection is performed via AIC.
a fitted ddf object
David L Miller
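AIC selection amounts to fitting each candidate model and keeping the one with the lowest AIC. The sketch below mimics selecting up to a maximum number of terms (compare aic.select) using lm() with polynomial terms on the built-in cars data as a stand-in for ddf() with adjustment terms. It compares every candidate up to the limit; a sequential search that stops when AIC first increases is a common alternative, and which of these the package uses is not shown here.

```r
# Illustrative AIC selection; lm() on cars stands in for ddf().
max_terms <- 3
candidates <- lapply(0:max_terms, function(k) {
  # k = 0 is the intercept-only model; k > 0 adds k polynomial terms
  if (k == 0) lm(dist ~ 1, data = cars) else lm(dist ~ poly(speed, k), data = cars)
})
aics <- vapply(candidates, AIC, numeric(1))
best <- candidates[[which.min(aics)]]
aics
```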
Given a data filter "Definition", pre-processed by get_definitions, extract the useful information from it.
parse_definition.data_filter(df)
df |
a definition |
named list of definitions
A definition consists either of a key=value pair or of a name followed by key=value pairs separated by \ and terminated with ;.
Note that this function should be called for a single definition, usually using lapply.
David L Miller
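The definition syntax described above can be parsed with a few string operations. The sketch below is a hypothetical parser (parse_definition_sketch) that uses the documented \ separator and ; terminator; the example definition "Distance \Width=20 \Left=0;" is invented for illustration, and the package's real parsing rules may differ.

```r
# Hypothetical parser for illustration only.
parse_definition_sketch <- function(def, sep = "\\") {
  # drop surrounding whitespace and the terminating ";"
  def <- sub(";\\s*$", "", trimws(def))
  # split into parts on the separator (a single backslash by default)
  parts <- trimws(strsplit(def, sep, fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  if (length(parts) == 0) return(list(name = NULL, values = list()))
  has_eq <- grepl("=", parts, fixed = TRUE)
  # split each key=value part at its first "=" only
  kv <- lapply(parts[has_eq], function(p) {
    pos <- regexpr("=", p, fixed = TRUE)
    c(trimws(substr(p, 1, pos - 1)), trimws(substr(p, pos + 1, nchar(p))))
  })
  values <- lapply(kv, `[`, 2)
  names(values) <- vapply(kv, `[`, character(1), 1)
  # a leading part without "=" is the definition's name
  name <- if (!has_eq[1]) parts[1] else NULL
  list(name = name, values = values)
}

str(parse_definition_sketch("Distance \\Width=20 \\Left=0;"))
str(parse_definition_sketch("Width=20;"))
```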
Given data from a "Definition", pre-processed by get_definitions, extract the useful information from it.
parse_definition.model(df)
df |
a definition (vector of character strings) |
a list of lists
See the "MCDS Command Language" section of the Distance manual for more information.
Note that this function should be called for a single definition, usually using lapply.
David L Miller
Prints a table of the analyses that have been converted and their status from Distance for Windows.
## S3 method for class 'converted_distance_analyses'
print(x, ...)
x |
converted distance analyses |
... |
unused additional args for S3 compatibility |
Prints details of an analysis that has been converted.
## S3 method for class 'converted_distance_analysis'
print(x, ...)
x |
a converted distance analysis |
... |
unused additional args for S3 compatibility |
This is simply a print method to nicely output the results of test_stats.
## S3 method for class 'distance_stats_table'
print(x, ..., digits = NULL)
x |
the result of a call to |
just prints the results
David L Miller
Take a single converted analysis and run the model contained therein.
run_analysis(analysis, debug = FALSE)
analysis |
a converted analysis |
debug |
display the call and name of the model before it is run, print AIC selection details |
A previous call to convert_project returns a list of analyses. Only one analysis at a time can be run with run_analysis; to run all the analyses in the project, see the example below, which uses lapply.
If an analysis needs to select the number of adjustment terms (for key plus adjustment detection functions) by AIC, then that selection is done at this stage.
a fitted ddf object
David L Miller
## Not run:
library(readdst)
# load and convert the golftees project
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project, "/Golftees")
converted <- convert_project(project)
# run the first analysis
analysis_1 <- run_analysis(converted[[1]], debug=TRUE)
# look at the resulting model output
summary(analysis_1)
# run all the analyses in a project
all_analyses_run <- lapply(converted, run_analysis)
## End(Not run)
Set column names in data to be as in formulae.
set_covar_names(data, covnames)
data |
a |
covnames |
the covariates that are factors |
To use get_stats we need a set of statistics to test. We also require their codes (to look up in the Distance for Windows database) and their equivalent values in mrds (or how to calculate those values). This function provides such a table.
stats_table(engine = "CDS")
engine |
which engine do we need to compute stats for? |
a data.frame of the statistics that Distance for Windows collects and that have equivalents in mrds. The data.frame has four columns: Code, the numeric code for the statistic (as used in the Distance for Windows database); Name, the short name for this statistic; MRDS, the operation required to obtain the equivalent statistic in mrds; Description, a short description of the statistic.
Data for this table (numeric codes and descriptions) are from DistIni.mdb, which is shipped with Distance for Windows. See also https://github.com/distancedevelopment/readdst/wiki/distance-results-codes.
Note that the Cramer-von Mises p-value as recorded in Distance for Windows is only recorded to the nearest 0.1.
David L Miller
Tests the results stored in the Distance for Windows project file against those generated from running the same analysis in R.
test_stats(analysis, statuses = 1, tolerance = 0.01)
analysis |
a converted (but not run) analysis |
statuses |
for which statuses should tests be run? See "Status", below (Defaults to |
tolerance |
the tolerance of the test (default 0.01) |
A previous call to convert_project returns a list of analyses. Only one analysis at a time can be tested with test_stats; to test all the analyses in the project, you can use lapply.
a data.frame with five columns: Statistic, a description of the tested statistic; Distance_value, the value of the statistic stored by Distance for Windows; mrds_value, the value of the statistic calculated by mrds; Difference, the proportional difference between the previous two columns (computed using all.equal); Pass, a series of ticks indicating that the value in the Difference column is less than tolerance.
The status code is taken from Distance for Windows to indicate whether the analysis has been run yet and what the outcome was. Status codes are as follows:
0
analysis has not been run in Distance for Windows yet
1
analysis ran without errors or warnings
2
analysis ran with warnings
3
analysis ran with errors
If an analysis has a status of 0 or 3 there will usually not be any statistics attached to the analysis, so no tests will be run.
Note that an analysis that runs with errors in Distance for Windows may run fine in R, and an analysis that runs fine in Distance for Windows may not work in R. In the latter case, please consider submitting this as a bug to github.com/distancedevelopment/distance-bugs.
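Since analyses with status 0 or 3 usually carry no statistics, it can help to label status codes and keep only the analyses worth testing. The sketch below does this with hypothetical stand-in lists that mimic only the name and status elements of converted analyses; keeping statuses 1 and 2 is an example choice (test_stats itself defaults to statuses = 1).

```r
status_labels <- c("0" = "not run in Distance for Windows",
                   "1" = "ran without errors or warnings",
                   "2" = "ran with warnings",
                   "3" = "ran with errors")

# Hypothetical stand-ins for converted analyses (name and status only).
analyses <- list(list(name = "HN cos", status = 1),
                 list(name = "HR poly", status = 0),
                 list(name = "Unif cos", status = 2))

statuses <- vapply(analyses, function(a) a$status, numeric(1))
unname(status_labels[as.character(statuses)])

# keep analyses whose status suggests statistics were stored
testable <- Filter(function(a) a$status %in% c(1, 2), analyses)
vapply(testable, function(a) a$name, character(1))
```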
Tests all available statistics.
## Not run:
library(readdst)
# load the golftees sample project and convert it
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project, "/Golftees")
converted <- convert_project(project)
# run tests for analysis 1
test_stats(converted[[1]])
## End(Not run)
Given distance sampling survey data in flatfile format, convert it to the four tables required by dht.
unflatfile(data)
list of the four data.frames described in "Details".
region.table: a data.frame with two columns: Region.Label, the label for the region; Area, the area of the region. region.table has one row per stratum; if there is no stratification, it has a single row whose Area is the total survey area.
sample.table: a data.frame mapping the regions to the samples (i.e. transects), with three columns: Sample.Label, the label for the sample; Region.Label, the label for the region that the sample belongs to; Effort, the effort expended in that sample (e.g. transect length).
obs.table: a data.frame mapping the individual observations (objects) to regions and samples, with three columns: object, the observation ID; Region.Label, the label for the region that the sample belongs to; Sample.Label, the label for the sample.
data: a data.frame containing at least a column called distance. NOTE! If there is a column called size in the data then it will be interpreted as group/cluster size.
Based on checkdata from package Distance.
David L. Miller
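The split into dht's tables can be sketched with unique() over the relevant columns. The sketch below is a hypothetical helper (unflatfile_sketch), not the package's implementation; it uses the column names from the descriptions above and assumes that rows with a missing distance represent samples with no observations and that sample labels are unique within a region.

```r
# Hypothetical sketch of splitting a flatfile into dht-style tables.
unflatfile_sketch <- function(data) {
  region.table <- unique(data[, c("Region.Label", "Area")])
  sample.table <- unique(data[, c("Sample.Label", "Region.Label", "Effort")])
  # rows with NA distance are assumed to be samples with no observations
  obs <- data[!is.na(data$distance), ]
  obs.table <- obs[, c("object", "Region.Label", "Sample.Label")]
  list(data = obs, region.table = region.table,
       sample.table = sample.table, obs.table = obs.table)
}

flat <- data.frame(Region.Label = c("A", "A", "A", "B"),
                   Area = c(10, 10, 10, 20),
                   Sample.Label = c("A1", "A1", "A2", "B1"),
                   Effort = c(1, 1, 2, 3),
                   object = c(1, 2, NA, 3),
                   distance = c(0.1, 0.5, NA, 0.2))
tables <- unflatfile_sketch(flat)
tables$sample.table
```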
Returns a table of conversions between the units used in Distance for Windows. This is extracted from the default DistIni.mdb database.
units_table()
David L Miller