Package 'readdst' reference manual

Title:	Convert Distance for Windows projects to R analyses
Description:	Take projects built using Distance for Windows and create R scripts which duplicate the analysis. Optionally build a test suite that checks analysis results from Distance with the equivalent R results.
Authors:	David Miller [aut, cre]
Maintainer:	David Miller <[email protected]>
License:	GPL
Version:	0.0.6.9003
Built:	2025-03-21 03:18:22 UTC
Source:	https://github.com/distanceDevelopment/readdst

Convert Distance for Windows analyses to R code

Description

This package reads data and model definitions from a Distance for Windows project (.dst and .dat files) and converts the models so that they can be run with the R package mrds.

Details

Usually, a workflow will look something like the example below, centred around the functions convert_project and run_analysis. See also the vignette shipped with the package for example output.

Examples

## Not run: 
library(readdst)
# load the golftees sample project and convert it
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project,"/Golftees")
converted <- convert_project(project)

# run the first analysis in the project and look at model summary
analysis_1 <- run_analysis(converted[[1]], debug=TRUE)
summary(analysis_1)

## End(Not run)

Work out the layer hierarchy in the Distance database

Description

Use the DataLayers table to work out the hierarchy of the tables and layers in the database held by Distance.

Usage

build_layer_hierarchy(data_file)

Arguments

data_file

a data file to load the database from

Author(s)

David L Miller

Convert a Distance for Windows project to be run in R

Description

Take each analysis in a Distance for Windows project and convert its model definition to an mrds model. Data and data filters are also extracted and associated with the relevant models.

Usage

convert_project(project)

Arguments

project

a path to a project (path to the dst file with ".dst" removed from the end of the path)

Value

an object of class converted_distance_analyses if there are analyses defined, or an object of class converted_distance_data if no analyses are present in the project. In either case, an attribute called "flatfile" containing a flat version of the data is also returned.

Details

Only CDS/MCDS/MRDS analyses are supported.

Model names are as they are in Distance for Windows (so if you have nonsensical names in Distance for Windows they will be the same in R).

Author(s)

David L Miller

Converted analysis objects

Description

Once convert_project has been run on a project, two classes of object are involved: the function returns an object of class converted_distance_analyses, which is simply a list of converted_distance_analysis objects, one per analysis.

Details

Each converted_distance_analysis object contains all the information necessary to run a Distance for Windows model in R, in the following elements:

call string with the call to ddf to build and run the model
aic.select maximum number of terms to select by AIC if AIC term selection has been enabled (for key plus adjustment terms models only)
status what the status of this model was in Distance for Windows (see "Status" below)
env an environment that contains the data needed to run the model: data, the entire dataset in flatfile form; obs.table, the observation table; sample.table, the sample table; reg.table, the region table; and units, a matrix of conversion factors from distance measures (effort and detection distance) to areal measurements (for density)
filter string used to subset the data to get the same filter as in Distance for Windows
group_size describes how size bias adjustment is conducted, and the level of hierarchy at which E(s) is computed
detection_by level of design hierarchy at which detection function is computed (e.g. pooled across strata)
gof_intervals if binning is done for GOF testing, cutpoints are provided here
estimation what sort of weighted average is used to compute region-level density estimate
name the name for this analysis, as used in Distance for Windows
ID the ID number for this analysis, as used in Distance for Windows
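The call and env elements work together: the model is stored as text and, presumably, evaluated against the data held in env. The sketch below illustrates that pattern with base R only. The list is a hypothetical stand-in for a converted analysis, and its call is to mean() rather than ddf, so that it runs without mrds or a Distance project.

```r
# Hypothetical stand-in for a converted_distance_analysis; element names
# follow the list above, but the call uses base R's mean() instead of ddf()
analysis <- list(
  call = "mean(data$distance)",
  env  = new.env(),
  name = "Example analysis",
  ID   = 1
)

# Store the data the call needs inside the analysis environment
assign("data", data.frame(object = 1:4, distance = c(0.5, 1, 2.5, 3)),
       envir = analysis$env)

# Parse the call string and evaluate it where `data` is visible
result <- eval(parse(text = analysis$call), envir = analysis$env)
result  # 1.75
```

Keeping the call as a string makes it easy to print (as debug=TRUE in run_analysis does) before it is run.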

Status

The status code is taken from Distance for Windows to indicate whether the analysis has been run yet and what the outcome was. Status codes are as follows:

0 analysis has not been run in Distance for Windows yet
1 analysis ran without errors or warnings
2 analysis ran with warnings
3 analysis ran with errors

Note that an analysis that runs with errors in Distance for Windows may run fine in R, and an analysis that runs fine in Distance for Windows may not work in R. In the latter case, please consider submitting this as a bug to github.com/distancedevelopment/distance-bugs.
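A small lookup vector makes the status codes easier to read when inspecting many analyses. In this sketch the statuses vector is hypothetical; with a real project the codes should be obtainable from the status element of each converted analysis, for example with sapply(converted, function(a) a$status).

```r
# Status codes as used by Distance for Windows (see the list above)
status_labels <- c(
  "0" = "not yet run in Distance for Windows",
  "1" = "ran without errors or warnings",
  "2" = "ran with warnings",
  "3" = "ran with errors"
)

# Hypothetical statuses for four converted analyses
statuses <- c(1, 3, 0, 2)

# Translate codes into descriptions
described <- unname(status_labels[as.character(statuses)])

# Indices of analyses that ran cleanly in Distance for Windows
clean <- which(statuses == 1)
```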

Converted distance data

Description

If convert_project has been run on a project, but there are no analyses present in the project, then a list of the data will be returned. The list has one element for each data filter which was present in the project. Each element of the list has the following tables in it:

Create bins from a set of binned distances and a set of cutpoints.

Description

This is an internal routine and shouldn't be necessary in normal analyses.

Usage

create_bins(data, cutpoints)

Arguments

`data`	`data.frame` with at least the column `distance`.
`cutpoints`	vector of cutpoints for the bins

Value

data with two extra columns, distbegin and distend.
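The binning step can be sketched with findInterval(). This is a minimal re-implementation for illustration, not the package's code; in particular, using left-closed bins (with the last cutpoint included in the final bin) and returning NA for distances outside the cutpoints are assumptions.

```r
create_bins_sketch <- function(data, cutpoints) {
  # Bin index for each distance: bins are [c_i, c_{i+1}), except that the
  # final bin also includes its upper cutpoint (rightmost.closed = TRUE)
  idx <- findInterval(data$distance, cutpoints, rightmost.closed = TRUE)
  # Distances below the first or above the last cutpoint get no bin
  idx[idx < 1 | idx >= length(cutpoints)] <- NA
  data$distbegin <- cutpoints[idx]
  data$distend   <- cutpoints[idx + 1]
  data
}

binned <- create_bins_sketch(data.frame(distance = c(0.5, 1, 2.9, 3)),
                             cutpoints = c(0, 1, 2, 3))
binned$distbegin  # 0 1 2 2
binned$distend    # 1 2 3 3
```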

Author(s)

David L. Miller

Get data from the Distance project database

Description

This function is a wrapper around calls to either RODBC (on Windows) or mdb.get (on Unix-alike systems). Given a database file name, it returns the contents of the requested table (as a data.frame); if table=NULL it returns all tables, and if table=TRUE it returns a character vector of table names.

Usage

db_get(file, table = NULL)

Arguments

`file`	the path to the database file to access
`table`	the table to extract (if `NULL` all tables are extracted, if `TRUE` a list of table names is returned)

Value

a data.frame with the contents of a database table

Note

Currently not implemented on Windows systems.

Author(s)

David L Miller

Filter the data

Description

Take the "data filters" applied by Distance for Windows to the data and use them to subset the data.

Usage

filter_data(data, data_filter)

Arguments

`data`	the data to be filtered
`data_filter`	a data filter to be parsed (output from `parse_definition.data_filter`)

Value

a list with two elements, the data and the filter string
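As an illustration of building a filter string and applying it, the sketch below handles only left and right truncation. The function and its left/width arguments are hypothetical; real Distance for Windows data filters carry more options than this.

```r
filter_data_sketch <- function(data, left = 0, width) {
  # Build the filter as a string, so it can be reported alongside the data
  filter <- paste0("distance >= ", left, " & distance <= ", width)
  # Evaluate the string with the data's columns in scope
  keep <- eval(parse(text = filter), envir = data)
  # which() drops rows where the condition is NA (e.g. missing distances)
  list(data = data[which(keep), , drop = FALSE], filter = filter)
}

obs <- data.frame(object = 1:4, distance = c(0.5, 1.5, 2.5, NA))
filtered <- filter_data_sketch(obs, left = 0, width = 2)
filtered$filter  # "distance >= 0 & distance <= 2"
```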

Author(s)

David L Miller

Extract data from a Distance database

Description

Extracts the relevant tables from the Distance for Windows database to build data that can be used with mrds or Distance.

Usage

get_data(data_file)

Arguments

data_file

the path to a DistData.mdb file.

Value

a "flatfile" compatible data.frame containing all of the information necessary to make a stratified abundance/density estimate.

Author(s)

David L Miller

Extract definition information from tables

Description

Takes the "Definition" column from a table, converts it to character and puts each row in a list with names as the corresponding ID of that row.

Usage

get_definitions(file, table)

Arguments

`file`	the name of the `.mdb` file
`table`	which table to extract the "Definition" column from

Value

a list of definitions, each element of which is a character vector.
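Once a table has been read (for example with db_get), the conversion described above is a short base R operation. This sketch assumes the table has columns named ID and Definition, and additionally splits each definition on line breaks so that every element is a character vector; both the column names and the splitting are assumptions, and the table contents are made up.

```r
definitions_sketch <- function(tab) {
  # One element per row; each definition becomes a vector of its lines
  defs <- strsplit(as.character(tab$Definition), "\r?\n")
  # Name each element by the row's ID
  names(defs) <- as.character(tab$ID)
  defs
}

# Hypothetical two-row table standing in for a Distance database table
tab <- data.frame(ID = c(1, 2),
                  Definition = c("Estimate;\nDensity=All;", "Distance Width=20;"))
defs <- definitions_sketch(tab)
```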

Author(s)

David L Miller

Extract saved statistics for analyses

Description

At the moment, only the AIC and likelihood are extracted.

Usage

get_stats(project_file, stats_table)

Arguments

`project_file`	path to project file
`stats_table`	a `data.frame` containing possible statistics

Details

Codes used to determine the meanings of statistics are given at https://github.com/DistanceDevelopment/readdst/wiki/distance-results-codes.

Author(s)

David L Miller

Get the unit conversions for the data

Description

Obtain a list of conversions to SI units from the units in which the measurements in a Distance for Windows project are recorded.

Usage

get_unit_conversion(data_file)

Arguments

data_file

Distance for Windows project data file

Value

a data.frame with columns Variable, Units and Conversion, giving the variable name, the units it is measured in and the conversion factor to SI units.
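The returned table can be used to rescale measurements by matching on Variable. The table below is a hand-written stand-in with hypothetical variable and unit labels; only the conversion factors themselves (1 km = 1000 m, 1 km² = 10^6 m²) are fixed facts.

```r
# Hypothetical stand-in for the output of get_unit_conversion()
conversions <- data.frame(
  Variable   = c("Effort", "Distance", "Area"),
  Units      = c("Kilometer", "Meter", "Square kilometer"),
  Conversion = c(1000, 1, 1e6)
)

# Look up the factor for a variable and apply it
to_si <- function(x, variable, table) {
  x * table$Conversion[table$Variable == variable]
}

effort_m <- to_si(c(1.2, 0.8), "Effort", conversions)  # metres
area_m2  <- to_si(2.5, "Area", conversions)            # square metres
```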

Author(s)

David L Miller

Estimate group size

Description

Distance for Windows includes a few different methods for accounting for group (or cluster) size at the abundance/density estimation stage. These include using the mean group size for all observations, or using a regression of size against distance or of log size against distance.

Usage

group_size_est(data, group_size, model)

Arguments

`data`	the data for this analysis
`group_size`	the `group_size` element of an analysis object
`model`	a fitted model

Value

estimated cluster sizes (numeric vector of length nrow(data)), or NULL if there were no instructions on how to estimate group/cluster size
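The mean and regression approaches can be contrasted on toy data. This sketch fits an ordinary least-squares regression of log size on distance and back-transforms the prediction at distance zero; it omits any bias correction or weighting that Distance for Windows may apply, and the data are invented for illustration.

```r
# Toy detections in which larger groups tend to be seen further away
obs <- data.frame(distance = c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5),
                  size     = c(1, 2, 2, 3, 4, 5))

# Method 1: mean observed group size
mean_size <- mean(obs$size)

# Method 2: regress log(size) on distance and predict at distance 0,
# where detection is assumed certain and size bias should vanish
fit <- lm(log(size) ~ distance, data = obs)
reg_size <- exp(unname(predict(fit, newdata = data.frame(distance = 0))))

c(mean = mean_size, regression = reg_size)
```

Because larger groups are easier to detect at long range in these data, the regression estimate is smaller than the raw mean, which is the kind of size-bias correction the regression methods aim for.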

Author(s)

David L Miller

Make an analysis

Description

This function calls make_model to create the call to ddf. It also creates an environment containing the data necessary to perform the call.

Usage

make_analysis(this_analysis, model_definitions, data_filters, data, transect)

Arguments

`this_analysis`	an analysis from Distance
`model_definitions`	a list of model definitions
`data_filters`	a list of data filters
`data`	the data to use with the model (see `get_data` and `unflatfile`)
`transect`	the transect type

Value

a list with the following elements: a character string specifying a call to ddf, an environment to run it in, the name of the analysis and its ID.

Author(s)

David L Miller

Make the control element of a call to `ddf`

Description

Build the control options for a ddf call.

Usage

make_control(md)

Arguments

`md`	model definition data to parse

Value

character string describing the control list

Build a dsmodel call

Description

From a model definition build the dsmodel part of the model.

Usage

make_dsmodel(md)

Arguments

`md`	a model definition

Value

a character string starting with "dsmodel=" or NULL if no dsmodel component in this model

Author(s)

David L Miller

Build a dsmodel or mrmodel formula

Description

Build a formula, ensuring that the correct terms are factors

Usage

make_formula(md_formula, md_factors)

Arguments

`md_formula`	"Formula" data from a model definition
`md_factors`	"Factors" data from a model definition

Value

a character string specifying a formula, starting with "formula=~"
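One way to make sure factor terms are treated as factors is to wrap them in as.factor() while building the string. The helper below is hypothetical and takes plain character vectors rather than model-definition data; readdst may handle factors differently (compare set_covar_names).

```r
make_formula_sketch <- function(covariates, factors = character(0)) {
  # Wrap covariates listed as factors in as.factor()
  terms <- ifelse(covariates %in% factors,
                  paste0("as.factor(", covariates, ")"),
                  covariates)
  # Fall back to an intercept-only model when there are no covariates
  if (length(terms) == 0) terms <- "1"
  paste0("formula=~", paste(terms, collapse = "+"))
}

make_formula_sketch(c("sex", "size"), factors = "sex")
# "formula=~as.factor(sex)+size"
make_formula_sketch(character(0))
# "formula=~1"
```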

Author(s)

David L Miller

Build the model meta.data

Description

Build the meta.data part of the model from a data filter, the transect type and the data.

Usage

make_meta.data(df, transect, data)

Arguments

`df`	a data filter object
`transect`	type of transect
`data`	the data used in the model

Value

a character string starting with meta.data=
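A sketch of building the meta.data string. It assumes the transect type is given as "line" or "point" and uses only the width, left and point elements of mrds's meta.data list; the function itself is hypothetical.

```r
make_meta_data_sketch <- function(transect, width, left = 0) {
  # mrds uses point = TRUE for point transects, FALSE for line transects
  point <- identical(transect, "point")
  paste0("meta.data=list(width=", width, ", left=", left,
         ", point=", point, ")")
}

md <- make_meta_data_sketch("line", width = 20)
md  # "meta.data=list(width=20, left=0, point=FALSE)"

# The part after "meta.data=" parses back into an ordinary list
ml <- eval(parse(text = sub("^meta.data=", "", md)))
```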

Author(s)

David L Miller

Build a distance sampling analysis

Description

Build the call to ddf that reproduces an analysis from Distance for Windows.

Usage

make_model(this_analysis, model_definitions, data_filters, transect, data)

Arguments

`this_analysis`	an analysis from Distance
`model_definitions`	a list of model definitions
`data_filters`	a list of data filters
`transect`	the transect type
`data`	the data

Value

a character string specifying a call to ddf
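The final step joins the fragments into one string. The sketch below pastes together hypothetical dsmodel and meta.data fragments in the style of mrds's ddf() arguments, then parses the result, without evaluating it, to check that it is valid R syntax; mrds does not need to be installed.

```r
# Hypothetical fragments, in the style of make_dsmodel and make_meta.data
dsmodel   <- "dsmodel=~cds(key=\"hn\", formula=~1)"
meta_data <- "meta.data=list(width=20, left=0, point=FALSE)"

# Assemble the arguments into a single ddf() call string
args <- c(dsmodel, "data=data", "method=\"ds\"", meta_data)
ddf_call <- paste0("ddf(", paste(args, collapse = ", "), ")")

# Parsing (not evaluating) confirms the string is syntactically valid R
expr <- parse(text = ddf_call)[[1]]
names(as.list(expr))[-1]
# "dsmodel" "data" "method" "meta.data"
```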

Author(s)

David L Miller

Build a mrmodel call

Description

From a model definition build the mrmodel part of the model.

Usage

make_mrmodel(md)

Arguments

`md`	a model definition

Value

a character string starting with "mrmodel=" or NULL if there is no mrmodel component in this model.

Author(s)

David L Miller

Merge results from stratified analyses

Description

In Distance for Windows, one can choose to estimate the detection function by stratum. In this case more than one detection function is returned when run_analysis is used to run the analysis. In order to test the statistics stored in the Distance for Windows project, one must first combine the resulting models (and their corresponding abundance and density estimates). This function performs these operations.

Usage

merge_results(models, analysis)

Arguments

`models`	a `list` of model (`ddf`/`ds` objects)
`analysis`	an analysis specification (informs the stratification to be used)

Value

a list including the "combined" model, summary and density/abundance estimates (dht output). Note that these are almost certainly not valid objects of their respective classes; they are only to be used to test statistics.
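When the detection function is fitted separately in each stratum, some summaries combine in a simple way. This sketch assumes independent strata: abundances add, their variances add, and so do the AICs of the separately fitted models. All numbers are invented, and the combination merge_results actually performs may cover more statistics.

```r
# Invented per-stratum results
strata <- data.frame(
  Region.Label = c("North", "South"),
  N   = c(120, 80),      # abundance estimates
  se  = c(15, 12),       # standard errors
  AIC = c(210.4, 198.7)  # AIC of each stratum's detection function
)

total_N   <- sum(strata$N)
total_se  <- sqrt(sum(strata$se^2))  # variances add under independence
total_AIC <- sum(strata$AIC)         # log-likelihoods and parameter counts add
```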

Author(s)

David L Miller

Make a user-readable model description string

Description

This takes a fitted mrds model object and returns a string that describes the detection function fitted and the fitted model's AIC.

Usage

model_description(model)

Arguments

model

a fitted model

Value

a string describing the model

Author(s)

David L Miller

Model selection for key plus adjustment models

Description

Run model selection for a given analysis. The returned object is exactly as if the model had been run using ddf, so anything that can normally be done with a ddf object can be done with it.

Usage

model_selection(analysis, debug = FALSE)

Arguments

`analysis`	a converted analysis
`debug`	display the call and name of the model before it is run, print AIC selection details

Details

Model selection is performed via AIC.

Value

fitted ddf object
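Selecting the number of adjustment terms by AIC amounts to comparing the candidate fits and keeping the one with the smallest AIC, subject to the aic.select maximum. In this sketch the AIC values are made up; in practice each would come from a ddf fit, and Distance for Windows may search the candidates sequentially rather than exhaustively.

```r
# Made-up AIC values for models with 0, 1, 2 and 3 adjustment terms
aic <- c("0" = 312.4, "1" = 309.8, "2" = 310.9, "3" = 311.5)

# Maximum number of adjustment terms allowed (cf. the aic.select element)
aic_select <- 2

# Keep only candidates within the limit, then pick the lowest AIC
candidates <- aic[as.integer(names(aic)) <= aic_select]
best_terms <- as.integer(names(which.min(candidates)))
best_terms  # 1
```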

Author(s)

David L Miller

Parse a Definition of a data filter

Description

Given a data filter "Definition", pre-processed by get_definitions, extract the useful information from it.

Usage

parse_definition.data_filter(df)

Arguments

`df`	a definition

Value

named list of definitions

Details

A definition consists either of a single key=value pair, or of a name followed by key=value pairs separated by \ and terminated with ;.

Note that this function should be called for a single definition, usually using lapply.
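A minimal parser for the format described above, written for illustration only. The example definition strings are hypothetical, and treating a leading token without "=" as the name is an assumption about how the name and the pairs are separated. In R source the backslash separator must be written as \\.

```r
parse_definition_sketch <- function(def) {
  # Drop surrounding whitespace and the terminating semicolon
  def <- sub(";\\s*$", "", trimws(def))
  name <- NA_character_
  # A leading token without "=" is taken to be the definition's name
  first <- sub("\\s.*$", "", def)
  if (!grepl("=", first, fixed = TRUE)) {
    name <- first
    def <- trimws(substring(def, nchar(first) + 1))
  }
  # Split the remainder into key=value pairs on backslashes
  pairs <- trimws(strsplit(def, "\\", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  keys <- sub("=.*$", "", pairs)
  vals <- sub("^[^=]*=", "", pairs)
  list(name = name, values = as.list(setNames(vals, keys)))
}

parsed <- parse_definition_sketch("Distance Left=0\\Right=20;")
```

Several definitions would be handled one at a time, e.g. lapply(c("Width=20;", "Distance Left=0\\Right=20;"), parse_definition_sketch).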

Author(s)

David L Miller

Parse a Definition

Description

Given data from a "Definition", pre-processed by get_definitions, extract the useful information from it.

Usage

parse_definition.model(df)

Arguments

`df`	a definition (vector of character strings)

Value

a list of lists

Details

See the "MCDS Command Language" section of the Distance manual for more information.

Note that this function should be called for a single definition, usually using lapply.

Author(s)

David L Miller

Converted distance analyses table

Description

Prints a table of the analyses that have been converted and their status from Distance for Windows.

Usage

## S3 method for class 'converted_distance_analyses'
print(x, ...)

Arguments

`x`	converted distance analyses
`...`	unused additional args for S3 compatibility

Print a converted distance analysis

Description

Prints details of an analysis that has been converted.

Usage

## S3 method for class 'converted_distance_analysis'
print(x, ...)

Arguments

`x`	a converted distance analysis
`...`	unused additional args for S3 compatibility

Print tested statistics

Description

This is simply a print method to nicely output the results of test_stats.

Usage

## S3 method for class 'distance_stats_table'
print(x, ..., digits = NULL)

Arguments

`x`	the result of a call to `test_stats`

Value

just prints the results

Author(s)

David L Miller

Run a converted distance sampling analysis

Description

Take a single converted analysis and run the model contained therein.

Usage

run_analysis(analysis, debug = FALSE)

Arguments

`analysis`	a converted analysis
`debug`	display the call and name of the model before it is run, print AIC selection details

Details

A previous call to convert_project will return a list of analyses. Only one analysis at a time can be run with run_analysis. If you wish to run all the analyses in the project, see the example below using lapply.

If an analysis needs to select the number of adjustment terms (for key plus adjustment detection functions) by AIC, then that selection is done at this stage.

Value

fitted ddf object

Author(s)

David L Miller

Examples

## Not run: 
library(readdst)

# load and convert the golftees project
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project, "/Golftees")
converted <- convert_project(project)

# run the first analysis
analysis_1 <- run_analysis(converted[[1]], debug=TRUE)

# look at the resulting model output
summary(analysis_1)

# run all the analyses in a project
all_analyses_run <- lapply(converted, run_analysis)

## End(Not run)

Set column names in data to be as in formulae

Description

Set column names in data to be as in formulae

Usage

set_covar_names(data, covnames)

Arguments

`data`	a `data.frame` of the data to be modelled
`covnames`	the covariates that are factors

Generate table of possible statistics to test

Description

To use get_stats we need a set of statistics to test. We also require their codes (to look up in the Distance for Windows database) and their equivalent values in mrds (or how to calculate those values). This function provides such a table.

Usage

stats_table(engine = "CDS")

Arguments

engine

which engine do we need to compute stats for?

Value

a data.frame with statistics Distance for Windows collects that have equivalents in mrds. The data.frame has four columns: Code, the numeric code for the statistic (as used in the Distance for Windows database); Name, the short name for this statistic; MRDS, the operation required to obtain the equivalent statistic in mrds; Description, a short description of the statistic.

Details

Data for this table (numeric codes and descriptions) are taken from DistIni.mdb, which is shipped with Distance for Windows. See also https://github.com/distancedevelopment/readdst/wiki/distance-results-codes.

Additional notes

Note that the Cramer-von Mises p-value as recorded in Distance for Windows is only recorded to the nearest 0.1.

Author(s)

David L Miller

Test to see if Distance for Windows and R get the same results

Description

Tests the results stored in the Distance for Windows project file against those generated from running the same analysis in R.

Usage

test_stats(analysis, statuses = 1, tolerance = 0.01)

Arguments

`analysis`	a converted (but not run) analysis
`statuses`	for which statuses should tests be run? See "Status", below (Defaults to `1`: analysis that ran without error or warning in Distance for Windows).
`tolerance`	the tolerance of the test (default 0.01)

Details

A previous call to convert_project will return a list of analyses. Only one analysis at a time can be run with test_stats. If you wish to run all the analyses in the project, you can use lapply.

Value

a data.frame with five columns: Statistic, a description of the tested statistic; Distance_value the value of the statistic stored by Distance for Windows; mrds_value the value of the statistic calculated by mrds; Difference the proportional difference between the previous two columns (computed using all.equal); Pass a series of ticks, indicating that the value in the Difference column is less than tolerance.
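The Difference and Pass columns can be illustrated with a few lines of base R. For scalars, the mean relative difference reported by all.equal is |current - target| / |target|, so this sketch computes that quantity directly instead of parsing all.equal's message. Treating the Distance for Windows value as the target is an assumption, and the statistic names and values are invented.

```r
compare_stats_sketch <- function(stat, distance_value, mrds_value,
                                 tolerance = 0.01) {
  # Relative difference, measured against the Distance for Windows value
  difference <- abs(mrds_value - distance_value) / abs(distance_value)
  data.frame(Statistic = stat,
             Distance_value = distance_value,
             mrds_value = mrds_value,
             Difference = difference,
             Pass = difference < tolerance)
}

# Invented values for two statistics
res <- compare_stats_sketch(c("AIC", "Density"),
                            distance_value = c(100, 2),
                            mrds_value = c(100.5, 2.1))
res$Pass  # TRUE FALSE
```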

Status

The status code is taken from Distance for Windows to indicate whether the analysis has been run yet and what the outcome was. Status codes are as follows:

0 analysis has not been run in Distance for Windows yet
1 analysis ran without errors or warnings
2 analysis ran with warnings
3 analysis ran with errors

If an analysis has a status of 0 or 3 there will usually not be any statistics attached to the analysis, so no tests will be run.

Note

Tests all available statistics.

Examples

## Not run: 
library(readdst)
# load the golftees sample project and convert it
project <- system.file("Golftees-example", package="readdst")
project <- paste0(project,"/Golftees")
converted <- convert_project(project)

# run tests for analysis 1
test_stats(converted[[1]])

## End(Not run)

Take a flatfile data.frame and make dht-compatible data.frames

Description

Given distance sampling survey data in flatfile format, convert it to the four tables required by dht.

Usage

unflatfile(data)

Value

list of the four data.frames described in "Details".

Details

region.table data.frame with two columns: Region.Label, label for the region; Area, area of the region. region.table has one row for each stratum. If there is no stratification then region.table has one entry with Area corresponding to the total survey area.
sample.table data.frame mapping the regions to the samples (i.e. transects). There are three columns: Sample.Label, label for the sample; Region.Label, label for the region that the sample belongs to; Effort, the effort expended in that sample (e.g. transect length).
obs.table data.frame mapping the individual observations (objects) to regions and samples. There should be three columns: object, the observation ID; Region.Label, label for the region that the sample belongs to; Sample.Label, label for the sample.
data a data.frame containing at least a column called distance. NOTE! If there is a column called size in the data then it will be interpreted as group/cluster size.
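Splitting a flatfile into these tables is largely a matter of de-duplicating columns. This is a simplified sketch, not the package's implementation: it assumes the flatfile has columns Region.Label, Area, Sample.Label, Effort, object and distance, with rows whose distance is NA marking samples that had no observations. The toy data are invented.

```r
unflatfile_sketch <- function(data) {
  design_cols <- c("Region.Label", "Area", "Sample.Label", "Effort")
  # One row per stratum and one row per sample
  region.table <- unique(data[, c("Region.Label", "Area")])
  sample.table <- unique(data[, c("Sample.Label", "Region.Label", "Effort")])
  # Rows with a recorded distance are actual observations
  obs <- data[!is.na(data$distance), ]
  obs.table <- obs[, c("object", "Region.Label", "Sample.Label")]
  # The detection data keep everything except the design columns
  list(region.table = region.table, sample.table = sample.table,
       obs.table = obs.table,
       data = obs[, setdiff(names(obs), design_cols), drop = FALSE])
}

flat <- data.frame(Region.Label = c("A", "A", "A", "B"),
                   Area = c(10, 10, 10, 20),
                   Sample.Label = c("A1", "A1", "A2", "B1"),
                   Effort = c(1, 1, 2, 1.5),
                   object = c(1, 2, NA, 3),
                   distance = c(0.5, 1.2, NA, 0.3))
tables <- unflatfile_sketch(flat)
sapply(tables, nrow)  # region.table 2, sample.table 3, obs.table 3, data 3
```

Sample A2 survives in sample.table even though it has no observations, which matters because its effort still counts towards the encounter rate.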

Note

Based on checkdata from package Distance.

Author(s)

David L. Miller

Generate table of unit conversions

Description

Returns a table of conversions between the units used in Distance for Windows. This is extracted from the DistIni.mdb default database.

Usage

units_table()

Author(s)

David L Miller
