Errors thrown by camera trap data set

From a message to the distance-sampling list on [09Feb21](https://groups.google.com/g/distance-sampling/c/5biRY2yrwm0) regarding challenges with a CTDS (camera trap distance sampling) analysis.

I think there are 3 errors generated by a data set with these characteristics:
- flatfile *without* an `object` field
- *and* truncation that causes transects with detections to become transects without detections

The first problem is easily remedied by adding an `object` field to the flatfile. However, I don't understand the need for [this line of code](https://github.com/DistanceDevelopment/Distance/blob/8b5eece0ac301ff4ff8060db22276e8a5b9f375f/R/safetruncate.R#L24) in `safetruncate`; I presume `object` is not a mandatory field in a `flatfile`.
- Incidentally, [this line](https://github.com/DistanceDevelopment/Distance/blob/8b5eece0ac301ff4ff8060db22276e8a5b9f375f/R/safetruncate.R#L19) could be simplified, because its last expression is equivalent to `fsl`, defined five lines earlier. This is purely a simplification, unrelated to the problem at hand.
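The workaround of adding an `object` field can be sketched as a small helper. `add_object_ids()` is hypothetical (it is not a Distance function), and it assumes that rows with a missing `distance` record effort only, so they get `NA` instead of an ID:

```r
# Hypothetical helper (not part of Distance): add an `object` column to a
# flatfile that lacks one. Rows with a detection get a unique integer ID;
# effort-only rows (distance is NA) get NA. Existing columns are left alone.
add_object_ids <- function(flatfile) {
  if ("object" %in% names(flatfile)) return(flatfile)
  has_detection <- !is.na(flatfile$distance)
  flatfile$object <- rep(NA_integer_, nrow(flatfile))
  flatfile$object[has_detection] <- seq_len(sum(has_detection))
  flatfile
}

flat <- data.frame(Sample.Label = c("A", "A", "B", "C"), Effort = 100,
                   distance = c(2, 5, 20, NA))
add_object_ids(flat)$object   # 1 2 3 NA
```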

***

The second problem seems deeper, but is possibly related to the handling of these "phantom transects" that arise when truncation robs them of all their detections. After I assigned an `object` field to the offending flatfile, the code runs without the previous warnings, but `dht2` returns this result:

```
> deer_dht_act
Summary statistics:
 .Label Area CoveredArea    Effort   n  k ER se.ER cv.ER
  Total   16    4395.157 196934400 666 57  0   NaN   NaN
```

ER is reported as zero, but it is not really zero. This is characteristic of CTDS analyses: when effort is measured in seconds, encounter rates are on the order of 10^-6 and so do not format well. The more worrying matter is the NaN for SE(ER).
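Using the `n` and `Effort` values from the summary table above, a quick check in R shows the size of the encounter rate and why it prints as 0 once rounded. The rounding to three decimals and the switch to hours are illustrative choices, not a description of what `dht2` does internally:

```r
# n and Effort copied from the dht2 summary table above
n <- 666
effort_s <- 196934400            # camera effort in seconds

er_per_second <- n / effort_s    # about 3.38e-06 detections per second
round(er_per_second, 3)          # 0 when rounded to 3 decimal places

# Illustrative rescaling: the same effort expressed in hours
effort_h <- effort_s / 3600      # 54704 hours
er_per_hour <- n / effort_h      # about 0.0122 detections per hour
```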

The flow of function calls is:

`dht2` -> `er_var_f` -> `varn`

For CTDS analyses, `er_var_f` uses the "classic" encounter rate variance estimator "P2" by default, so it calls `varn` at [this location](https://github.com/DistanceDevelopment/Distance/blob/8b5eece0ac301ff4ff8060db22276e8a5b9f375f/R/ER_var_f.R#L57) as follows:

```
        mutate(ER_var = varn(.data$Effort, .data$transect_n_observations,
                             type=er_est)) %>%
```
My (unproven) hypothesis is that the NaN in the encounter rate variance comes from the way `.data$transect_n_observations` is computed for the "phantom" transects [here](https://github.com/DistanceDevelopment/Distance/blob/8b5eece0ac301ff4ff8060db22276e8a5b9f375f/R/dht2.R#L482):

```
             transect_n_observations = length(na.omit(unique(.data$object))),
```
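For reference, that expression counts the distinct non-missing `object` IDs on a transect. On toy vectors (hypothetical values, not the user's data) it returns 0 for a transect whose only `object` value is `NA`, such as a phantom transect whose `object` was set to `NA` during truncation:

```r
# The expression from dht2, wrapped in a hypothetical helper for illustration
count_objects <- function(object) length(na.omit(unique(object)))

with_detections <- c(12L, 13L, 13L, NA)  # a repeated ID and an NA row
phantom <- NA_integer_                   # every detection truncated away

count_objects(with_detections)  # 2
count_objects(phantom)          # 0
```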

My guess is that this line gives `transect_n_observations` a value that causes problems for `varn`, but that is only a guess. The problem may lie elsewhere; I didn't chase the rabbit any further down the hole than this.
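To illustrate how a single transect can turn an encounter rate variance into NaN, the sketch below codes the R2 formula (one of the `type` options `varn` accepts) in base R. It is not `mrds::varn`, and the "P2" estimator used here for CTDS may differ in detail; `er_var_r2()` and the toy numbers are hypothetical:

```r
# Hypothetical illustration, NOT mrds::varn: the R2 encounter rate variance
#   var(n/L) = K / (L^2 (K - 1)) * sum_k l_k^2 (n_k / l_k - n / L)^2
# written in base R to show how one transect can make the result NaN.
er_var_r2 <- function(effort, n_k) {
  K <- length(effort)   # number of transects (camera stations)
  L <- sum(effort)      # total effort
  n <- sum(n_k)         # total detections
  K / (L^2 * (K - 1)) * sum(effort^2 * (n_k / effort - n / L)^2)
}

# A transect with zero detections but positive effort is harmless ...
v_zero_count <- er_var_r2(effort = c(100, 120, 80), n_k = c(5, 0, 3))
# ... but a transect with zero effort gives 0/0 = NaN, which poisons the sum
v_zero_effort <- er_var_r2(effort = c(100, 120, 0), n_k = c(5, 3, 0))

v_zero_count    # finite, positive
v_zero_effort   # NaN
```

Whether anything like the zero-effort case occurs for the phantom transects in `dht2` is untested; the sketch only shows that a zero count with positive effort should not, on its own, produce NaN in this formula.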

***

The final problem I encountered in this analysis is a surprising change in the degrees of freedom of the confidence intervals for the abundance estimates computed by `dht2`:

```
> deer_dht <- dht2(deer_hr, flatfile=deer,strat_formula=~1,sample_fraction = 0.111, convert_units = conunits)
> deer_dht
Summary statistics:
 .Label Area CoveredArea    Effort   n  k ER se.ER cv.ER
  Total   16    4395.157 196934400 666 57  0   NaN   NaN

Abundance estimates:
 .Label Estimate    se    cv LCI UCI  df
  Total       10 1.096 0.109   8  12 663

Component percentages of variance:
 .Label Detection  ER
  Total       NaN NaN

> deer$activity <- 0.46734925
> deer$activity.SE <- 0.03099745
> activity <- unique(deer[ , c("activity","activity.SE")])
> names(activity) <- c("rate", "SE")
> (mult <- list(creation=activity))

> deer_dht_act <- dht2(deer_hr, flatfile=deer,strat_formula=~1,
+                      sample_fraction = 0.111,multipliers = mult, convert_units = conunits)
> deer_dht_act
Summary statistics:
 .Label Area CoveredArea    Effort   n  k ER se.ER cv.ER
  Total   16    4395.157 196934400 666 57  0   NaN   NaN

Abundance estimates:
 .Label Estimate    se    cv LCI UCI       df
  Total       21 2.742 0.128  17  28 1240.099
```
Above are two calls to `dht2`: the first without a multiplier, the second with a multiplier whose degrees of freedom are unspecified. Notice that `df` is 663 without the multiplier and 1240 with it. Perhaps that is correct, and it matters little here because the number of detections is so large, but it struck me as suspicious.
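One possible explanation, offered as an assumption rather than a reading of the `dht2` source: if the confidence interval df come from the Satterthwaite approximation, `df = (sum cv_i^2)^2 / sum(cv_i^4 / df_i)`, and a multiplier with unspecified df is treated as having infinite df, then the multiplier raises the total CV without adding to the denominator, so df must increase. Plugging in the printed values (`satterthwaite_df()` is a hypothetical helper):

```r
# Satterthwaite approximation for the df of a product of independent estimates:
#   df = (sum of cv_i^2)^2 / sum(cv_i^4 / df_i)
satterthwaite_df <- function(cv, df) sum(cv^2)^2 / sum(cv^4 / df)

cv_first <- 0.109                       # printed cv of the first dht2 call
cv_activity <- 0.03099745 / 0.46734925  # activity multiplier: SE / rate

# First call: a single component with 663 df
df_no_mult <- satterthwaite_df(cv_first, 663)
# Second call, ASSUMING the unspecified multiplier df is treated as Inf
df_with_mult <- satterthwaite_df(c(cv_first, cv_activity), c(663, Inf))

df_no_mult                              # 663
df_with_mult                            # about 1245
sqrt(sum(c(cv_first, cv_activity)^2))  # about 0.128, the printed cv
```

With the rounded cv of 0.109 this gives about 1245 rather than 1240.1; a first-call cv of about 0.1094, which still prints as 0.109, gives about 1240.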

***
Of the three, the second is the most troubling. I do not include the data that cause the problem, as they were shared by the user. I think the phenomenon can be reproduced with the `DuikerCameraTraps` data set in the `Distance` package if truncation is sufficiently extreme, as in

```
data(DuikerCameraTraps, package = "Distance")
Distance:::safetruncate(DuikerCameraTraps, 15, 3)
```

Strong right truncation causes camera station B3 to lose all 8 of its detections, triggering the warning:
```
Warning message:
In `[<-.data.frame`(`*tmp*`, flatfile$Sample.Label %in% sl_diff,  :
  provided 7 variables to replace 6 variables
```
because [this line of code](https://github.com/DistanceDevelopment/Distance/blob/8b5eece0ac301ff4ff8060db22276e8a5b9f375f/R/safetruncate.R#L24) tries to assign `NA` to an `object` field that does not exist.
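For comparison, here is a minimal sketch of truncation that keeps a placeholder row for each sample that loses all its detections while touching only the columns that are present, so a flatfile without `object` keeps its shape and no warning arises. `safe_truncate_sketch()` and its arguments are hypothetical; it is neither a patch for nor a description of the code in `safetruncate.R`:

```r
# Hypothetical sketch, NOT Distance's safetruncate(): truncate a flatfile but
# keep one placeholder row for every sample that loses all of its detections,
# blanking only the observation columns that actually exist.
safe_truncate_sketch <- function(flatfile, right, left = 0,
                                 obs_cols = c("distance", "object")) {
  all_samples <- unique(flatfile$Sample.Label)
  keep <- !is.na(flatfile$distance) &
    flatfile$distance >= left & flatfile$distance <= right
  truncated <- flatfile[keep, , drop = FALSE]

  # samples present before truncation but absent afterwards
  lost <- setdiff(all_samples, truncated$Sample.Label)
  if (length(lost) == 0) return(truncated)

  # one placeholder row per lost sample, copied from the original data
  placeholders <- flatfile[match(lost, flatfile$Sample.Label), , drop = FALSE]
  for (col in intersect(obs_cols, names(placeholders))) {
    placeholders[[col]][] <- NA   # set every value to NA, keeping the column type
  }
  out <- rbind(truncated, placeholders)
  rownames(out) <- NULL
  out
}

flat <- data.frame(Region.Label = "R", Area = 16,
                   Sample.Label = c("A", "A", "B", "B", "C"),
                   Effort = 100, distance = c(2, 5, 20, 25, NA), size = 1)
trunc <- safe_truncate_sketch(flat, right = 15)  # no `object` column, no warning
trunc$Sample.Label   # "A" "A" "B" "C"
trunc$distance       # 2 5 NA NA
```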

Errors thrown by camera trap data set #83