From a message to the list on 09Feb21 regarding challenges with CTDS analysis.
I think there are 3 errors generated by a data set with these characteristics:
- flatfile without an object field
- and truncation that causes transects with detections to become transects without detections
The first problem is easily remedied by adding an 'object' field to the flatfile. However, I don't understand the need for this line of code in safetruncate; object is not a mandatory field in a flatfile, I presume.
- Incidentally, this line could also be simplified, because the last expression is equivalent to fsl, defined 5 lines earlier. This is just a simplification, unrelated to the problem at hand.
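The workaround of adding an object field can be sketched as follows. The toy flatfile and its values are invented for illustration. Numbering detections sequentially and leaving NA on rows without a distance is my assumption about what a sensible object field looks like, not something taken from the package code.

```r
# Toy flatfile with no object column (values invented for illustration).
flatfile <- data.frame(
  Sample.Label = c("A1", "A1", "B3", "C2"),
  Effort       = c(100, 100, 100, 100),
  distance     = c(2.5, 7.0, 12.0, NA)  # NA marks a station with no detections
)

# Add object only if it is missing, so an existing field is never overwritten.
if (!"object" %in% names(flatfile)) {
  has_detection <- !is.na(flatfile$distance)
  flatfile$object <- NA_integer_
  # Assumption: one unique integer id per detection, NA elsewhere.
  flatfile$object[has_detection] <- seq_len(sum(has_detection))
}

flatfile
```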
The second problem seems deeper, and is possibly related to the handling of the "phantom transects" that arise when truncation robs them of their detections. After adding an object field to the offending flatfile, the code runs without generating the previous warnings, but dht2 returns this result:
> deer_dht_act
Summary statistics:
.Label Area CoveredArea Effort n k ER se.ER cv.ER
Total 16 4395.157 196934400 666 57 0 NaN NaN
ER is reported as zero, but it really is not. This is characteristic of CTDS analyses: when effort is measured in seconds, encounter rates are on the order of 10^-6, so they do not format well. The more worrying matter is the NaN for SE(ER).
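The formatting point can be checked with the numbers in the summary table. This sketch assumes the encounter rate is simply n divided by Effort; the three-decimal rounding is only an illustration of how a small value prints as zero, not a statement about how dht2 formats its output.

```r
n      <- 666          # detections, from the summary table
effort <- 196934400    # effort in seconds, from the summary table

er <- n / effort       # assumed definition of the encounter rate

round(er, 3)                               # prints as 0 at three decimal places
format(er, scientific = TRUE, digits = 3)  # scientific notation keeps the value visible
```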
The flow of function calls is
dht2 -> er_var_f -> varn
er_var_f uses the "classic" encounter rate variance formula "P2" by default for CTDS analysis. Hence, er_var_f calls varn at this location in this manner:
mutate(ER_var = varn(.data$Effort, .data$transect_n_observations,
type=er_est)) %>%
My (unproven) hypothesis is that the NaN in the encounter rate variance comes from the way .data$transect_n_observations is computed for the "phantom" transects here:
transect_n_observations = length(na.omit(unique(.data$object))),
My guess is that from this line of code, transect_n_observations receives a value that might create problems for varn, but that is only a guess. Maybe this is not where the problem lies, but I didn't chase the rabbit any further down the hole than this.
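For reference, here is what that expression returns in isolation. The input vectors are invented, and the sketch assumes a phantom transect carries only NA in object after truncation; count_obs is a hypothetical helper name that wraps the expression quoted above. This does not reproduce the NaN, so it neither confirms nor rules out the hypothesis; it only shows the value that would be passed on.

```r
# Wraps the expression quoted above (helper name is hypothetical).
count_obs <- function(object) length(na.omit(unique(object)))

# Assumption: a "phantom" transect has only NA in object after truncation.
phantom_object  <- c(NA_integer_, NA_integer_)
ordinary_object <- c(11L, 12L, 12L, NA)

count_obs(phantom_object)   # 0: no non-missing object ids
count_obs(ordinary_object)  # 2: two distinct non-missing ids
```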
The final problem I encountered in this analysis is a surprising change in the degrees of freedom associated with the abundance estimate confidence intervals computed by dht2:
> deer_dht <- dht2(deer_hr, flatfile=deer,strat_formula=~1,sample_fraction = 0.111, convert_units = conunits)
> deer_dht
Summary statistics:
.Label Area CoveredArea Effort n k ER se.ER cv.ER
Total 16 4395.157 196934400 666 57 0 NaN NaN
Abundance estimates:
.Label Estimate se cv LCI UCI df
Total 10 1.096 0.109 8 12 663
Component percentages of variance:
.Label Detection ER
Total NaN NaN
> deer$activity <- 0.46734925
> deer$activity.SE <- 0.03099745
> activity <- unique(deer[ , c("activity","activity.SE")])
> names(activity) <- c("rate", "SE")
> (mult <- list(creation=activity))
> deer_dht_act <- dht2(deer_hr, flatfile=deer,strat_formula=~1,
+ sample_fraction = 0.111,multipliers = mult, convert_units = conunits)
> deer_dht_act
Summary statistics:
.Label Area CoveredArea Effort n k ER se.ER cv.ER
Total 16 4395.157 196934400 666 57 0 NaN NaN
Abundance estimates:
.Label Estimate se cv LCI UCI df
Total 21 2.742 0.128 17 28 1240.099
These are two calls to dht2, the first without a multiplier and the second with a multiplier for which the degrees of freedom are unspecified. Notice that df is 663 without the multiplier and 1240 when the multiplier is included. Perhaps that is correct, and it matters little in this situation where the number of detections is so large, but it struck me as suspicious.
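One possible explanation, offered only as a sketch: if the variance components are combined with a Satterthwaite-type approximation and a multiplier with unspecified df is treated as having infinite df, the total df would increase when a multiplier is added. Both of those are my assumptions and have not been checked against the dht2 source. Using the rounded values printed above, the arithmetic lands close to the reported cv of 0.128 and df of about 1240.

```r
# Rounded values taken from the output above.
cv_base <- 0.109                    # cv of abundance without the multiplier
df_base <- 663                      # df without the multiplier
cv_mult <- 0.03099745 / 0.46734925  # SE / rate of the activity multiplier

# Assumption: squared cvs add, and df follows a Satterthwaite-type formula
# in which a component with unspecified (infinite) df drops out of the
# denominator.
cv_total <- sqrt(cv_base^2 + cv_mult^2)
df_total <- cv_total^4 / (cv_base^4 / df_base)

cv_total
df_total
```

If this is indeed what happens, the larger df would be expected behaviour rather than a bug, but that would need confirming in the code.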
Of the three, the second is the most troubling. I do not include the data causing the problem as they were shared by the user. I think the phenomenon can be duplicated through the use of the DuikerCameraTraps data set in the Distance package if truncation is sufficiently extreme, as in
safetruncation(DuikerCameraTraps, 15, 3)
Strong right truncation causes camera station B3 to lose all 8 of its detections, triggering the warning:
Warning message:
In `[<-.data.frame`(`*tmp*`, flatfile$Sample.Label %in% sl_diff, :
provided 7 variables to replace 6 variables
because this line of code tries to assign NA to the object field, which does not exist.