Chapter 11 Statistical tools for studying migration

11.1 Linear models

Linear models are the basics to most statistical modelling also for the study of migration. It is highly recommended to get an understanding of linear modelling in empirical sciences. Our course is not a statistics course, but we will use statistics for the group work during the course. As a preparation for the course, we recommend reading Chapters 2, 3 and 11 of the online book on linear modelling.

11.2 Distribution models for counts of resting or migrating individuals

Count data are classically analysed using generalized linear (mixed) models (GLMM). Typically, the Poisson or negative-binomial models serve as a first model choice when the outcome variable is a count data. In most cases, depending on the data structures, these models are expanded by including spatial or temporal structures, accounting for an excess of zero values (zero-inflation) or a large variance (overdispersion). Any textbook introducing linear models, linear mixed models, generalized linear models and generalized linear mixed models is recommended. We particularly like the introduction to hierarchical models by Gelman and Hill (2007). Of course, we also recommend our own introduction to linear models that follows the philosophy of Gelman and Hill but is written by ecologists for ecologists, i.e. less technically (F. Korner-Nievergelt et al. 2015). An open access second edition of the most important parts of the book is available.

Example studies that analysed count data for the study of bird migration:

Distribution of migratory species during the non-breeding season within Africa (Wisz, Walther, and Rahbek 2007).

11.3 Migratory connectivity models

Migratory connectivity is the extent to which breeding areas are connected with non-breeding areas via migrating individuals (Webster et al. 2002; S. Bauer and Hoye 2014). A (multidimensional) measure of migratory connectivity are the proportions of individuals from specific, predefined, breeding areas that spend the non-breeding season in specific, predefined, non-breeding areas. Thus, migratory connectivity can be quantified based on distinct breeding areas and distinct non-breeding areas. Alternatively, migratory connectivity may be visualised using different distribution maps by population (Schirmer 2021). Estimates of migratory connectivity can be obtained by tracking individuals from their breeding grounds to their non-breeding ground. However, because migratory connectivity is a characteristic of populations (not of individuals) a lot of individuals need to be tracked in order to get a reliable estimate of migratory connectivity. Therefore, reencounters of marked individuals that are often available for a large number of individuals and from many different populations are well suited to study migratory connectivity. However, the spatial heterogeneity of reencounter probability introduces a bias in the raw mark reencounter data. Stochastic models to correct for this bias have been developed by Kania and Busse (1987), Kendall, Conn, and Hines (2006), Bauthian et al. (2007) and Thorup and Conn (2009). The model has further been expanded by Fränzi Korner-Nievergelt, Liechti, and Hahn (2012), Cohen et al. (2014), F. Korner-Nievergelt, Liechti, and Thorup (2014) and Schirmer (2021). Combination of different data sources such as ring recoveries, GPS tracking data, geolocator data or stable isotopic data further improves the accuracy of the migratory connectivity estimates (Fränzi Korner-Nievergelt et al. 2017; Rönn et al. 2020a).

11.4 Estimating survival based on mark-recapture data: The Cormack-Jolly-Seber model

To study population processes, individuals often are marked and later recaptured or re-sighted. Such mark-recapture/re-sight data contain information on demographic parameters such as survival, movement and population size. The difficulty with mark-recapture data is that not all marked individuals are recaptured during each capture occasion. If an individual has not been captured, it may have died, emigrated from the study area, or it may have escaped being captured. For the study of the demographic parameters it is important to disentangle capture probability from the parameters of interest. To do so, Cormack (1964), Jolly (1965), and Seber (1965) developed a statistical model, the Cormack-Jolly-Seber model (CJS), which became the method used by myriad population biologists (Lebreton et al. 1992; D. L. Thomson and al 2009). The software MARK has become a standard software for the analyses of mark-recapture data (White and Burnham 1999; Cooch and White 2016). The technique is still being developed for the study of various aspects of demographic processes by a lively society of ecologists and statisticians. Here, we introduce the Cormack-Jolly-Seber (CJS) model for estimating apparent survival and recapture probability, or re-sighting probability in cases of re-sights, respectively. The correct term “apparent” survival expresses that in a mark-recapture/re-sight data set from one study site individuals that permanently emigrated from the study site cannot be distinguished from individuals that died. “Apparent” survival is a product of survival and site fidelity.

The CJS model explicitly models the capture process (the observation process) conditional on a model for apparent survival (the biological process). By including an observation process model, the recapture probability and thus the proportion of individuals still alive and present but not recaptured is estimated and taken into account while estimating the probability that an individual survives and stays in the study area. For example, mark-recapture data of Reed warblers Acrocephaus scirpaceus at the constant ringing effort site (CES) in the Wauwilermoos (LU) can be presented as in Table 11.1.

Table 11.1: Number of Reed warblers marked at Wauwilermoos per year of first capture and number of recaptured individuals thereof in the subsequent years. The capture occasion numbers are given in brackets behind the year numbers.
Year of first capture	n marked	2014 (2)	2015 (3)	2016 (4)	2017 (5)	2018 (6)
2013 (1)	107	13	7	4	5	1
2014 (2)	76	-	6	2	1	1
2015 (3)	65	-	-	4	3	0
2016 (4)	54	-	-	-	7	2
2017 (5)	64	-	-	-	-	3

The probability that an individual of all marked individuals released at occasion \(k\) is recaptured at occasion \(t\) is a function of apparent survival probability \(\phi_{t}\) and recapture probability \(p_t\) (Table 11.2). The recapture probability \(p\) is the probability that an individual that is alive and present in the study area is captured. The model parameters (apparent survival \(\phi_t\) and recapture probability \(p_t\)) can be estimated by fitting a multinomial model to the number of recaptured individuals at the different occasions. Such a multinomial model is fitted for each cohort of individuals released at the same occasion with the cell probabilities of Table 11.2.

Table 11.2: . Multinomial cell probabilities of a Cormack-Jolly-Seber model. Note that the cell probabilities are formulated so that one release of a marked individual is resulting in one recapture at maximum. Therefore, the data in Table 11.1 need to be re-arranged so that every recapture becomes a new release of a marked bird (such re-arranged data is called an m-array).
Occ	2	3	4	5	6
1	\(\phi_1p_2\)	\(\phi_1(1-p_2)\phi_2p_3\)	\(\phi_1(1-p_2)\phi_2(1-p_3)\phi_3p_4\)	\(\prod_{t=2}^4(\phi_{t-1}(1-p_t))\phi_4p_5\)	…
2	0	\(\phi_2p_3\)	\(\phi_2(1-p_3)\phi_3p_4\)	\(\phi_2(1-p_3)\phi_3(1-p_4)\phi_4p_5\)	…
3	0	0	\(\phi_3p_4\)	\(\phi_3(1-p_4)\phi_4p_5\)	…
4	0	0	0	\(\phi_4p_5\)	…
5	0	0	0	0	\(\phi_5p_6\)

The original CJS model allows estimating apparent survival and recapture probability separately for each time period except for the last time period. For the last time period, only the product \(\phi_{t-1}p_t\) is estimable. More stringent model assumptions, such as constant survival over time or relating apparent survival to a linear predictor can give separate apparent survival and recapture estimates also for the last time period.

The CJS model can alternatively be formulated as a partially hidden Markov model, and fitted to individual capture histories. Individual based model formulations are appealing because they allow including individual specific predictors and accounting for more complex correlation structures. The individual based model formulation consists of two logistic regressions, i.e. two Bernoulli-models. The first Bernoulli variable is the partially observed indicator \(z_{it}\) that indicates whether individual \(i\) is alive and present at time \(t\):
\(z_{it} \sim Bernoulli(z_{it-1}\phi_{it})\)
In other words, an individual \(i\) is alive and in the study area at time \(t\) (\(z_{it} = 1\)) with probability \(\phi_{it}\), if it was alive and in the study area at time \(t-1\) (\(z_{it-1} = 1\)), or 0, if it was dead or had permanently emigrated from the study area at time \(t-1\) (\(z_{it-1} = 0\)). A linear predictor for the logit (or another link function) of \(\phi_{it}\) can be added to study factors affecting apparent survival. This part of the model looks like an autoregressive logistic regression. However, the variable \(z\) is only partly observed. It is known that an individual is alive (\(z_{it} = 1\)) when it has been recaptured or resighted at time \(t\) (\(y_{it} = 1\)) or at any time point later. When an individual is not recaptured (\(y_{it} = 0\)) during any subsequent capture occasion, we do not know whether it has died or emigrated, or whether it is still alive but has not been captured or resighted. The capture process is added to the model as a second Bernoulli process.
\(y_{it} \sim Bernoulli(z_{it}p_{it})\)
Also, a linear predictor can be added for the logit of \(p_{it}\) to study factors affecting the capture probability.

Software that provide algorithms for fitting CJS models:

MARK
R-package marked CRAN, example in a group work of this course (Laake, Johnson, and Conn 2013)
Stan, chapter 14.5 in F. Korner-Nievergelt et al. (2015)
BUGS: WinBUGS, OpenBUGS, Jags. Chapter 7 in Kéry and Schaub (2012).
E-Surge. Choquet, Rouan, and Pradel (2009).

11.5 Multi-state models to estimate movement rates

Multi-state models are used for analysing changes between different states of animals. If the states are defined to be locations, a multi-state model analyses movements between different locations. The central parameters of such models are the probabilities that an animal moves from location A to location B between two time points. Multi-state models have been introduced by Arnason (1972) and Schwarz, Schweigert, and Arnason (1993). They have later been used by many authors for a variety of different study questions. With a large number of locations, the number of possible movements, and thus the number of model parameters, becomes very large and a very high sample size is needed to estimate all parameters. But for some model formulations, even with a very high sample size, some parameters may not be identifiable (Brownie et al. 1993; Gimenez, Choquet, and Lebreton 2003; King and Brooks 2003). Antoniazza, Korner-Nievergelt, and Keller (2012) used a multi-state model to describe the migration of juvenile and adult cormorants Phalacrocorax carbo that were colour marked at Lake Neuchatel during the breeding seasons.

11.6 Individual based hidden Markov model for bird behaviour

Individual based hidden Markov models are used for analysing tracking data such as geolocator data or satellite telemetry data. Such data contain for each individual location measurements repeated at regular time intervals over a longer time span. Normally a measurement error is involved in the location measurement. For example, spatial error in a location measurement by light-based geolocation can be up to 1000 km (Lisovski et al. 2012) e.g. because of shading by clouds or vegetation. The individual based hidden Markov models assume that each individual has a (hidden) “true” location at each time interval, and that the recorded locations are measurements of the true locations with a measurement error. Both the true location as well as the measurement error can depend on environmental and/or individual characteristics. Therefore, the model can be used to estimate the true locations of the individuals over the time. Further, general patterns of migration can be studied, e.g. factors that influence migration speed, directions or decisions to depart on a migratory flight. A seminal paper that introduces these models for the study of seal migration is Jonsen, Flemming, and Myers (2005). The textbook by Hooten et al. (2017) give an understandable introduction to animal movement modelling.

11.7 The combination of different data sources

In future, we expect that more ornithologists combine different data sources in formal models, the integrated models (e.g. Fränzi Korner-Nievergelt et al. (2017)). Bayesian techniques and user-friendly Bayesian software such as BUGS and Stan made these techniques available for ecologists.

11.8 References

References

Antoniazza, M., F. Korner-Nievergelt, and V. Keller. 2012. “Les Mouvements Des Grands Cormorans Phalacrocorax Carbo Bagueés Dans La Colonie Du Fanel, Lac de Neuchâtel.” Nos Oiseaux 59: 11–22.

Arnason, A. N. 1972. “Parameter Estimates from Mark-Recapture Experiments on Two Populations Subject to Migration and Death.” Researches on Population Ecology 13: 97–113.

Bauer, S., and B. J. Hoye. 2014. “Migratory Animals Couple Biodiversity and Ecosystem Functioning Worldwide.” Science 344: 1242552-1-8. https://doi.org/10.1126/science.1242552.

Bauthian, I., F. Grossmann, Y. Ferrand, and R. Julliard. 2007. “Quantifying the Origin of Woodcock Wintering in France.” Journal of Wildlife Management 71: 701–5.

Brownie, C., J. E. Hines, J. D. Nichols, K. H. Pollock, and J. B. Hestbeck. 1993. “Capture-Recapture Studies for Multiple Strata Including Non-Markovian Transitions.” Biometrics 49: 1173–87.

Choquet, R., L. Rouan, and R. Pradel. 2009. “Program e-Surge: A Software Application for Fitting Multievent Models.” In Modeling Demographic Processes in Marked Populations, edited by David L. Thomson, Evan G. Cooch, and Michael J. Conroy, 845–65. Environmental and Ecological Statistics. New York: Springer. https://doi.org/10.1007/978-0-387-78151-8{\_}39.

Cohen, E. B., J. A. Hostetler, J. A. Royle, and P. P. Marra. 2014. “Estimating Migratory Connectivity of Birds When Re-Encounter Probabilities Are Heterogeneous.” Ecology and Evolution 4: 1659–70.

Cooch, E. G., and G. C. White. 2016. Program MARK: A Gentle Introduction. http://www.phidot.org/software/mark/docs/book/.

Cormack, R. M. 1964. “Estimates of Survival from the Sighting of Marked Animals.” Biometrika 51: 429–38.

Gelman, A., and J. Hill. 2007. Data Analysis Using Regression and Multilevel / Hierarchical Models. Cambridge: Cambridge University Press.

Gimenez, O., R. Choquet, and J.-D. Lebreton. 2003. “Parameter Redundancy in Multistate Capture-Recapture Models.” Biometrical Journal 45: 704–22.

Hooten, Mevin B., Devin S. Johnson, Brett T. McClintock, and Juan M. Morales. 2017. Animal Movement. Statistical Models for Telemetry Data. New York: CRC Press.

Jolly, G. 1965. “Explicit Estimates from Capture-Recapture Data with Both Death and Immigration-Stochastic Model.” Biometrika 52: 225–47.

Jonsen, Ian D., Joanna Mills Flemming, and Ransom A. Myers. 2005. “Robust State-Space Modeling of Animal Movement Data.” Ecology 86: 2874–80.

Kania, Wojciech, and Przemyslaw Busse. 1987. “An Analysis of the Recovery Distribution Based on Finding Probabilities.” Acta Ornithologica 23: 121–28.

Kendall, William L., P. B. Conn, and J. E. Hines. 2006. “Combining Multistate Capture-Recapture Data with Tag Recoveries to Estimate Demographic Parameters.” Ecology 87: 169–77.

Kéry, Marc, and Michael Schaub. 2012. Bayesian Population Analysis Using WinBUGS. Amsterdam: Elsevier.

King, R., and S. P. Brooks. 2003. “Closed-Form Likelihoods for Arnason–Schwarz Models.” Biometrika 90: 435–44.

Korner-Nievergelt, F., Felix Liechti, and K. Thorup. 2014. “A Bird Distribution Model for Ring Recovery Data: Where Do the European Robins Go?” Ecology and Evolution 4 (6): 720–31.

Korner-Nievergelt, Fränzi, Felix Liechti, and Steffen Hahn. 2012. “Migratory Connectivity Derived from Sparse Ring Reencounter Data with Unknown Numbers of Ringed Birds.” Journal of Ornithology 153: 771–82. https://doi.org/10.1007/s10336-011-0793-z.

Korner-Nievergelt, Fränzi, Céline Prévot, Steffen Hahn, Lukas Jenni, and Felix Liechti. 2017. “The Integration of Mark Re-Encounter and Tracking Data to Quantify Migratory Connectivity.” Ecological Modelling 344: 87–94. https://doi.org/10.1016/j.ecolmodel.2016.11.009.

Korner-Nievergelt, F., T. Roth, Stefanie von Felten, J. Guélat, B. Almasi, and P. Korner-Nievergelt. 2015. Bayesian Data Analysis in Ecolog Using Linear Models with r, BUGS, and Stan. New York: Elsevier.

Laake, J. L., D. S. Johnson, and P. B. Conn. 2013. “Marked: An r Package for Maximumlikelihood and Markov ChainMonte Carlo Analysis of Capture-Recapture Data.” Methods in Ecology and Evolution 4: 885–90.

Lebreton, J.-D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. “Modelling Survival and Testing Biological Hypotheses Using Marked Animals: A Unified Approach with Case Studies.” Ecological Monographs 62 (1): 67–118.

Lisovski, Simeon, Chris M. Hewson, Raymond H. G. Klaassen, Fränzi Korner-Nievergelt, Mikkel W. Kristensen, and Steffen Hahn. 2012. “Geolocation by Light: Accuracy and Precision Affected by Environmental Factors.” Methods in Ecology and Evolution 3: 603–312. https://doi.org/10.1111/j.2041-210X.2012.00185.x.

Rönn, Jan A. C. von, Martin U. Grüebler, Thord Fransson, Ulrich Köppen, and Fränzi Korner-Nievergelt. 2020a. “Integrating Stable Isotopes, Parasite, and Ring-Reencounter Data to Quantify Migratory Connectivity-a Case Study with Barn Swallows Breeding in Switzerland, Germany, Sweden, and Finland.” Ecology and Evolution 10 (4): 2225–37. https://doi.org/10.1002/ece3.6061.

Schirmer, Saskia. 2021. “Modeling Spatial Patterns of Survival, Space Use and Recovery Probability.” Mathematisch-Naturwissenschaftlichen Fakultät.

Schwarz, Carl J., Jake F. Schweigert, and A. Neil Arnason. 1993. “Estimating Migration Rates Using Tag-Recovery Data.” Biometrics 49: 177–93.

Seber, G. A. F. 1965. “A Note on the Multiple-Recapture Census.” Biometrika 52: 249–59.

Thomson, D. L., and et al, eds. 2009. Modeling Demographic Processees in Marked Populations. Vol. Environmental and Ecological Statistics. New York: Springer.

Thorup, K., and P. B. Conn. 2009. “Estimating the Seasonal Distribution of Migrant Bird Species: Can Standard Ringing Data Be Used?” In Modeling Demographic Proccesses in Marked Populations, edited by D. L. Thomson, E. G. Cooch, and M. J. Conroy, 1107–17. New York: Springer.

Webster, Michael S, Peter P Marra, Susan M Haig, Staffan Bensch, and Richard Holmes. 2002. “Links Between Worlds: Unraveling Migratory Connectivity.” Trends in Ecology and Evolution 17 (2): 76–83. https://doi.org/10.1016/S0169-5347(01)02380-1.

White, Fary C., and Kenneth P. Burnham. 1999. “Program MARK: Survival Estimation from Populations of Marked Animals.” Bird Study 46 (suppl.): S120–189.

Wisz, M. S., B. A. Walther, and C. Rahbek. 2007. “Using Potential Distributions to Explore Determinants of Western Palearctic Migratory Songbird Species Richness in Sub-Saharan Africa.” Journal of Biogeography 34: 828–41.