Chapter 11 Statistical tools for studying migration
11.1 Linear models
Linear models are the basics to most statistical modelling also for the study of migration. It is highly recommended to get an understanding of linear modelling in empirical sciences. Our course is not a statistics course, but we will use statistics for the group work during the course. As a preparation for the course, we recommend reading Chapters 2, 3 and 11 of the online book on linear modelling.
11.2 Distribution models for counts of resting or migrating individuals
Count data are classically analysed using generalized linear (mixed) models (GLMM). Typically, the Poisson or negative-binomial models serve as a first model choice when the outcome variable is a count data. In most cases, depending on the data structures, these models are expanded by including spatial or temporal structures, accounting for an excess of zero values (zero-inflation) or a large variance (overdispersion). Any textbook introducing linear models, linear mixed models, generalized linear models and generalized linear mixed models is recommended. We particularly like the introduction to hierarchical models by Gelman and Hill (2007). Of course, we also recommend our own introduction to linear models that follows the philosophy of Gelman and Hill but is written by ecologists for ecologists, i.e. less technically (F. Korner-Nievergelt et al. 2015). An open access second edition of the most important parts of the book is available.
Example studies that analysed count data for the study of bird migration:
- Distribution of migratory species during the non-breeding season within Africa (Wisz, Walther, and Rahbek 2007).
11.3 Migratory connectivity models
Migratory connectivity is the extent to which breeding areas are connected with non-breeding areas via migrating individuals (Webster et al. 2002; S. Bauer and Hoye 2014). A (multidimensional) measure of migratory connectivity are the proportions of individuals from specific, predefined, breeding areas that spend the non-breeding season in specific, predefined, non-breeding areas. Thus, migratory connectivity can be quantified based on distinct breeding areas and distinct non-breeding areas. Alternatively, migratory connectivity may be visualised using different distribution maps by population (Schirmer 2021). Estimates of migratory connectivity can be obtained by tracking individuals from their breeding grounds to their non-breeding ground. However, because migratory connectivity is a characteristic of populations (not of individuals) a lot of individuals need to be tracked in order to get a reliable estimate of migratory connectivity. Therefore, reencounters of marked individuals that are often available for a large number of individuals and from many different populations are well suited to study migratory connectivity. However, the spatial heterogeneity of reencounter probability introduces a bias in the raw mark reencounter data. Stochastic models to correct for this bias have been developed by Kania and Busse (1987), Kendall, Conn, and Hines (2006), Bauthian et al. (2007) and Thorup and Conn (2009). The model has further been expanded by Fränzi Korner-Nievergelt, Liechti, and Hahn (2012), Cohen et al. (2014), F. Korner-Nievergelt, Liechti, and Thorup (2014) and Schirmer (2021). Combination of different data sources such as ring recoveries, GPS tracking data, geolocator data or stable isotopic data further improves the accuracy of the migratory connectivity estimates (Fränzi Korner-Nievergelt et al. 2017; Rönn et al. 2020).
11.4 Estimating survival based on mark-recapture data: The Cormack-Jolly-Seber model
To study population processes, individuals often are marked and later recaptured or re-sighted. Such mark-recapture/re-sight data contain information on demographic parameters such as survival, movement and population size. The difficulty with mark-recapture data is that not all marked individuals are recaptured during each capture occasion. If an individual has not been captured, it may have died, emigrated from the study area, or it may have escaped being captured. For the study of the demographic parameters it is important to disentangle capture probability from the parameters of interest. To do so, Cormack (1964), Jolly (1965), and Seber (1965) developed a statistical model, the Cormack-Jolly-Seber model (CJS), which became the method used by myriad population biologists (Lebreton et al. 1992; Thomson and al 2009). The software MARK has become a standard software for the analyses of mark-recapture data (White and Burnham 1999; Cooch and White 2016). The technique is still being developed for the study of various aspects of demographic processes by a lively society of ecologists and statisticians. Here, we introduce the Cormack-Jolly-Seber (CJS) model for estimating apparent survival and recapture probability, or re-sighting probability in cases of re-sights, respectively. The correct term “apparent” survival expresses that in a mark-recapture/re-sight data set from one study site individuals that permanently emigrated from the study site cannot be distinguished from individuals that died. “Apparent” survival is a product of survival and site fidelity.
The CJS model explicitly models the capture process (the observation process) conditional on a model for apparent survival (the biological process). By including an observation process model, the recapture probability and thus the proportion of individuals still alive and present but not recaptured is estimated and taken into account while estimating the probability that an individual survives and stays in the study area. For example, mark-recapture data of Reed warblers Acrocephaus scirpaceus at the constant ringing effort site (CES) in the Wauwilermoos (LU) can be presented as in Table 11.1.
Year of first capture | n marked | 2014 (2) | 2015 (3) | 2016 (4) | 2017 (5) | 2018 (6) |
---|---|---|---|---|---|---|
2013 (1) | 107 | 13 | 7 | 4 | 5 | 1 |
2014 (2) | 76 | - | 6 | 2 | 1 | 1 |
2015 (3) | 65 | - | - | 4 | 3 | 0 |
2016 (4) | 54 | - | - | - | 7 | 2 |
2017 (5) | 64 | - | - | - | - | 3 |
The probability that an individual of all marked individuals released at occasion \(k\) is recaptured at occasion \(t\) is a function of apparent survival probability \(\phi_{t}\) and recapture probability \(p_t\) (Table 11.2). The recapture probability \(p\) is the probability that an individual that is alive and present in the study area is captured. The model parameters (apparent survival \(\phi_t\) and recapture probability \(p_t\)) can be estimated by fitting a multinomial model to the number of recaptured individuals at the different occasions. Such a multinomial model is fitted for each cohort of individuals released at the same occasion with the cell probabilities of Table 11.2.
Occ | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
1 | \(\phi_1p_2\) | \(\phi_1(1-p_2)\phi_2p_3\) | \(\phi_1(1-p_2)\phi_2(1-p_3)\phi_3p_4\) | \(\prod_{t=2}^4(\phi_{t-1}(1-p_t))\phi_4p_5\) | … |
2 | 0 | \(\phi_2p_3\) | \(\phi_2(1-p_3)\phi_3p_4\) | \(\phi_2(1-p_3)\phi_3(1-p_4)\phi_4p_5\) | … |
3 | 0 | 0 | \(\phi_3p_4\) | \(\phi_3(1-p_4)\phi_4p_5\) | … |
4 | 0 | 0 | 0 | \(\phi_4p_5\) | … |
5 | 0 | 0 | 0 | 0 | \(\phi_5p_6\) |
The original CJS model allows estimating apparent survival and recapture probability separately for each time period except for the last time period. For the last time period, only the product \(\phi_{t-1}p_t\) is estimable. More stringent model assumptions, such as constant survival over time or relating apparent survival to a linear predictor can give separate apparent survival and recapture estimates also for the last time period.
The CJS model can alternatively be formulated as a partially hidden Markov model, and fitted to individual capture histories. Individual based model formulations are appealing because they allow including individual specific predictors and accounting for more complex correlation structures. The individual based model formulation consists of two logistic regressions, i.e. two Bernoulli-models. The first Bernoulli variable is the partially observed indicator \(z_{it}\) that indicates whether individual \(i\) is alive and present at time \(t\):
\(z_{it} \sim Bernoulli(z_{it-1}\phi_{it})\)
In other words, an individual \(i\) is alive and in the study area at time \(t\) (\(z_{it} = 1\)) with probability \(\phi_{it}\), if it was alive and in the study area at time \(t-1\) (\(z_{it-1} = 1\)), or 0, if it was dead or had permanently emigrated from the study area at time \(t-1\) (\(z_{it-1} = 0\)). A linear predictor for the logit (or another link function) of \(\phi_{it}\) can be added to study factors affecting apparent survival. This part of the model looks like an autoregressive logistic regression. However, the variable \(z\) is only partly observed. It is known that an individual is alive (\(z_{it} = 1\)) when it has been recaptured or resighted at time \(t\) (\(y_{it} = 1\)) or at any time point later. When an individual is not recaptured (\(y_{it} = 0\)) during any subsequent capture occasion, we do not know whether it has died or emigrated, or whether it is still alive but has not been captured or resighted. The capture process is added to the model as a second Bernoulli process.
\(y_{it} \sim Bernoulli(z_{it}p_{it})\)
Also, a linear predictor can be added for the logit of \(p_{it}\) to study factors affecting the capture probability.
Software that provide algorithms for fitting CJS models:
11.5 Multi-state models to estimate movement rates
Multi-state models are used for analysing changes between different states of animals. If the states are defined to be locations, a multi-state model analyses movements between different locations. The central parameters of such models are the probabilities that an animal moves from location A to location B between two time points. Multi-state models have been introduced by Arnason (1972) and Schwarz, Schweigert, and Arnason (1993). They have later been used by many authors for a variety of different study questions. With a large number of locations, the number of possible movements, and thus the number of model parameters, becomes very large and a very high sample size is needed to estimate all parameters. But for some model formulations, even with a very high sample size, some parameters may not be identifiable (Brownie et al. 1993; Gimenez, Choquet, and Lebreton 2003; King and Brooks 2003). Antoniazza, Korner-Nievergelt, and Keller (2012) used a multi-state model to describe the migration of juvenile and adult cormorants Phalacrocorax carbo that were colour marked at Lake Neuchatel during the breeding seasons.
11.7 The combination of different data sources
In future, we expect that more ornithologists combine different data sources in formal models, the integrated models (e.g. Fränzi Korner-Nievergelt et al. (2017)). Bayesian techniques and user-friendly Bayesian software such as BUGS and Stan made these techniques available for ecologists.