Using weighted regression model for estimating cohort effect in age-period contingency table data

Background Recently, the multiphase method was proposed to estimate cohort effects after removing the effects of age and period in age-period contingency table data. Hepatocellular carcinoma (HCC) is the most common primary malignancy of the liver and is strongly associated with cirrhosis, due to both alcohol and viral etiologies. In epidemiology, the age-period-cohort (APC) model can be used to describe (or predict) the secular trend in HCC mortality. Methods After isolating the residuals during the median polish phase, the final phase estimates the magnitude of the cohort effects by regressing these residuals on the cohort category, with weights equal to the proportion of HCC deaths occurring in each cohort. Results The confidence intervals (CIs) of the weighted estimates were relatively narrow compared with those of the unweighted estimates. Moreover, for males, the mortality trend reversed during 2006–2010, from an increasing trend to a slightly decreasing trend. For females, the increasing trend reversed earlier than for males, during 2001–2005. Conclusions The weighted estimation of the regression model is recommended for the multiphase method in estimating the cohort effects in age-period contingency table data. Impact The regression model can be modified through weighting to give a weighted average estimate of the effect of each cohort with a narrower CI.


INTRODUCTION
Evaluating disease and mortality patterns over time has become a popular way of understanding disease etiology in public health. However, trend assessments of age-specific mortality often present inconsistent patterns between age groups. Birth cohort analyses are valuable in predicting future increases (or decreases) of diseases, assuming the same pattern persists among birth cohorts. In epidemiology, one popular interpretation of the relationship between the age, period, and cohort (APC) variables is that age and period interact to create unique generational experiences. Age effects reflect variation in the outcome, such as deaths caused by cancer, across different ages. Period effects, in contrast, influence all ages simultaneously over time. Birth cohort effects represent changes across groups of people born in the same year who experience the same outcome during the same period. Disease mortality is influenced not only by birth cohort effects but also by age and period. For example, if a person born in 1980 (i.e., birth cohort effects) is at high risk of dying from cardiovascular disease during his/her lifetime, it will take at least 30 years (i.e., period effects) for him/her to die during adulthood (i.e., age effects) at the beginning of 2010. Therefore, the conceptualization of cohort effects was proposed based on the interaction between age and period [1]. Although the exact linear relationship (age + cohort = period) still holds under this conceptualization, exposures (predictors) are not intrinsic to birth cohorts. Rather, we interpret a cohort effect as being present when different disease distributions arise. However, because age + cohort = period, these three variables are linearly dependent, and unless additional constraints are imposed, an APC model that estimates the linear effects of age, period, and cohort is non-identifiable. We have explained this problem and the potential constraints in our previous publications [2][3][4][5]. However, methodological complexity is a barrier for many researchers. As previously mentioned, a cohort effect is conceptualized as a period effect that is differentially experienced through age-specific exposure to an event or cause (i.e., interaction) [6]. Addressing the identifiability problem under this conceptualization is unnecessary because cohort effects are not conceptualized independently from age and period. Median polish analysis has been used to estimate cohort effects under this conceptualization [6,7].
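The linear dependence among age, period, and cohort can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3 × 3 age-period table with integer-coded categories (these codes are illustrative and are not taken from the study data): the design matrix with linear age, period, and cohort terms has four columns but only rank three, so the three linear effects cannot be estimated separately.

```python
import numpy as np

# Hypothetical coding: 3 age groups and 3 periods, indexed 0, 1, 2.
n_age, n_period = 3, 3

rows = []
for age in range(n_age):
    for period in range(n_period):
        cohort = period - age  # follows from age + cohort = period
        rows.append([1.0, age, period, cohort])

# Columns: intercept, age, period, cohort (linear terms only).
X = np.array(rows)

n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)
print(n_columns, rank)  # the rank is smaller than the number of columns
```

Because the cohort column is an exact linear combination of the age and period columns, any least-squares fit on this matrix has infinitely many solutions unless a further constraint is added.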
Recently, the multiphase method was proposed by Keyes and Li [6]; it provides three phases for estimating cohort effects with minimal assumptions about the contingency table data. Moreover, the median polish does not rely on a specific distribution or structure and thus can be widely applied to various types of data, such as rates, log rates, proportions, and counts. The first phase is graphical representation. Graphs are constructed of age across periods or birth cohorts, and of birth cohort across ages or periods. For example, consider a graph of age-specific rates across periods. If the age-specific rates of the different age groups vary together across periods, then a period effect may exist in the contingency table data. A cohort effect may also be present when the age-specific rates of different age groups interact with one another across periods. The second phase involves median polish analysis, which removes the additive effects of age and period by iteratively subtracting the median from each row and column. The final phase is a regression procedure applied to the residuals of the median polish, which contain the cohort effects and random error. These residuals are regressed on the cohort category (defined as an indicator variable) in a linear regression model, using the aggregated count data in the format of contingency tables.
The median polish was developed to describe data in a two-way contingency table [8]; it removes the additive influence of age (i.e., row) and period (i.e., column) by iteratively subtracting the median from each row and column. Selvin first applied the median polish to APC analysis [7]. This technique requires no assumptions about the distribution or structure of the data in a two-way contingency table. Consequently, it can be used for any type of data contained in a table, such as suicide data [9]. The APC model has also been used to describe the secular trend in disease incidence or mortality [3]. The APC model usually assumes that age, period, and cohort have additive effects on the log transformation of the disease/mortality rate.
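The iterative subtraction of row and column medians can be sketched as follows. This is an illustrative implementation of the general median polish technique, not the code used in this study; the function name `median_polish`, its parameters, and the small table of log rates are hypothetical.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-8):
    """Median polish of a two-way table (rows = age, columns = period).

    Returns (overall, row_effects, col_effects, residuals) such that
    table = overall + row_effects[i] + col_effects[j] + residuals[i, j].
    """
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        # Move the median of the column effects into the overall term.
        shift = np.median(col_eff)
        col_eff -= shift
        overall += shift
        # Sweep the column medians out of the residuals into the column effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        # Move the median of the row effects into the overall term.
        shift = np.median(row_eff)
        row_eff -= shift
        overall += shift
        # Stop once neither sweep changes the residuals any further.
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical log rates: an additive age-period table plus one perturbed cell.
log_rates = np.array([[10.0, 20.0, 30.0],
                      [11.0, 21.5, 31.0],
                      [12.0, 22.0, 32.0]])
overall, age_eff, period_eff, residuals = median_polish(log_rates)
print(residuals)
```

The residual table is what the final regression phase works on: whatever remains after the additive age and period effects are removed is attributed to cohort effects plus random error.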
Hepatocellular carcinoma (HCC) is the most common primary malignancy of the liver and is strongly associated with cirrhosis, due to both alcohol and viral etiologies [10]. Of all malignant tumors worldwide, HCC ranks fifth in terms of mortality in men (and eighth in women). In Taiwan, it has been ranked first among all major cancers in men (and second in women) [11].
In this study, we investigated the longitudinal trends of HCC mortality data from the Vital Statistics as our demonstration. We evaluated HCC mortality to identify the effects of age, period, and cohort and examined whether these effects varied by gender. This study aimed to use a weighted average method to modify the multiphase method in order to estimate the cohort effect. We also illustrated how to estimate the cohort effect using the multiphase method and compared the results with those estimated by the proposed weighted average method.

RESULTS
Figures 1 and 2 show the HCC mortality rates among age and period groups. The fluctuations were more pronounced among men than women. The distribution of rates according to age shows that HCC mortality begins at the 40-44 age group (see Figure 1). Note that HCC mortality rates rose gradually among those in the ≥60 age groups (see Figure 2). However, age-specific HCC mortality rates have changed considerably over time, which means that a significant cohort effect was hidden in the usual age-period cross-classified Vital Statistics table and would not become apparent until the distant future. We performed the median polish procedure on the log-transformed HCC mortality rates. Tables 1 and 2 present the estimated cohort effects of the APC model on HCC mortality rates, and Table 3 presents the age and period effects for both genders. Tables 1 and 2 also report the weighted estimates obtained from the weighted average procedure for both genders. Judging by the narrower confidence intervals (CIs) of the weighted estimates compared with the unweighted estimates, the weighted estimates fit the data better. For men, the left panel of Table 1 presents the unweighted cohort effects of the birth cohorts. The cohort effect increases from 0.75 (the earliest cohort effect in 1891) to 1.13 (the greatest cohort effect in 1936). For women, the cohort effect increases from 0.69 (the earliest cohort effect in 1891) to 1.17 (the greatest cohort effect in 1926). Note that the cohort effect significantly increased by approximately 51% and 68% compared to the cohort in 1891 for men and women, respectively. In the right panel of Table 1, which presents the weighted estimates, the increase was evenly distributed. Here, the cohort effect increased from 0.71 (the earliest cohort effect in 1891) to 1.05 (the greatest cohort effect in 1936). For women, the increase is distributed similarly in the right panel of Table 2. The cohort effect increased from 0.65 (the earliest cohort effect in 1891) to 1.04 (the greatest cohort effect in 1921). Thus, at its peak, the weighted cohort effect was approximately 48% and 60% higher than that of the 1891 cohort for men and women, respectively. Among the birth cohorts, men born in 1936 exhibited the highest risk of HCC mortality (Table 1). For the weighted estimates, the effect was 1.05 (95% CI: 1.04-1.07) for the 1936 birth cohort compared to the reference birth cohort in 1946. However, a dramatically decreasing trend was observed for the earlier cohorts. Additionally, the effects were reversed after the 1936 cohort. Moreover, we plotted the unweighted and weighted cohort effects with 95% CIs for men and women (Figures 3 and 4, respectively). Both figures show that almost all of the 95% CIs of the weighted cohort effects are narrower than those of the unweighted cohort effects.
In this study, we limited our APC analysis of the median polish procedure to estimating cohort effects and 95% CIs of the HCC mortality. Based on this analysis, it appears that the residual errors ($\varepsilon_{ijk}$) were close to zero.

DISCUSSION
Considering the time trend of HCC mortality, the conventional analysis using a simple linear extrapolation of the observed log age-adjusted rates may miss important characteristics hidden in the data (such as the cohort effects) and lead to predictions that are grossly off the mark. If we directly observe the long-term trends of HCC mortality from 1976 to 2010 in Taiwan (Figure 5), no reasonable observer would doubt that the trends, having increased for 35 years, would continue to increase for the next few years. In fact, however, the recent trend in HCC mortality in Taiwan is decreasing and is driven by cohort effects (identified from the APC analysis), which, as described, decreased after the 1936 cohort. In this study, applying the APC model allows earlier and more accurate warning of trend changes. From a clinical viewpoint, hepatitis B virus (HBV) infection is an important health issue worldwide with high morbidity, with approximately 2 billion people infected and 350 million suffering from chronic HBV infection [12]. HBV infection can induce a wide range of clinical problems, from inactive carrier status to fulminant hepatitis, cirrhosis, or hepatocellular carcinoma. Hepatitis B vaccination is the most effective prevention method. In terms of policy, the world's first hepatitis B mass vaccination program was implemented in 1984 in Taiwan [13]. Pregnant women were screened for HBsAg and then HBeAg. In its first 2 years, the immunization program covered only infants of HBsAg carrier mothers. From the third year of the vaccination program, all infants were covered. Recently, the coverage rate of hepatitis B vaccine reached 99%. After three doses, approximately 90-95% of people will have life-long immunity. Note that the decline in pediatric HCC in Taiwan can be attributed to this vaccination program. The APC estimation described in this paper can provide advance warning of such trend changes (here, from increasing to recently decreasing).
This study investigated the trend of the cohort effect by applying the median polish procedure. Weighting the regression of the residuals then yields a weighted average estimate of the effect of each cohort with a narrower CI. The results are reported in the form of cohort effects using the 1946 cohort as the reference, because this category presents the smallest change in HCC mortality rates when the cohort influence is removed.
In most modeling methods (such as linear or nonlinear regression models), one common assumption is that each data value provides equal information for estimating the parameters of the model. This means that the standard deviation of the error term is constant across the values of the predictor variables. Based on our literature review, this assumption often does not hold when parameters are estimated empirically. In weighted regression, the unknown parameters are estimated with less weight given to the less precise data points and more weight given to the more precise ones. The advantage is that the weighting procedure can reduce the standard deviation of the estimator. The drawback is that the exact weights are almost never known in empirical practice, so estimated weights must be used to estimate the parameters. Nevertheless, experience shows that estimating the weights changes the results little and often does not affect the regression analysis or its interpretation [14]. Theoretically, any disease with rates governed by age, period, and cohort effects is amenable to an APC model. Moreover, the weighted average estimates can be used for prediction [15][16][17]. A relatively narrow CI indicates smaller uncertainty, because the CI describes the uncertainty inherent in the estimate and the range of values within which we can be reasonably sure that the true effect lies.
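The claim that weighting can reduce the standard deviation of an estimator can be illustrated with a small calculation. The sketch below assumes four independent data points with known but unequal error variances (hypothetical values, not taken from the HCC data) and compares the variance of the equally weighted mean with that of the mean weighted by inverse variance. The present study weights by the number of deaths rather than by known variances, so this is an illustration of the principle only.

```python
import numpy as np

# Hypothetical error variances of four data points with unequal precision.
sigma2 = np.array([0.5, 1.0, 4.0, 9.0])
n = sigma2.size

# Variance of the equally weighted mean: sum(sigma_i^2) / n^2.
var_unweighted = sigma2.sum() / n**2

# Variance of the mean weighted by inverse variance: 1 / sum(1 / sigma_i^2).
inverse_variance = 1.0 / sigma2
var_weighted = 1.0 / inverse_variance.sum()

# A 95% CI has half-width 1.96 * standard deviation, so a smaller variance
# translates directly into a narrower interval.
half_width_unweighted = 1.96 * np.sqrt(var_unweighted)
half_width_weighted = 1.96 * np.sqrt(var_weighted)
print(half_width_unweighted, half_width_weighted)
```

When all variances are equal, the two estimators coincide, which is why weighting only matters when the equal-information assumption fails.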
Several potential limitations of our study should be noted. First, we can only make inferences about the etiologies of the changes observed. HCC mortality, being governed by age, period, and cohort effects, is amenable to an APC model. However, the set of assumptions underlying the median polish that we used should be noted. Second, APC analysis can be used extensively in the epidemiology field in populations of developing or recently developed countries, where long-running cohort studies are limited. Third, the aggregated datasets do not contain the information needed to adjust for confounders, such as comorbidities or lifestyle, in the APC model. Further studies using individual data are needed to address this limitation. Fourth, we used the number of deaths due to HCC as the weight to modify the regression procedure in the multiphase method. Because the exact weight is almost never known, the use of different weights may cause minor inflation among the estimated cohort effects. Lastly, various other APC estimation methods exist to address the non-identifiability problem (e.g., Holford adopts the linear and curvature trends to tackle it [18]). Meanwhile, the median polish provides a conceptual shift from the complex assumptions of the APC model to estimating the cohort effect with a minimum of assumptions, and it applies easily to a general contingency table format.
In conclusion, weighted estimation modifies the regression model to yield a weighted average effect with a narrower CI for each cohort. We therefore recommend weighted estimation of the regression model for the multiphase method when estimating cohort effects in age-period contingency table data.

METHODS
Let the mortality rate of the i-th age group and the j-th period group be denoted by $\lambda_{ij}$. The APC model is as follows:

$$\log(\lambda_{ij}) = \mu + \alpha_i + \beta_j + \gamma_k$$

where the intercept term is represented by $\mu$, the age effects by $\alpha_i$, the period effects by $\beta_j$, and the cohort effects by $\gamma_k$. The following constraints are used:

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0$$

The multiphase method in estimating cohort effects
The multiphase method includes three phases that concretize the estimation of the cohort effect as a partial interaction in age-period contingency table data [6,9]. The natural log of the rate ($\lambda_{ij}$) is modeled as a log-additive function of a constant term, the age and period effects, and a multiplicative interaction term:

$$\log(\lambda_{ij}) = \mu + \alpha_i + \beta_j + \varepsilon_k$$

The term $\varepsilon_k$ is in turn modeled using a vector of cohort effects ($\gamma_k$) and error terms ($\varepsilon_{ijk}$):

$$\varepsilon_k = \gamma_k + \varepsilon_{ijk}$$

where $\varepsilon_{ijk}$ represents the unmeasured error for age category i, period category j, and cohort category k.
Reference categories were chosen by comparing the cohort-specific mortality by age calculated with and without the cohort influence. The reference cohort category is the one with the minimum difference in cohort-specific mortality with and without the cohort influence. After obtaining the residuals from the contingency table data, we use them to calculate the log-additive rate (without the cohort effect) by multiplying the rate in each age and period group by the factor e^-(residual). We then take the ratio of the log-additive rates without and with cohort effects for each cohort. If the rate ratio of a cohort is close to one, it is chosen as the referent birth cohort. In other words, the referent birth cohort is the one whose rate varies only slightly after the cohort influence is removed.

Weighted average
For the birth cohorts, let W k denote the weight of the k th cohort category. The common assumption is that each cohort across age i and period j provides equal information (i.e., is equally weighted) for the estimation of cohort effects in a model, while the true weighting factor is generally unknown. Empirically, the equal-weight assumption is usually violated when modeling to estimate the cohort effect. The most widely used empirical weighting factor is the number of deaths [20]. Each of these weights can be applied to the regression equation. Here, the weighted average of the cohort effect is obtained with the weight set equal to the proportion of deaths due to HCC occurring in each cohort.
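The weighted average of the cohort effect can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the exact procedure of this study: it assumes that each age-period cell is weighted by its own number of HCC deaths, and it uses the fact that a weighted least-squares regression on cohort indicator variables alone reduces to a weighted mean of the residuals within each cohort. The function names, the residual table, and the death counts are all hypothetical.

```python
import numpy as np

def cohort_index(n_age, n_period):
    """Cohort label of each cell; 0 is the earliest birth cohort.

    Rows are age groups (youngest first) and columns are periods (earliest
    first), so cells on the same diagonal share a birth cohort.
    """
    i = np.arange(n_age)[:, None]
    j = np.arange(n_period)[None, :]
    return (n_age - 1 - i) + j

def weighted_cohort_effects(resid, deaths):
    """Death-weighted mean of the median polish residuals in each cohort."""
    k = cohort_index(*resid.shape)
    effects = {}
    for c in np.unique(k):
        mask = k == c
        # Cells with more deaths contribute more to the cohort estimate.
        effects[int(c)] = float(np.average(resid[mask], weights=deaths[mask]))
    return effects

# Hypothetical residuals (log scale) and death counts for a 3 x 3 table.
resid = np.array([[0.10, -0.05, 0.02],
                  [0.04, 0.20, -0.03],
                  [-0.08, 0.01, 0.30]])
deaths = np.array([[10, 30, 50],
                   [20, 10, 40],
                   [5, 15, 20]])

effects = weighted_cohort_effects(resid, deaths)
# Exponentiating a log-scale effect gives a rate ratio.
rate_ratios = {c: float(np.exp(v)) for c, v in effects.items()}
print(effects)
```

In this sketch the effects are not yet expressed relative to a referent cohort; dividing each rate ratio by that of the chosen reference category would put them on the scale reported in the tables. Every cohort needs at least one cell with a positive death count, otherwise the weighted mean is undefined.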
To check model fit, we plotted the deviance residuals progressively from the null model, the age model, and the age-period model to the APC model (under the proposed weighted method) (Supplementary Figure 1).

Figure 4: Plot of the unweighted and weighted effects with 95% confidence interval of females.

Figure 3: Plot of the unweighted and weighted effects with 95% confidence interval of males.

Figure 5: Age-adjusted mortality rate of death from hepatocellular carcinoma for men and women in Taiwan.

Table 1: Estimated rate ratios and 95% confidence intervals for effect of birth cohort on hepatocellular carcinoma mortality of males in Taiwan, 1891-1966
Note: REF = reference; CI = confidence interval.