Oyeka ICA1 and Okeh UM2*
1Department of Applied Statistics, Nnamdi Azikiwe University, Awka, Nigeria
2Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria
*Corresponding author: Okeh UM, Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria. E-mail: uzomaokey@ymail.com
 |
Received January 03, 2013; Published January 08, 2013

Citation: Oyeka ICA, Okeh UM (2013) Estimating Odds Ratios in Logistic Regression of Dichotomous Data. 2:608 doi:10.4172/scientificreports.608

Copyright: © 2013 Oyeka ICA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
 |
Abstract

This paper proposes an odds ratio type measure of the strength of association between screening test results and the state of nature or condition in a population, obtained from diagnostic screening tests and based on logistic regression analysis of dichotomous outcomes. Unlike the analysis of data from most screening tests, the proposed method requires that the response to the condition of interest be dichotomous, assuming one of two possible values. The predisposing factors in this study are categorical variables. This enables the fitting of a logistic regression model to help estimate the desired probabilities, odds and odds ratios of positive responses. A test statistic to assess the statistical significance of the proposed measure, based on the logistic regression, is developed. The proposed method is illustrated with sample data, and the results are shown to compare favourably with those obtained using the usual expression for the odds ratio.

Keywords

Gestational diabetes mellitus; Odds; Odds ratio; Logistic regression; Dichotomous
 |
Introduction

Often a candidate for an examination or a job interview may wish to estimate the probability of his success given some predisposing factors, such as the number of hours he studied per day or per week; the nature, type and duration of the examination; the condition; and his prior qualifications, age, gender, ethnic group, state of origin, etc. A clinician conducting a diagnostic test or drug trial for a certain condition may wish to know the odds that his subjects or patients respond positively given their various characteristics, such as age, gender, body weight and family history [1]. A gynecologist or a pediatrician may wish to estimate the odds that a new-born baby is underweight or has a longer than normal gestation period given the mother's age, parity and body weight and the child's gender [2]. In all these situations the response to the condition of interest is dichotomous, assuming one of two possible values. The predisposing factors may be either categorical or continuous variables [3]. This enables the fitting of a logistic regression model to help estimate the desired probabilities, odds and odds ratios of positive responses, as discussed below [4-6].
 |
The proposed model

Let yi be the response of the ith randomly selected subject to the condition of interest, assuming a value of either 1 (positive response) or 0 (negative response), for i=1, 2, …, n. Let xi1, xi2, …, xik be the scores of the ith subject on the independent, explanatory or predetermined variables X1, X2, …, Xk respectively,
 |
The following analysis of variance (ANOVA) table, Table 1, is used to test the adequacy of Equations 4 and 5 based on the F-test statistic:
 |
Table 1: Four-fold table for the screening test results and gold standard of at-risk pregnant women for GDM.
X is an n×(k+1) matrix of regressors. The null hypothesis to be tested for the adequacy of Equation 4 using the results of Table 1 is

H0: β1 = β2 = … = βk = 0 vs H1: βj ≠ 0 for some j = 1, 2, …, k (6)
|
H0 is rejected at the α level of significance if the computed F-test statistic is at least as large as the critical value F1-α; k, n-k-1; otherwise H0 is accepted. Here F1-α; k, n-k-1 is the critical value of the F-distribution with k and n-k-1 degrees of freedom for a specified α level. If the model fits, that is, if H0 is rejected so that not all the βj are zero, then we may proceed to estimate the required probabilities, odds and odds ratios of positive responses to the condition of interest. Thus, assuming that H0 is rejected, we estimate from Equations 4 and 5 the odds that the ith subject responds positively to the condition under study, given the independent variables X1=xi1, X2=xi2, …, Xk=xik, as
|
|
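The estimation step described above can be sketched in a few lines of Python. Equations 4 and 5 are not reproduced in this text, so the sketch assumes the standard logistic form, in which the log-odds are linear in the independent variables. The function names, coefficients and scores below are hypothetical and serve only as an illustration.

```python
import math

def estimated_odds(intercept, coefs, x):
    """Odds of a positive response, ASSUMING the standard logistic form
    odds = exp(b0 + b1*x1 + ... + bk*xk)."""
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return math.exp(linear_predictor)

def estimated_probability(intercept, coefs, x):
    """Probability of a positive response recovered from the odds."""
    odds = estimated_odds(intercept, coefs, x)
    return odds / (1.0 + odds)

# Hypothetical fitted values and subject scores (not taken from the paper).
intercept = -1.0
coefs = [0.5, -0.25]
x_i = [1, 0]  # dichotomous scores of one subject

odds_i = estimated_odds(intercept, coefs, x_i)
prob_i = estimated_probability(intercept, coefs, x_i)
print(f"odds = {odds_i:.4f}, probability = {prob_i:.4f}")
```

The probability is obtained from the odds as odds/(1+odds), so the two quantities always move in the same direction.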
Estimation of odds ratio from logistic regression of data
|
Note that the right hand side of Equation 10 is independent of i, i=1, 2, …, n, so Equation 10 may be interpreted as the estimated odds ratio of positive responses by any randomly selected subject under the specified conditions. In obtaining the odds ratio of Equation 10 it is assumed that some independent variables are increased or decreased by some constant. It is, however, also possible that some of these independent variables are increased or decreased proportionately, that is, by some percentage or proportion of the independent variables themselves. Thus suppose Xl is increased proportionately, assuming a value of (1+α)xil, and Xs is decreased proportionately, assuming a value of (1-Υ)xis, holding other independent variables constant. Then the resulting odds of positive response by the ith randomly selected subject is,
|
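The contrast between a constant change and a proportionate change can be illustrated numerically. The sketch below assumes the standard logistic form for the odds, since the paper's own equations are not reproduced in this text. All coefficients, subject scores and change sizes are hypothetical, and `gamma` stands in for the symbol Υ used above.

```python
import math

def odds(intercept, coefs, x):
    # ASSUMED standard logistic form: odds = exp(b0 + sum_j bj * xj)
    return math.exp(intercept + sum(b * v for b, v in zip(coefs, x)))

def odds_ratio(intercept, coefs, x_new, x_old):
    # Ratio of the odds after the change to the odds before it
    return odds(intercept, coefs, x_new) / odds(intercept, coefs, x_old)

# Hypothetical model and two hypothetical subjects (scores on X_l, X_s)
intercept, coefs = 0.2, [0.4, -0.3]
subjects = [[1.0, 2.0], [3.0, 5.0]]

# Constant change: X_l increased by c, X_s decreased by d
c, d = 0.5, 1.0
constant_ors = [odds_ratio(intercept, coefs, [x[0] + c, x[1] - d], x)
                for x in subjects]

# Proportionate change: X_l -> (1 + alpha) * x_l, X_s -> (1 - gamma) * x_s
alpha, gamma = 0.10, 0.20
proportional_ors = [odds_ratio(intercept, coefs,
                               [(1 + alpha) * x[0], (1 - gamma) * x[1]], x)
                    for x in subjects]

print("constant change:     ", [round(v, 4) for v in constant_ors])
print("proportionate change:", [round(v, 4) for v in proportional_ors])
```

Under this assumed form, the odds ratio for a constant change is the same for every subject, whereas the odds ratio for a proportionate change depends on each subject's own scores.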
Illustrative example

Table 2 shows data obtained from the medical records units of a collection of hospitals in Ebonyi State, covering January 2010 to December 2011. The data come from a retrospective study of the effect of four independent risk factors (variables) on the development of gestational diabetes mellitus (GDM). A sample of 301 at-risk pregnant women who satisfied the inclusion criteria based on the WHO 1999 standard [7] was considered. All the risk factors considered in this work (family history (FH), obesity, age and previous fetal weight (PFW)) are dichotomous and have been coded for use in estimating the odds ratio in logistic regression. The dependent variable is GDM. We here present sample data obtained in a diagnostic screening test to confirm the presence or absence of GDM among the sampled subjects from a certain population. The proposed method is illustrated using the sample data of Table 1.
|
 |
Testing the adequacy of the model

Regression analysis gave the following coefficient estimates: Obesity=-0.082, Age=-0.020, Family History (FH)=-0.211, PFW=-0.125 and Constant=3.503. With reference to Equation 6, these estimates indicate that βj≠0.

We therefore reject H0 and conclude that the risk factors have a significant relationship with GDM. The odds ratio values for the risk factors are Obesity=0.921, Age=0.981, FH=0.810, PFW=0.88 and Constant=33.209. The significant values of the risk factors are Obesity=0.747, Age=0.981, FH=0.810, PFW=0.882 and Constant=33.209. These indicate high significance for their relationship. They also show the effects of these risk factors on the occurrence of GDM.
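As a quick check on the figures above, the odds ratio associated with a logistic regression coefficient is conventionally its exponential. The sketch below applies that standard relation to the coefficient estimates reported in this section; the variable names are illustrative and not taken from the paper.

```python
import math

# Coefficient estimates as reported in the text
coefficients = {
    "Obesity": -0.082,
    "Age": -0.020,
    "FH": -0.211,
    "PFW": -0.125,
    "Constant": 3.503,
}

# Standard relation for logistic regression: odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: {value:.3f}")
```

The exponentiated coefficients closely match the odds ratios reported above. Small differences in the third decimal place are to be expected if the reported coefficients were rounded before publication.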
|
Summary and Conclusion

This paper has proposed a statistical method for measuring the probability of success given some predisposing factors, using the association between diagnostic screening test results and the state of nature or condition in a population, based on the probabilities, odds and odds ratios estimated from logistic regression. A test statistic for the statistical significance of the proposed measure of association is developed, based on the estimation of a logistic regression of the dichotomous dependent variable on some covariates. The proposed method is illustrated with sample data and shown to compare favourably with results that would have been obtained using the traditional expression for the odds ratio and other statistics for measuring association.
 |
|
References |
 |
- Hall GH, Round AP (1994)
- Lee J (1986)
- Fleiss JL, Bruce L, Paik MC (2003)
- Fienberg SE (1980)
- Hosmer DW, Lemeshow S (1989)
- Van Houwelingen JC, le Cessie S (1988)
- World Health Organization (1999)
|
 |
 |