Biometry 711: Categorical Data Analysis

Summer 2015


Instructor

Elizabeth G Hill

Office

118F Hollings Cancer Center

Phone

876-1115

Email

hille@musc.edu

Class schedule

Monday and Wednesday, 1pm - 3pm

Class dates

Wednesday May 13th - Wednesday July 29th

Location

135 Cannon Place, Room 301

Website

http://people.musc.edu/~hille/BMTRY711/homepage.htm

Instructor Office Hours

By appointment

Teaching Assistant

Chawarat Rotejanaprasert

TA Email

rotejana@musc.edu

TA Office Hours

By appointment


Text: Categorical Data Analysis, Third Edition, Alan Agresti, John Wiley & Sons, 2013. ISBN 978-0-470-46363-5


Course Description: Biometry 711 (Categorical Data Analysis) covers the theoretical underpinnings and analysis methods for categorical and discrete data. The tentative course outline includes topics from Chapters 1-3 (contingency table analysis and inference), Chapters 4-7 (logistic regression and alternative modeling approaches for binary data), Chapter 8 (models for multinomial response data), and Chapter 11 (models for matched-pairs data). Additional topics may be added, time permitting.


Grading: There will be four homework assignments, each worth 12.5% of your grade. In lieu of exams, there will be two projects: the first project will be worth 20% of your grade, and the final project will be worth 30%. Late homework is accepted, but with a penalty: homework turned in late on the day it is due receives 3/4 credit, homework turned in the day after it is due receives 1/2 credit, homework turned in two days after it is due receives 1/4 credit, and homework more than two days late receives no credit. Additional information about the projects will be distributed at a later time. Currently, I anticipate the first project will be assigned in mid-June and due in early July. The final project will be assigned at the completion of the course and will serve as the course's capstone project.
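For illustration only, the short Python sketch below applies the grading weights and late-homework penalties described above to compute an overall grade. It is not an official grading tool: the function names, the representation of scores as fractions between 0 and 1, and the use of `None` for homework turned in on time are assumptions made for this example.

```python
# Illustrative sketch only: applies the grading weights and late-homework
# penalty schedule stated above. Scores as fractions in [0, 1] and None for
# on-time homework are assumptions for this example.

HOMEWORK_WEIGHT = 0.125       # each of the four homework assignments
PROJECT1_WEIGHT = 0.20        # first project
FINAL_PROJECT_WEIGHT = 0.30   # final (capstone) project

# Credit multiplier by days late: 0 = late, but still on the due date.
LATE_CREDIT = {0: 0.75, 1: 0.50, 2: 0.25}


def late_multiplier(days_late):
    """Return the fraction of credit kept; None means turned in on time."""
    if days_late is None:
        return 1.0
    if days_late < 0:
        raise ValueError("days_late must be None or a non-negative integer")
    # More than two days late receives no credit.
    return LATE_CREDIT.get(days_late, 0.0)


def course_grade(homework, project1, final_project):
    """Weighted course grade in [0, 1].

    homework: list of four (score, days_late) pairs.
    project1, final_project: scores as fractions in [0, 1].
    """
    if len(homework) != 4:
        raise ValueError("expected exactly four homework assignments")
    hw_total = sum(HOMEWORK_WEIGHT * score * late_multiplier(days)
                   for score, days in homework)
    return (hw_total
            + PROJECT1_WEIGHT * project1
            + FINAL_PROJECT_WEIGHT * final_project)


# Example: homework 2 is one day late; homework 4 is three days late (no credit).
grade = course_grade([(0.9, None), (1.0, 1), (0.8, None), (1.0, 3)],
                     project1=0.85, final_project=0.9)
print(round(grade, 3))  # 0.715
```

Because the homework weights (4 x 12.5% = 50%) and the project weights (20% + 30%) sum to 100%, full marks on everything turned in on time yield a grade of 1.0.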


Important Dates:

Monday, May 25th, Memorial Day - No Class

Monday, June 29th - No Class

Friday, August 14th - Final Project due

CGS academic calendar - http://academicdepartments.musc.edu/esl/em/records/forms/11_calendar_13-18.pdf


CDA website: http://www.stat.ufl.edu/~aa/cda/cda.html


Day

Date

Topic

Text References

Handouts

W

May 13th

Introduction

CDA 1.1, 1.2

 

M

May 18th

Introduction (cont.)

CDA 1.3, 1.4

Clopper-Pearson R script

Pediatric Orders data

Basics in R

Basics in SAS

W

May 20th

Introduction (cont.)

CDA 1.5, 1.6

 

M

May 25th

Describing Contingency Tables

CDA 2.1, 2.2

 

W

May 27th

Describing Contingency Tables (cont.)

CDA 2.3, 2.4

 

M

June 1st

Inference for two-way tables

CDA 3.1, 3.2

 

W

June 3rd

Inference for two-way tables (cont.)

CDA 3.3, 3.4

 

M

June 8th

Inference for two-way tables (cont.); GLM introduction

CDA 3.5, 4.1 - 4.3

Homework 1

W

June 10th

Fitting the GLM - IRWLS

CDA 4.4, 4.6

 

M

June 15th

Fitting the GLM - Fisher Scoring; Review of logistic regression

CDA Chapter 5

IMPACT Data set description

IMPACT Data (.sas7bdat)

IMPACT Data (.csv)

Logistic Regression (SAS)

Logistic Regression (R)

W

June 17th

GLM GOF - Deviance; Grouped and ungrouped logistic regression

CDA 4.5

Grouped versus ungrouped logistic regression (SAS)

M

June 22nd

GLM GOF - HL test, ROC curves, residual analysis

CDA 5.2, 6.2, 6.3

HL and ROC GOF for logistic regression (SAS)

ROC handout

AJE Pepe et al. article, 2004

Homework 2

Hu and Smyth paper

W

June 24th

Assessing linearity in the logit, multivariable fractional polynomials

 

Dichotomizing continuous variables in regression, SIM article, 2006

Royston and Altman fractional polynomials JRSSC paper, 1994

Sauerbrei et al. comparison of FP software CSDA article, 2006

Website for FP software

R documentation for the mfp package

Assessing linearity in the logit for logistic regression and use of fractional polynomials (R)

W

July 1st

AIC; Pearson/deviance residuals and diagnostic plots; quasi-complete separation

CDA 6.1.6, 4.5.6, 6.2, 6.5

Diagnostic plots (SAS)

Quasi-complete separation (SAS)

PROJECT 1 - Due Monday July 20th

Project 1

Grading Rubric

CRCData (.csv)

CRCData (.sas7bdat)

M

July 6th

Poisson regression; overdispersed models for count data - quasi-likelihood and negative binomial models

CDA 4.2, 4.7, 14.4

Infant Death Codesheet

InfantDeaths dataset (.csv)

InfantDeaths dataset (.sas7bdat)

Poisson Regression (SAS)

Output file (SAS)

W

July 8th

Zero-inflated models - Hurdle models, ZIP models and ZINB models

Guest lecture by Dr. Neelon

Lecture slides

M

July 13th

High Throughput Sequencing (HTS) Data Analysis

Guest lecture by Dr. Chung

Nucleic Acids Research article

Bioinformatics article

Lecture slides

W

July 15th

Generalized logit models

CDA 8.1

Lecture slides

SAS program

SAS output

M

July 20th

Proportional odds models

CDA 8.2

Lecture slides

Low birthweight data codesheet

Low birthweight data (.sas7bdat)

SAS program

SAS output

W

July 22nd

Quasi-likelihood theory

CDA 12.2, 12.3

Homework 3

teratology.csv

teratology.sas7bdat

homicides.csv

homicides.sas7bdat

F

July 24th

GEE theory

CDA 12.2, 12.3

Liang and Zeger Biometrika article

Zeger and Liang Biometrics article

M

July 27th

GEE applications

 

PAS (swallow) data set (.sas7bdat)

PAS data set (.csv)

GEEs in SAS using GENMOD

SAS output

Clustered ordinal data article

W

July 29th

GEE GOF

 

Biometrics paper on cumulative sums of residuals

SAS code

R code

Homework 4

Homework 4 AJE paper

PROJECT 2 - Due Monday August 17th, 9AM

Project 2

Journal of Perinatology article by Garner et al.

Grading rubric

project2.csv

project2.sas7bdat


References

1. An Introduction to Categorical Data Analysis, Second Edition. A. Agresti. John Wiley & Sons, 2007.

2. Analysis of Ordinal Categorical Data, Second Edition. A. Agresti. John Wiley & Sons, 2010.

3. Statistical Methods for Rates and Proportions, Third Edition. J. Fleiss, B. Levin and M.C. Paik. John Wiley & Sons, 2003.

4. Applied Logistic Regression, Third Edition. D.W. Hosmer, S. Lemeshow and R.X. Sturdivant. John Wiley & Sons, 2013.

5. Regression Modeling Strategies. F.E. Harrell. Springer-Verlag, 2001.

6. Generalized Linear Models, Second Edition. P. McCullagh and J.A. Nelder. Chapman & Hall, 1989.

7. The Statistical Evaluation of Medical Tests for Classification and Prediction. M.S. Pepe. Oxford University Press, 2003.

8. The Elements of Statistical Learning, Second Edition. T. Hastie, R. Tibshirani and J. Friedman. Springer-Verlag, 2009.