BMTRY 764

Computing for Research I

Spring 2014

 

Description: Students learn to use the primary statistical software packages for data manipulation and analysis, including (but not limited to) R, R Bioconductor, SAS, SAS macro, and Stata. Students will also learn how to use the division's high-speed cluster-computing environment, how to practice the principles of reproducible research, and how to use LaTeX and BibTeX for manuscript and presentation development. This is a three-credit course.

 

Course Organization: This course is organized by Dr. Elizabeth Garrett-Mayer, who is also the primary instructor. Some lectures are given by other faculty members and senior students in the Department of Public Health Science.

 

Textbooks: No textbook. Reading material (primarily found on the web) will be provided as necessary.

 

Prerequisites: Biometry 700

 

Grading: Instructors will give short exercises, which must be turned in to the primary instructor by the Wednesday of the week after they are assigned (e.g., assignments given on Monday Feb 3 and Wednesday Feb 5 are both due on Wednesday Feb 12). The assignments are weighted equally and together account for 75% of the course grade. A final project accounts for 20% of the course grade, and the remaining 5% reflects class participation.
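As an illustration only, the 75/20/5 weighting described above can be sketched in Python. This sketch assumes that every component is scored on a 0-100 scale and that "count equally" means a simple average of the homework scores; the function name `course_grade` is hypothetical and not part of any course material.

```python
def course_grade(homework_scores, final_project, participation):
    """Combine component scores (each assumed to be on a 0-100 scale).

    Assumption: homework assignments are averaged with equal weight,
    then weighted 75% / 20% / 5% with the final project and participation.
    """
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    homework_average = sum(homework_scores) / len(homework_scores)
    return 0.75 * homework_average + 0.20 * final_project + 0.05 * participation


# Example: three homework scores, a final project score, and participation.
print(round(course_grade([80, 90, 100], final_project=90, participation=100), 2))
```

Under these assumptions, a single low homework score has a limited effect, because each assignment carries only its equal share of the 75% homework component.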

 

Homework Policy: Homework is due by 5pm on the due date. All homework should be emailed to the primary instructor (garrettm@musc.edu) or turned in at lecture time. Requesting extensions is strongly discouraged, but extenuating circumstances may occasionally arise. Each student may therefore request an extension twice, and each extension may be no more than 2 days. You must notify the primary instructor that you are requesting an extension before the assignment is due. After two extensions have been used, no further extensions will be granted except with a medical note.
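For illustration, the due-date rule (the Wednesday of the week after the assignment is given, at 5pm, with an extension of at most 2 days) can be sketched in Python. This sketch assumes that weeks start on Monday and that an extension is counted from the original 5pm deadline; the function names `homework_due` and `extended_due` are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def homework_due(assigned: date) -> datetime:
    """Return 5pm on the Wednesday of the week after `assigned`.

    Assumption: a week runs Monday through Sunday.
    """
    monday_of_week = assigned - timedelta(days=assigned.weekday())  # Monday is 0
    next_wednesday = monday_of_week + timedelta(days=7 + 2)
    return datetime.combine(next_wednesday, time(17, 0))


def extended_due(assigned: date, extension_days: int = 2) -> datetime:
    """Return the deadline after an extension of at most 2 days.

    Assumption: the extension is added to the original 5pm deadline.
    """
    if not 0 <= extension_days <= 2:
        raise ValueError("an extension may be no more than 2 days")
    return homework_due(assigned) + timedelta(days=extension_days)


# The example from the Grading paragraph: both assignments are due Wednesday Feb 12.
print(homework_due(date(2014, 2, 3)))  # 2014-02-12 17:00:00
print(homework_due(date(2014, 2, 5)))  # 2014-02-12 17:00:00
```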

 

Office Hours: The teaching assistant will hold regular office hours each week. The primary instructor will hold office hours by appointment. However, because the course covers many topics, the primary instructor may not be knowledgeable about all of them, and you may need help from the individual lecturers to complete their assignments. Be considerate and responsible in scheduling time with course instructors, and recognize that they all have busy schedules.

 

Course Objectives: Upon successful completion of the course, the student will be able to:

1.    Import data, perform simple analyses, and produce graphical displays in Stata, SAS, and R

2.    Create new functions or commands in each of R, Stata, and SAS

3.    Generate professional-quality scientific manuscripts and presentations using LaTeX along with statistical software

4.    Perform standard power and sample size calculations using available software and simulations

 

Primary Instructor: Elizabeth Garrett-Mayer

Website: http://people.musc.edu/~elg26/teaching/statcomputing.2014/statcomputingI.2014.htm

Contact Info: Hollings Cancer Center, Rm 118G; garrettm@musc.edu (preferred mode of contact is email); 792-7764

Time: Mondays and Wednesdays, 9:00-10:30am

Location: Cannon 301

Office Hours: By appointment. Contact via email.

TA Office Hours: 2pm on Tuesdays.

 

Grading: The homework assignments are not intended to be difficult, although some may be time consuming. The main challenge is that several instructors provide the assignments, so their length and difficulty are not consistent. Note that, based on MUSC standards, a score of 85% is acceptable (see http://academicdepartments.musc.edu/bulletin/acad_policies/grading.html for more information on the grading policies). Using 85% as a benchmark, homework will be evaluated as follows:

 

100% = Perfect.

95% = Almost perfect. Only negligible mistakes (e.g., misspellings, typos).

90% = Shows extra effort but has some minor mistakes.

85% = Acceptable. There are some relatively minor mistakes, but the general ideas are all there and the student has demonstrated mastery of the concepts.

75% = Clear misunderstanding of at least one concept.

70% = Several major mistakes, but the majority of the homework is still OK.

50% = Turned in the homework, but most of the concepts were misunderstood and the results are mostly incorrect.

0% = Did not turn in homework.

 

Lectures:

 

 

Date | Lecturer | Topic | Lecture materials (i.e., slides, links) and homework assignment

W Jan 8 | EGM | Introduction; Overview and Principles | Introduction

M Jan 13 | Fan | SAS: introduction | SASintro slides; homework.docx; subject.csv, score.csv

W Jan 15 | Ellerbe | SAS: IML | Using Proc IML.pptx; IML_SPRING2014.sas; IML_Graphics.pdf; IML_Language Reference.pdf; IML_SAS Datasets.pdf; IML_Statistics Examples.pdf; IML_Storage.pdf; IML_Using R.pdf; IML_Working with Matrices.pdf

W Jan 22 | Battenhouse | SAS: macros | macro_lecture_spring_2014.pptx

M Jan 27 | Foster | SAS: proc tabulate and proc report | SASPres_27JAN2014.ppt; macros.tabulate.HW.docx; vitals.sas7bdat

W Jan 29 | SNOW/ICE DAY

M Feb 3 | Nicholas | SAS: ODS | ODS Lecture.pptx; ODS Demo.sas; ODS HW.docx

W Feb 5 | Elm | SAS: array processing | SASArrayProcessing.ppt; HANDOUT242-30.pdf; HRARRAYstatements1.doc

M Feb 10 | Baker | SAS: Gplot | SAS GPLOT slides 1 29 2014.ppt; SAS Gplot HW Description.doc; hw_gplot_1_29_14.sas7bdat

M Feb 17 | EGM | Data management principles & Excel | Data.management.pptx; Homework.datamanagement.docx

W Feb 19 | EGM | Stata: introduction, immediate commands | Stataintro.pptx; SCBC2004.dta; Statalecture1.do; http://www.ats.ucla.edu/stat/stata/sk/default.htm; http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/index.html; http://data.princeton.edu/stata/; Stata1.homework.docx; Ohiosmall.dta

F Feb 21 | Wahlquist | Data management: REDCap (rescheduled due to MUSC weather closure) | REDCap2014.pptx; QBdata.xlsx; REDCap Homework 2014.docx

M Feb 24 | EGM | Stata: graphical displays | Stata.graphics.pptx; Statalecture2.do; Ceramide.csv; Ptdata.GemDox.csv; IschemicHeartDisease.csv; IschemicHeartDisease.pdf; Stata2.homework2.docx

W Feb 26 | EGM | Stata: exploratory data analysis | Lecture12.do; StataEDAandHT.pptx; Ceramide.csv; Ptdata.GemDox.csv; SCBC2004.dta; Ceramide.alldata.dta; Homework.StataEDA.docx; Ddata.csv

M Mar 3 | EGM | Stata: regression commands | Stata.regression.pptx; Sleep.csv; Stata.regression.do; Stata.regression.homework.docx

W Mar 5 | EGM | Stata: programming and do files | Stata.programming.pptx; Stata.programming.do; Stata.programming.docx

M Mar 17 | EGM | R: introduction to object-oriented programming | Rintro.pptx; Rintro.R; SCBC2004small.csv; Rintro.homework.docx

W Mar 19 | Moss | R: downloading packages/libraries; data input & output | R lib. Data Input&Output.ppt; R pack.data in_out.R

M Mar 24 | EGM | R: graphics | Rgraphics.ppt; Prostate.csv; Graphics Code.R; Framestnc.csv; ComputingHW.pdf; Esoph.csv

W Mar 26 | Onicescu | R: basic language structure (ifelse, where, looping) | Rpresentation.pdf; Rcode_presentation.txt; Hwk.pdf

M Mar 31 | EGM | R: exploratory data analysis; writing commands | Rcommands.pptx; Rcommands.R; Final-3-3-2011.csv; Rcommands.homework.docx; Methylation.csv

W Apr 2 | Wei | R: regression commands | Rregression.pptx; Data.dat; Class demos.R

M Apr 7 | Fan | R: simulations; random number generation; sampling from distributions | R simulations.ppt; InClassCode.Simulation.R; Homework.doc

W Apr 9 | Wolf | R: Bioconductor | Rconductor.pptx; RevolutionR.pptx

M Apr 14 | Ellerbe | LaTeX and BibTeX: manuscript production | How to install latex.docx; Latex_InClass_StatComputing2014.tex; Introduction to Latex_Spring2014.pptx; Mendeley_Spring2013.pptx; HomeworkInstructions.pdf; Homework_StatComputing.tex; biom.bst; titletemplate.tex; HomeworkSolution.pdf

W Apr 16 | Kistner-Griffin | LaTeX and BibTeX: presentations | Statcomputing2014.pdf; Statcomputing2014.tex; Beameruserguide.pdf; Conference-ornate-20min.en.tex; DNA.png; Teatime.2010.pics.tex; HWbeamer.pdf

M Apr 21 | EGM | Sample size calculation software packages | Sample size & power estimation.pptx; Interaction.sample.size.R; Sample Size Problem.docx

W Apr 23 | Hill | Reproducible Research | Sweave.intro.student.notes.2012.pdf; Sweave.example.Rnw; Sweave.example.pdf; Sweave.sty (style file needed to run Sweave; save this in the folder that contains the .Rnw file you will be running); carter.cls (the class file used in the presentation Sweave_intro.tex); PPRCarter.sty (the style file used in the presentation Sweave_intro.tex)

Links (W Apr 23, Reproducible Research):

Sweave homepage (Leisch is the originator of the package): http://www.stat.uni-muenchen.de/~leisch/Sweave/

The Cancer Letter link: http://www.bcm.edu/cancercenter/index.cfm?pmid=12886

Annals of Applied Statistics paper: https://projecteuclid.org/euclid.aoas/1267453942

SASweave paper in Journal of Statistical Software: http://www.jstatsoft.org/v19/i08/

StatWeave user’s manual by Russ Lenth: http://www.stat.uiowa.edu/~rlenth/StatWeave/StatWeave-manual.pdf

Reproducible Research in Biostatistics: http://biostatistics.oxfordjournals.org/content/10/3/405.full

Reproducible Epidemiologic Research: http://www.biostat.jhsph.edu/~fdominic/papers/repropeng.pdf

M Apr 28 | EGM | Designing your own website | Website2014.pptx; Website.homework.docx

FINAL PROJECT | DUE MAY 5, 9AM | Finalproject.docx; Finalprojectdata.csv

 

 

 

 

 

Computing:

Downloads and Websites:

·       R: http://cran.r-project.org/

·       Stata website: http://www.stata.com/

 

Tutorials:

·       R tutorial: R-intro.pdf