BMTRY 764
Computing for Research I
Spring 2014
Description: Students learn to use the primary
statistical software packages for data manipulation and analysis, including
(but not limited to): R, R Bioconductor, SAS, SAS macro, and Stata.
Additionally, students will learn how to use the division's high-speed
cluster-computing environment, how to practice the principles of reproducible
research, and how to use LaTeX and BibTeX for manuscript and presentation development. This is a three-credit course.
Course Organization: This course is organized by Dr. Elizabeth
Garrett-Mayer, who is also the primary instructor. Some lectures are given by
other faculty members and senior students in the Department of Public Health
Science.
Textbooks: No textbook. Reading material (primarily
found on the web) will be provided as necessary.
Prerequisites: Biometry 700
Grading: Instructors will give short exercises, each to be
completed and turned in to the primary instructor by the Wednesday of the week
after it is assigned (e.g., assignments given on Monday Feb 3 and
Wednesday Feb 5 are both due on Wednesday Feb 12). The assignments are weighted
equally and together make up 75% of the course grade. A final project
accounts for 20% of the course grade, and class participation accounts for the
remaining 5%.
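As an illustration of the weighting above, here is a minimal Python sketch of how the three components could combine into a course grade. The function name, the 0-100 scale for the project and participation scores, and the example scores are assumptions made for illustration; they are not part of the official grading procedure.

```python
# Weights taken from the Grading paragraph: 75% homework, 20% final project,
# 5% class participation.
HOMEWORK_WEIGHT = 0.75
PROJECT_WEIGHT = 0.20
PARTICIPATION_WEIGHT = 0.05


def course_grade(homework_scores, project_score, participation_score):
    """Combine component scores (each assumed to be on a 0-100 scale).

    Homework assignments count equally, so they enter as a simple average.
    """
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    homework_average = sum(homework_scores) / len(homework_scores)
    return (HOMEWORK_WEIGHT * homework_average
            + PROJECT_WEIGHT * project_score
            + PARTICIPATION_WEIGHT * participation_score)


# Hypothetical student: three homeworks, a final project, and participation.
example = course_grade([85, 90, 95], project_score=90, participation_score=100)
print(round(example, 2))
```

Because the homework component is an average, a single low homework score moves the course grade by at most 75% divided by the number of assignments.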
Homeworks Policy: Homeworks are due by 5pm on the due date. All homeworks
should be emailed to the primary instructor (garrettm@musc.edu)
or turned in at lecture time. Asking for extensions on homeworks
is strongly discouraged. However, it is expected that, on occasion, extenuating
circumstances may arise. Therefore, each student may request an extension on homework up to two times, and
each extension is to be no more than 2 days. You must notify the primary
instructor that you are requesting an extension before the time the assignment
is due. After two extensions have been used, no more will be granted except
with a medical note.
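The due-date rule from the Grading paragraph and the extension limit above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it assumes that a week runs Monday through Sunday and that "2 days" means calendar days, neither of which the policy states explicitly.

```python
from datetime import date, datetime, time, timedelta

MAX_EXTENSION_DAYS = 2  # policy: an extension is no more than 2 days


def homework_deadline(assigned, extension_days=0):
    """Return the deadline (5pm) for homework assigned on the date `assigned`.

    Assumption: weeks run Monday-Sunday, and extension days are calendar days.
    """
    if not 0 <= extension_days <= MAX_EXTENSION_DAYS:
        raise ValueError("extension must be between 0 and 2 days")
    # Monday of the week in which the homework was assigned.
    monday = assigned - timedelta(days=assigned.weekday())
    # Wednesday of the following week is 7 + 2 days after that Monday.
    due_date = monday + timedelta(days=9 + extension_days)
    return datetime.combine(due_date, time(17, 0))


# The syllabus example: homework given on Mon Feb 3 and Wed Feb 5, 2014.
print(homework_deadline(date(2014, 2, 3)))
print(homework_deadline(date(2014, 2, 5)))
```

Both calls reproduce the example in the Grading paragraph, giving a deadline on Wednesday Feb 12.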
Office Hours: The teaching assistant will have regular
office hours each week. The primary instructor will have office hours by
appointment. However, given the nature of the course, the primary instructor
may not be knowledgeable about all of the topics covered. As a result,
you may need additional help from the lecturers to complete assignments. Be
considerate and responsible in scheduling time with course instructors and
recognize that they all have busy schedules.
Course Objectives: Upon successful completion of the
course, the student will be able to
1. Import data and perform simple analyses and produce graphical displays in Stata, SAS and R
2. Create new functions or commands in each of R, Stata and SAS
3. Generate professional quality scientific manuscripts and presentations using LaTeX along with statistical software
4. Perform standard power and sample size calculations using available software and simulations.
Primary Instructor: Elizabeth Garrett-Mayer
Website: http://people.musc.edu/~elg26/teaching/statcomputing.2014/statcomputingI.2014.htm
Contact Info: Hollings Cancer Center, Rm 118G; garrettm@musc.edu (preferred mode of contact is email); 792-7764
Time: Mondays and Wednesdays, 9:00-10:30am
Location: Cannon 301
Office Hours: By appointment. Contact via email.
TA Office Hours: 2pm on Tuesdays.
Grading: The homeworks are not intended to be difficult (although some
may be time consuming). The main challenge is that several instructors
provide the homework assignments, so the assignments will not be consistent
in length or difficulty.
Note that based on MUSC standards, a score of 85% is acceptable (please see http://academicdepartments.musc.edu/bulletin/acad_policies/grading.html
for more information on the grading policies if you are interested). Using 85%
as a benchmark, here is how the homeworks will be evaluated:
100% = Perfect
95% = Almost perfect. Only negligible mistakes (e.g., misspellings, typos)
90% = Shows extra effort but some minor mistakes
85% = Acceptable. This means that there are some relatively minor mistakes, but the general ideas are all there and the student has demonstrated mastery of the concepts.
75% = Clear misunderstanding of at least one concept
70% = Several major mistakes, but the majority of the homework is still OK
50% = Turned in the homework, but most of the concepts were misunderstood and the results are mostly incorrect.
0% = Did not turn in homework.
Lectures:
Date | Lecturer | Topic | Lecture materials (i.e., slides, links) | Homework assignment
W Jan 8 | EGM | Introduction; Overview and Principles | |
M Jan 13 | Fan | SAS: introduction | |
W Jan 15 | Ellerbe | SAS: IML | |
W Jan 22 | Battenhouse | SAS: macros | |
M Jan 27 | Foster | SAS: proc tabulate and proc report | |
W Jan 29 | SNOW/ICE DAY | | |
M Feb 3 | Nicholas | SAS: ODS | |
W Feb 5 | Elm | SAS: array processing | |
M Feb 10 | Baker | SAS: Gplot | |
M Feb 17 | EGM | Data management principles & Excel | |
W Feb 19 | EGM | Stata: introduction, immediate commands | http://www.ats.ucla.edu/stat/stata/sk/default.htm http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/index.html |
F Feb 21 | Wahlquist | Data management: REDCap (rescheduled due to MUSC weather closure) | |
M Feb 24 | EGM | Stata: graphical displays | |
W Feb 26 | EGM | Stata: exploratory data analysis | |
M Mar 3 | EGM | Stata: regression commands | |
W Mar 5 | EGM | Stata: programming and do files | |
M Mar 17 | EGM | R: introduction to object-oriented programming | |
W Mar 19 | Moss | R: downloading packages/libraries; data input & output | |
M Mar 24 | EGM | R: graphics | |
W Mar 26 | Onicescu | R: basic language structure (ifelse, where, looping) | |
M Mar 31 | EGM | R: exploratory data analysis; writing commands | |
W Apr 2 | Wei | R: regression commands | |
M Apr 7 | Fan | R: simulations; random number generation; sampling from distributions | |
W Apr 9 | Wolf | R: Bioconductor | |
M Apr 14 | Ellerbe | LaTeX and BibTeX: manuscript production | Latex_InClass_StatComputing2014.tex |
W Apr 16 | Kistner-Griffin | LaTeX and BibTeX: presentations | |
M Apr 21 | EGM | Sample size calculation software packages | |
W Apr 23 | Hill | Reproducible Research | Sweave.intro.student.notes.2012.pdf; Sweave.sty (style file needed to run Sweave; save this in the folder that contains the .Rnw file you will be running); carter.cls (the class file used in the presentation Sweave_intro.tex); PPRCarter.sty (the style file used in the presentation Sweave_intro.tex). Links: Sweave homepage (Leisch is the originator of the package) http://www.stat.uni-muenchen.de/~leisch/Sweave/; The Cancer Letter link http://www.bcm.edu/cancercenter/index.cfm?pmid=12886; Annals of Applied Statistics paper https://projecteuclid.org/euclid.aoas/1267453942; SASweave paper in Journal of Statistical Software http://www.jstatsoft.org/v19/i08/; StatWeave user's manual by Russ Lenth http://www.stat.uiowa.edu/~rlenth/StatWeave/StatWeave-manual.pdf; Reproducible Research in Biostatistics http://biostatistics.oxfordjournals.org/content/10/3/405.full; Reproducible Epidemiologic Research http://www.biostat.jhsph.edu/~fdominic/papers/repropeng.pdf |
M Apr 28 | EGM | Designing your own website | |
FINAL PROJECT DUE MAY 5, 9AM
Computing:
Downloads and Websites:
· R: http://cran.r-project.org/
· Stata website: http://www.stata.com/
Tutorials:
· R tutorial: R-intro.pdf