* slide 13: reading in data
clear
cd "I:\Classes\StatComputingI"
insheet using "SCBC2004.csv", comma clear
* the imported data are unsaved, so -use- needs the clear option here
use SCBC2004.dta, clear

/* slide 14: exploring our dataset
   use d or describe */
d
d ercat
codebook
codebook dodyr
sum
sum ercat
codebook ercat

* slide 16: more exploration
tab race
table race
list race age if age<30
sort age

* slide 20
outsheet race age caseid prcat using "smalldataset.csv", comma replace

* slide 22
drop if race==1
keep ercat prcat stagen age grade

* read back in original data
clear
use SCBC2004.dta

* slide 23
by race, sort: sum age
bysort ercat prcat: sum age
tab ercat if stagen>1
tab ercat if graden~=.
bysort ercat prcat: sum age if ercat<9 & prcat<9

* slide 25
* Stata treats missing (.) as larger than any number, so the upper bound
* must be joined with & (not |) to keep missing grades out of the high group
gen highgrade = 1 if graden>2 & graden<100
replace highgrade = 0 if graden<3

* cannot create new variable with same name: must drop it first
drop highgrade
gen highgrade=cond(graden>2,1,0)
* cond() also codes missing graden as 1, so reset those to missing
replace highgrade = . if graden==.

* slide 26
drop highgrade
egen highgrade=cut(graden), at(-1,3,5)
drop highgrade
egen highgrade=cut(graden), at(-1,3,5) icodes

* slide 27
gen y = log(age)
gen x2 = age^2
gen z1 = runiform()
* uniform on the interval (2,4)
gen z2 = 2+2*runiform()
gen id = _n
bysort county: gen countyid=_n
drop z1 countyid y x2 id

* slide 28
* step 1: assign each woman a random number
gen z1=runiform()
* step 2: number the women within county in random order
sort county z1
by county: gen countyid=_n
* step 3: keep only 10 women in county
drop if countyid>10

* slide 29
* read in data with unformatted dates
clear
use Ohiosmall.dta
tab year
gen datedx=date(date_of_dx, "MDY")
format datedx %td
gen yeardiag=year(datedx)
* list the records whose date string could not be converted
list date_of_dx dxdt_text2 dxdtchar datedx yeardiag if yeardiag==.

* slide 30
* reshape
clear
insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
reshape wide collecteddate - frombaselines1p, i(patient) j(cycle)
reshape long

clear
insheet using "ceramide2.csv"
rename cycle1totalceramidelevels totalceramidelevels1
rename cycle1diseasestatus diseasestatus1
rename cycle1c18ceramide c18ceramide1
rename cycle3totalceramidelevels totalceramidelevels3
rename cycle3diseasestatus diseasestatus3
rename cycle3c18ceramide c18ceramide3
rename cycle5totalceramidelevels totalceramidelevels5
rename cycle5diseasestatus diseasestatus5
rename cycle5c18ceramide c18ceramide5
rename cycle3daysfromstart daysfromstart3
rename cycle5daysfromstart daysfromstart5
reshape long daysfromstart diseasestatus totalceramidelevels c18ceramide, i(patient) j(cycle)
drop if totalcerami==.
replace daysfromstart=0 if cycle==1