* slide 14: reading in data
clear
cd "I:\Classes\StatComputingI"
insheet using "SCBC2004.csv", comma
clear
use SCBC2004.dta

/* slide 15: exploring our dataset
   use d or describe */
d
d ercat
codebook
codebook dodyr
sum
sum ercat
codebook ercat

* slide 17: more exploration
tab race
table race
list race age if age<30
sort age

* slide 21
outsheet race age caseid prcat using "smalldataset.csv", comma replace

* slide 23
drop if race==1
keep ercat prcat stagen age grade

* read back in original data
clear
use SCBC2004.dta

* slide 24
by race, sort: sum age
bysort ercat prcat: sum age
tab ercat if stagen>1
tab ercat if graden~=.
bysort ercat prcat: sum age if ercat<9 & prcat<9

* slide 26
* missing values sort above every number, so graden<100 excludes them;
* the two conditions must be joined with & (with | every row would get 1)
gen highgrade = 1 if graden>2 & graden<100
replace highgrade = 0 if graden<3
* cannot create new variable with same name: must drop it first
drop highgrade
gen highgrade=cond(graden>2,1,0)
replace highgrade = . if graden==.

* slide 27
drop highgrade
egen highgrade=cut(graden), at(-1,3,5)
drop highgrade
egen highgrade=cut(graden), at(-1,3,5) icodes

* slide 28
gen y = log(age)
gen x2 = age^2
gen z1 = runiform()
* uniform(2,4)
gen z2 =2+2*runiform()
gen id= _n
bysort county: gen countyid=_n
drop z1 countyid y x2 id

* slide 29
* step 1:
gen z1=runiform()
* step 2
sort county z1
by county : gen countyid=_n
* step 3: keep only 10 women in county
drop if countyid>10

* slide 29
* read in data with unformatted dates
clear
use Ohiosmall.dta
tab year
gen datedx=date(date_of_dx, "MDY")
format datedx %td
gen yeardiag=year(datedx)
list date_of_dx dxdt_text2 dxdtchar datedx yeardiag if yeardiag==.

* slide 31
* reshape
clear
insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
reshape wide collecteddate - frombaselines1p, i(patient) j(cycle)
reshape long

clear
insheet using "ceramide2.csv"
rename cycle1totalceramidelevels totalceramidelevels1
rename cycle1diseasestatus diseasestatus1
rename cycle1c18ceramide c18ceramide1
rename cycle3totalceramidelevels totalceramidelevels3
rename cycle3diseasestatus diseasestatus3
rename cycle3c18ceramide c18ceramide3
rename cycle5totalceramidelevels totalceramidelevels5
rename cycle5diseasestatus diseasestatus5
rename cycle5c18ceramide c18ceramide5
rename cycle3daysfromstart daysfromstart3
rename cycle5daysfromstart daysfromstart5
reshape long daysfromstart diseasestatus totalceramidelevels c18ceramide , i(patient) j(cycle)
drop if totalcerami==.
replace daysfromstart=0 if cycle==1