zhao1198's personal blog http://blog.sciencenet.cn/u/zhao1198

STATA Data Processing B: Modifying Data

Posted 2009-8-24 03:57 | Personal category: Stata | System category: Research notes | Tags: data processing, STATA

1.0 Basic commands

codebook        Show codebook information for the file
label data      Apply a label to a data set
order           Order the variables in a data set
label variable  Apply a label to a variable
label define    Define a set of labels for the levels of a categorical variable
label values    Apply value labels to a variable
list            List the observations
rename          Rename a variable
recode          Recode the values of a variable
notes           Apply notes to the data file
generate        Create a new variable
replace         Replace the values of an existing variable
egen            Extended generate; has special functions that can be used when creating a new variable

2.0 Worked example

use http://www.ats.ucla.edu/stat/stata/notes3/hs0
Let's use the codebook command to see what our variables look like.  Because we have not listed any variables after the command, Stata will show us the codebook for all of the variables.
codebook
First, let's order the variables in a way that makes sense.  Several orderings are logical; here we put the id variable first, followed by the demographic variables, such as gender, ses and program type, with the test scores at the end.  The order command moves the variables it lists to the front of the data set and leaves the others in their existing order.
order id gender 
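Now let's add a variable label and value labels to schtyp so that we know a little more about the variable.  Listing it with and without the nolabel option shows the labels and the underlying codes.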
label variable schtyp "The type of school the student attended."
label define scl 1 public 2 private
label values schtyp scl
codebook schtyp
list schtyp in 1/10
list schtyp in 1/10, nolabel
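
The relationship between stored codes and value labels can be sketched on a toy data set. The three observations below are made up for illustration and are not taken from hs0; only the label name scl and the variable name schtyp follow the example above.

```stata
* Toy data with hypothetical values (not the hs0 file)
clear
input byte schtyp
1
2
1
end

* Define the label set once, then attach it to the variable
label define scl 1 "public" 2 "private"
label values schtyp scl

* With labels, then with the stored numeric codes
list schtyp
list schtyp, nolabel

* The variable is still numeric, so numeric conditions keep working
count if schtyp == 1
```

Value labels change only how the variable is displayed; conditions such as schtyp == 1 still refer to the stored codes.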
Now let's create a new numeric version of the string variable prgtype.  We will call our new variable prog.
encode prgtype, gen(prog)
label variable prog "The type of program in which the student was enrolled."
codebook prog
list prog in 1/10
list prog in 1/10, nolabel
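
As a small sketch of what encode does, the toy example below uses made-up program names rather than the actual contents of prgtype. encode numbers the distinct strings 1, 2, ... in alphabetical order and stores the original strings as value labels.

```stata
* Toy data with hypothetical program names (not the hs0 file)
clear
input str12 prgtype
"vocational"
"academic"
"general"
"academic"
end

* Create a labeled numeric version of the string variable
encode prgtype, gen(prog)

* Codes follow alphabetical order: academic=1, general=2, vocational=3
list prgtype prog, nolabel
```

Because the codes depend on alphabetical order, it is worth checking them with the nolabel option before using the new variable in conditions.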
The variable gender may give us trouble in the future because it is difficult to know what the 0s and 1s mean.  Renaming it to female and adding value labels makes it clear that 1 means female and 0 means male.
rename gender female
label variable female "The gender of the student."
label define fm 1 female 0 male
label values female fm
codebook female
list female in 1/10
list female in 1/10, nolabel
Let's recode the value 5 in the variable race to be missing.
list race if race == 5
recode race 5 = .
list race if race == .
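
One point to keep in mind after recoding values to missing: Stata treats numeric missing values as larger than any number, so a condition such as race > 3 also selects the missing observations. The sketch below uses made-up values, not the hs0 data, to show the effect and a common guard using missing().

```stata
* Toy data with hypothetical values (not the hs0 file)
clear
input byte race
1
5
3
5
4
end

* Recode the value 5 to missing
recode race (5 = .)

* Count the missing values: the two former 5s
count if missing(race)

* Missing counts as larger than any number, so this selects 4 and both missings
count if race > 3

* Excluding missing values explicitly selects only the 4
count if race > 3 & !missing(race)
```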

Now let's create a variable that is a total of some of the test scores.

generate total = read + write + math
summarize total
It might make more sense to add the social studies score to the total rather than the math score, so let's change that.
replace total = read + write + socst
summarize total
label variable total "The total of read, write and socst."
codebook total
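
A total built with generate or replace is missing whenever any of its components is missing. If that is not what you want, egen's rowtotal() function is one alternative, since it treats missing values as 0. The sketch below uses made-up scores, and the name total2 is only illustrative.

```stata
* Toy data with hypothetical scores (not the hs0 file)
clear
input read write socst
50 60 55
40 .  45
end

* Arithmetic with a missing operand gives a missing result
generate total = read + write + socst

* rowtotal() treats missing values as 0
egen total2 = rowtotal(read write socst)

list
```

Which behavior is appropriate depends on the analysis: a total that silently skips a missing score is not comparable to one based on all three scores.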
Now let's see if we can assign some letter grades to these test scores.
recode total (0/80=0 F) (80/110=1 D) (110/140=2 C) (140/170=3 B) (170/300=4 A), gen(grade)
codebook grade
label variable grade "These are the combined grades of read, write and socst."
codebook grade
list read write socst grade in 1/10
list read write socst grade in 1/10, nolabel
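
Note that the ranges in the recode command above share their endpoints (80, 110, 140 and 170 each appear in two rules). As far as I understand recode, a value is handled by the first rule that matches it, so a total of exactly 80 becomes F rather than D. The sketch below checks this on a few made-up totals.

```stata
* Toy data with hypothetical totals chosen to sit on the boundaries
clear
input total
80
80.5
110
170
end

* Same rules as in the example above
recode total (0/80=0 F) (80/110=1 D) (110/140=2 C) (140/170=3 B) (170/300=4 A), gen(grade)

* Show the labels, then the underlying codes
list total grade
list total grade, nolabel
```

If a boundary value should fall in the higher grade instead, the ranges need to be written so that they do not overlap.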
Let's label the data set itself so that we will remember what the data are.  We can also add some notes to the data set.
label data "High School and Beyond"
notes female:  the variable gender was renamed to female
notes race: values of race coded as 5 were recoded to be missing
notes

There is another way to create variables in Stata that uses special functions: the egen command.  Some examples of the use of its functions follow.

egen zread = std(read)
summarize zread
list read zread in 1/10
egen rmean = mean(read), by(ses)
list read ses rmean in 1/10
egen mread = median(read), by(prog)
list read prog mread in 1/10
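
To make the by() option more concrete, here is a minimal sketch on made-up data. egen computes the statistic within each group and assigns it to every observation in that group. The bysort prefix is shown as an equivalent way to write the same thing, and the name rmean2 is only illustrative.

```stata
* Toy data with hypothetical values (not the hs0 file)
clear
input read ses
40 1
60 1
50 2
70 2
end

* Group mean using the by() option, as in the example above
egen rmean = mean(read), by(ses)

* Same result using the bysort prefix
bysort ses: egen rmean2 = mean(read)

* Every observation carries the mean of its own ses group
list read ses rmean rmean2
```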

save hs1

Source: http://blog.cnfol.com/arlion/article/1119684.html


https://wap.sciencenet.cn/blog-285749-251062.html

