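The bwt dataset is occasionally used as an example for building analysis models, but the process by which bwt is derived from birthwt can feel somewhat difficult to basic R Commander users. Because trouble understanding the dataset itself sometimes keeps readers from moving on to building and interpreting models, this post explains the bwt dataset.
The bwt dataset reflects the question of what causes low birth weight. The low variable records whether the birth weight was under 2.5 kg and serves as the response variable. The remaining variables are candidate explanatory variables, each of which may or may not affect low birth weight.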
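The derivation of bwt from birthwt can be sketched as below. This follows the example in the MASS help page for birthwt as far as I recall it, so treat the exact recoding as an assumption and compare it with ?birthwt on your own installation.

```r
# Sketch of building bwt from MASS::birthwt (recoding assumed to match ?birthwt)
data(birthwt, package = "MASS")

bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))  # 1, 2, 3 -> labels
  ptd  <- factor(ptl > 0)            # any previous premature labour?
  ftv  <- factor(ftv)                # number of physician visits as a factor
  levels(ftv)[-(1:2)] <- "2+"        # collapse every level above "1" into "2+"
  data.frame(low = factor(low), age, lwt, race,
             smoke = (smoke > 0), ptd,
             ht = (ht > 0), ui = (ui > 0), ftv)
})

str(bwt)  # low is the response; the other columns are candidate predictors
```

Once bwt exists in the workspace, it can be chosen as the active dataset in R Commander like any other data frame.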
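Choosing the 'Data > Data in packages > Read data set from an attached package...' menu item opens a selection dialog. In it, select the lme4 package, then find and select the sleepstudy dataset.
The sleepstudy dataset becomes active: the active dataset button at the top of R Commander changes from '<No active dataset>' to 'sleepstudy'.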
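The menu steps above correspond roughly to the following commands. This is a minimal sketch that assumes the lme4 package is already installed; the package has to be attached before it shows up in the dialog's package list.

```r
# Assumes lme4 is installed, e.g. via install.packages("lme4")
library(lme4)                        # attach the package so its data sets are listed
data(sleepstudy, package = "lme4")   # load sleepstudy into the workspace
dim(sleepstudy)                      # number of rows and columns
```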
Use the 'Statistics > Summaries > Active data set' menu item to view summary information for the sleepstudy data, and use the str() function to inspect the internal structure of the dataset:
summary(sleepstudy)
str(sleepstudy)
The help page for the dataset describes its contents as follows:
sleepstudy {lme4}
R Documentation
Reaction times in a sleep deprivation study
Description
The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.
Format
A data frame with 180 observations on the following 3 variables.
Reaction
Average reaction time (ms)
Days
Number of days of sleep deprivation
Subject
Subject number on which the observation was made.
Details
These data are from the study described in Belenky et al. (2003), for the sleep-deprived group and for the first 10 days of the study, up to the recovery period.
References
Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
Examples
str(sleepstudy)
require(lattice)
xyplot(Reaction ~ Days | Subject, sleepstudy, type = c("g","p","r"),
       index = function(x,y) coef(lm(y ~ x))[1],
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)", aspect = "xy")
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))
Clicking the 'View data set' button at the top of R Commander shows the contents of the active dataset.
?Duncan # view the help page for the Duncan dataset
Duncan {carData}
R Documentation
Duncan's Occupational Prestige Data
Description
The Duncan data frame has 45 rows and 4 columns. Data on the prestige and other characteristics of 45 U. S. occupations in 1950.
Usage
Duncan
Format
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels: prof, professional and managerial; wc, white-collar; bc, blue-collar.
income
Percentage of occupational incumbents in the 1950 US Census who earned $3,500 or more per year (about $36,000 in 2017 US dollars).
education
Percentage of occupational incumbents in 1950 who were high school graduates (which, were we cynical, we would say is roughly equivalent to a PhD in 2017)
prestige
Percentage of respondents in a social survey who rated the occupation as “good” or better in prestige
Source
Duncan, O. D. (1961) A socioeconomic index for all occupations. In Reiss, A. J., Jr. (Ed.) Occupations and Social Status. Free Press [Table VI-1].
References
Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.
Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.
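The Duncan help page has no Examples section, so here is a small sketch of how the dataset might be inspected and used in a model. The choice of prestige as the response with income and education as predictors is an illustrative assumption, not something the help page prescribes, and the code assumes the carData package is installed.

```r
# Assumes carData is installed (it is pulled in with car / R Commander)
data(Duncan, package = "carData")

str(Duncan)          # 45 occupations, 4 columns, as the help page states
table(Duncan$type)   # how many occupations fall in each type

# Illustrative linear model: prestige explained by income and education
fit <- lm(prestige ~ income + education, data = Duncan)
summary(fit)
```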
Swiss Fertility and Socioeconomic Indicators (1888) Data
Description
Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
Usage
swiss
Format
A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].
[,1]
Fertility
Ig, ‘common standardized fertility measure’
[,2]
Agriculture
% of males involved in agriculture as occupation
[,3]
Examination
% draftees receiving highest mark on army examination
[,4]
Education
% education beyond primary school for draftees.
[,5]
Catholic
% ‘catholic’ (as opposed to ‘protestant’).
[,6]
Infant.Mortality
live births who live less than 1 year.
All variables but ‘Fertility’ give proportions of the population.
Details
(paraphrasing Mosteller and Tukey):
Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.
The data collected are for 47 French-speaking “provinces” at about 1888.
Here, all variables are scaled to [0, 100], where in the original, all but "Catholic" were scaled to [0, 1].
They state that variables Examination and Education are averages for 1887, 1888 and 1889.
Source
Project “16P5”, pages 549–551 in
Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Reading Mass.
indicating their source as “Data used by permission of Franice van de Walle. Office of Population Research, Princeton University, 1976. Unpublished data assembled under NICHD contract number No 1-HD-O-2077.”
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Examples
require(stats); require(graphics)
pairs(swiss, panel = panel.smooth, main = "swiss data",
      col = 3 + (swiss$Catholic > 50))
summary(lm(Fertility ~ . , data = swiss))