Posts in the 'Dataset_info' category (page 2)

sleep dataset examples

modernity4Rcmdr 2022. 6. 25. 10:29

datasets::sleep

?sleep # view the help page for the sleep dataset

# The code below is from example(sleep).

require(stats)
## Student's paired t-test
with(sleep,
     t.test(extra[group == 1],
            extra[group == 2], paired = TRUE))

## The sleep *prolongations*
sleep1 <- with(sleep, extra[group == 2] - extra[group == 1])
summary(sleep1)
stripchart(sleep1, method = "stack", xlab = "hours",
           main = "Sleep prolongation (n = 10)")
boxplot(sleep1, horizontal = TRUE, add = TRUE,
        at = .6, pars = list(boxwex = 0.5, staplewex = 0.25))
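
As a minimal sketch that is not part of example(sleep), the paired t-test above can be compared with a one-sample t-test on the per-subject differences. The object names d, paired_res and onesample_res are illustrative only.

```r
# Per-subject differences: group 2 minus group 1 (same values as sleep1 above)
d <- with(sleep, extra[group == 2] - extra[group == 1])

# Paired test as in example(sleep), and a one-sample test on the differences
paired_res <- with(sleep,
                   t.test(extra[group == 1], extra[group == 2], paired = TRUE))
onesample_res <- t.test(d, mu = 0)

# The paired call works with group 1 minus group 2, so the two t statistics
# differ only in sign; the p-values agree.
c(paired = unname(paired_res$statistic),
  one_sample = unname(onesample_res$statistic))
all.equal(paired_res$p.value, onesample_res$p.value)
```

This is why the help page goes on to summarise and plot sleep1: the paired analysis depends only on those ten differences.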


airquality dataset examples

modernity4Rcmdr 2022. 6. 24. 22:47

datasets::airquality

?airquality # view the help page for the airquality dataset

# The code below is from example(airquality).

require(graphics)
pairs(airquality, panel = panel.smooth, main = "airquality data")


OBrienKaiserLong dataset examples

modernity4Rcmdr 2022. 6. 24. 22:34

carData::OBrienKaiserLong

?OBrienKaiserLong # view the help page for the OBrienKaiserLong dataset

# The code below is from example(OBrienKaiserLong).

head(OBrienKaiserLong, 15) # first subject


OBrienKaiser dataset examples

modernity4Rcmdr 2022. 6. 24. 22:26

carData::OBrienKaiser

?OBrienKaiser # view the help page for the OBrienKaiser dataset

# The code below is from example(OBrienKaiser).

OBrienKaiser
contrasts(OBrienKaiser$treatment)
contrasts(OBrienKaiser$gender)


housing dataset

modernity4Rcmdr 2022. 6. 24. 14:26

MASS::housing

library(MASS, pos=16)
data(housing, package="MASS")

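Select the 'Tools > Load package(s)...' menu item, then find and select the MASS package.

Next, select 'Data > Data in packages > Read data set from an attached package...' to open the selection dialog. Select the MASS package, then find and select the housing dataset.

The housing dataset becomes the active dataset. In the bar at the top of R Commander, <No active dataset> changes to 'housing'.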

summary(housing)
str(housing)

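Select the 'Statistics > Summaries > Active data set' menu item to view summary information for the housing dataset. Then type str(housing) into the script window and click the <Submit> button.

The dataset is documented as follows: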

housing {MASS}

R Documentation

Frequency Table from a Copenhagen Housing Conditions Survey

Description

The housing data frame has 72 rows and 5 variables.

Usage

housing

Format

Sat

Satisfaction of householders with their present housing circumstances, (High, Medium or Low, ordered factor).

Infl

Perceived degree of influence householders have on the management of the property (High, Medium, Low).

Type

Type of rental accommodation, (Tower, Atrium, Apartment, Terrace).

Cont

Contact residents are afforded with other residents, (Low, High).

Freq

Frequencies: the numbers of residents in each class.

Source

Madsen, M. (1976) Statistical analysis of multiple contingency tables. Two examples. Scand. J. Statist. 3, 97–106.

Cox, D. R. and Snell, E. J. (1984) Applied Statistics, Principles and Examples. Chapman & Hall.

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

options(contrasts = c("contr.treatment", "contr.poly"))

# Surrogate Poisson models
house.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family = poisson,
                  data = housing)
## IGNORE_RDIFF_BEGIN
summary(house.glm0, cor = FALSE)
## IGNORE_RDIFF_END

addterm(house.glm0, ~. + Sat:(Infl+Type+Cont), test = "Chisq")

house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))
summary(house.glm1, cor = FALSE)

1 - pchisq(deviance(house.glm1), house.glm1$df.residual)

dropterm(house.glm1, test = "Chisq")

addterm(house.glm1, ~. + Sat:(Infl+Type+Cont)^2, test  =  "Chisq")

hnames <- lapply(housing[, -5], levels) # omit Freq
newData <- expand.grid(hnames)
newData$Sat <- ordered(newData$Sat)
house.pm <- predict(house.glm1, newData,
                    type = "response")  # poisson means
house.pm <- matrix(house.pm, ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, hnames[[1]]))
house.pr <- house.pm/drop(house.pm %*% rep(1, 3))
cbind(expand.grid(hnames[-1]), round(house.pr, 2))

# Iterative proportional scaling
loglm(Freq ~ Infl*Type*Cont + Sat*(Infl+Type+Cont), data = housing)


# multinomial model
library(nnet)
(house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                       data = housing))
house.mult2 <- multinom(Sat ~ Infl*Type*Cont, weights = Freq,
                        data = housing)
anova(house.mult, house.mult2)

house.pm <- predict(house.mult, expand.grid(hnames[-1]), type = "probs")
cbind(expand.grid(hnames[-1]), round(house.pm, 2))

# proportional odds model
house.cpr <- apply(house.pr, 1, cumsum)
logit <- function(x) log(x/(1-x))
house.ld <- logit(house.cpr[2, ]) - logit(house.cpr[1, ])
(ratio <- sort(drop(house.ld)))
mean(ratio)

(house.plr <- polr(Sat ~ Infl + Type + Cont,
                   data = housing, weights = Freq))

house.pr1 <- predict(house.plr, expand.grid(hnames[-1]), type = "probs")
cbind(expand.grid(hnames[-1]), round(house.pr1, 2))

Fr <- matrix(housing$Freq, ncol  =  3, byrow = TRUE)
2*sum(Fr*log(house.pr/house.pr1))

house.plr2 <- stepAIC(house.plr, ~.^2)
house.plr2$anova

[Package MASS version 7.3-53.1 Index]
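
Because housing stores one row per combination of factor levels, with the count in Freq, it can help to view it as a contingency table before fitting any model. The following is a minimal sketch that assumes MASS is installed; the object name sat_by_infl is illustrative and does not come from the help page.

```r
library(MASS)  # provides the housing data frame

# 3 (Sat) x 3 (Infl) x 4 (Type) x 2 (Cont) = 72 rows, one per cell
nrow(housing)
prod(sapply(housing[, c("Sat", "Infl", "Type", "Cont")], nlevels))

# Collapse over Type and Cont: satisfaction by perceived influence
sat_by_infl <- xtabs(Freq ~ Sat + Infl, data = housing)
sat_by_infl

# Share of each satisfaction level within each level of Infl
# (each column sums to 1)
round(prop.table(sat_by_infl, margin = 2), 2)
```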


bwt dataset

modernity4Rcmdr 2022. 6. 24. 10:32

The MASS package includes a dataset called birthwt. A derived dataset, bwt, is built from birthwt as follows.

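library(MASS)  # provides the birthwt data frame

bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))  # label race codes 1, 2, 3
  ptd <- factor(ptl > 0)            # any previous premature labour
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"       # merge two or more physician visits
  data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
             ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})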

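The bwt dataset is occasionally used as an example for building analysis models, but the way bwt is derived from birthwt may feel somewhat difficult to basic R Commander users. Because trouble understanding the dataset itself sometimes keeps users from moving on to building and interpreting a model, this post explains the bwt dataset.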
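
To make the recoding easier to follow, the sketch below applies three of the steps one at a time and cross-tabulates each original birthwt column against its recoded version. It assumes MASS is installed; the names race_f, ftv_f and ptd_f are illustrative and are not part of the original code.

```r
library(MASS)  # provides birthwt

# race: numeric codes 1, 2, 3 become labelled factor levels
race_f <- factor(birthwt$race, labels = c("white", "black", "other"))
table(original = birthwt$race, recoded = race_f)

# ftv: every level after the first two is merged into "2+"
ftv_f <- factor(birthwt$ftv)
levels(ftv_f)[-(1:2)] <- "2+"
table(original = birthwt$ftv, recoded = ftv_f)

# ptd: TRUE when there was at least one previous premature labour
ptd_f <- factor(birthwt$ptl > 0)
table(original = birthwt$ptl, recoded = ptd_f)
```

The remaining columns smoke, ht and ui are only turned from 0/1 codes into logical values by the comparison with 0.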

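The bwt dataset reflects the question of what causes low birth weight. The low variable records whether the birth weight was under 2.5 kg, and it is the response variable. The remaining variables are candidate explanatory variables that may influence low birth weight.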

options(contrasts = c("contr.treatment", "contr.poly"))
GLM.1 <- glm(low ~ ., binomial, bwt)
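
One possible next step, shown here only as a sketch, is to inspect the fitted coefficients on the odds-ratio scale. The snippet rebuilds bwt so that it runs on its own and assumes MASS is installed; the name odds_ratios is illustrative.

```r
library(MASS)  # provides birthwt

# Rebuild bwt so that this snippet is self-contained
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd <- factor(ptl > 0)
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"
  data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
             ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})

options(contrasts = c("contr.treatment", "contr.poly"))
GLM.1 <- glm(low ~ ., family = binomial, data = bwt)

# Coefficient table on the log-odds scale
summary(GLM.1)

# Exponentiated coefficients: odds ratios for low birth weight
odds_ratios <- exp(coef(GLM.1))
round(odds_ratios, 2)
```

An odds ratio above 1 indicates higher odds of low birth weight for that level or per unit increase, holding the other variables fixed.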


sleepstudy dataset

modernity4Rcmdr 2022. 6. 23. 18:41

lme4::sleepstudy

data(sleepstudy, package="lme4")

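Select 'Data > Data in packages > Read data set from an attached package...' to open the selection dialog. Select the lme4 package, then find and select the sleepstudy dataset.

The sleepstudy dataset becomes the active dataset. In the bar at the top of R Commander, <No active dataset> changes to 'sleepstudy'.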

summary(sleepstudy)
str(sleepstudy)

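Use the 'Statistics > Summaries > Active data set' menu item to view summary information for the sleepstudy data. Use the str() function to inspect the internal structure of the sleepstudy dataset.

The dataset is documented as follows: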

sleepstudy {lme4}

R Documentation

Reaction times in a sleep deprivation study

Description

The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.

Format

A data frame with 180 observations on the following 3 variables.

Reaction

Average reaction time (ms)

Days

Number of days of sleep deprivation

Subject

Subject number on which the observation was made.

Details

These data are from the study described in Belenky et al. (2003), for the sleep-deprived group and for the first 10 days of the study, up to the recovery period.

References

Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.

Examples

str(sleepstudy)
require(lattice)
xyplot(Reaction ~ Days | Subject, sleepstudy, type = c("g","p","r"),
       index = function(x,y) coef(lm(y ~ x))[1],
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)", aspect = "xy")
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))

[Package lme4 version 1.1-26 Index]


SPSS data file example

modernity4Rcmdr 2022. 6. 23. 08:24

The foreign package includes an SPSS file as an example.

For practice importing an SPSS file in R Commander, see the following post:

https://rcmdr.tistory.com/30
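
For readers who prefer the R console to the menus, the following sketch locates a bundled SPSS file and reads it. It assumes the foreign package is installed. The file name electric.sav is taken from the examples on the read.spss help page, not from this post, and the object names sav_path and electric are illustrative.

```r
library(foreign)  # provides read.spss() and bundled example files

# Locate the example SPSS file shipped with the foreign package
sav_path <- system.file("files", "electric.sav", package = "foreign")
sav_path

# Read it as a data frame and inspect its structure
electric <- read.spss(sav_path, to.data.frame = TRUE)
str(electric)
```

The path printed by sav_path can also be used in the file chooser that the R Commander import dialog opens.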


Duncan dataset

modernity4Rcmdr 2022. 6. 14. 19:17

carData::Duncan

data(Duncan, package="carData")

Click the 'View data set' button at the top of R Commander to see the contents of the dataset.

?Duncan    # view the help page for the Duncan dataset

Duncan {carData}

R Documentation

Duncan's Occupational Prestige Data

Description

The Duncan data frame has 45 rows and 4 columns. Data on the prestige and other characteristics of 45 U. S. occupations in 1950.

Usage

Duncan

Format

This data frame contains the following columns:

type

Type of occupation. A factor with the following levels: prof, professional and managerial; wc, white-collar; bc, blue-collar.

income

Percentage of occupational incumbents in the 1950 US Census who earned $3,500 or more per year (about $36,000 in 2017 US dollars).

education

Percentage of occupational incumbents in 1950 who were high school graduates (which, were we cynical, we would say is roughly equivalent to a PhD in 2017)

prestige

Percentage of respondents in a social survey who rated the occupation as “good” or better in prestige

Source

Duncan, O. D. (1961) A socioeconomic index for all occupations. In Reiss, A. J., Jr. (Ed.) Occupations and Social Status. Free Press [Table VI-1].

References

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.

[Package carData version 3.0-4 Index]

swiss dataset

modernity4Rcmdr 2022. 6. 13. 09:12

datasets::swiss

data(swiss, package="datasets") # load the swiss dataset
summary(swiss)                  # summarize the swiss dataset
str(swiss)                      # inspect the structure of the swiss dataset

The dataset is documented as follows:

swiss {datasets}

R Documentation

Swiss Fertility and Socioeconomic Indicators (1888) Data

Description

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.

Usage

swiss

Format

A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].

[,1]	Fertility	Ig, ‘common standardized fertility measure’
[,2]	Agriculture	% of males involved in agriculture as occupation
[,3]	Examination	% draftees receiving highest mark on army examination
[,4]	Education	% education beyond primary school for draftees.
[,5]	Catholic	% ‘catholic’ (as opposed to ‘protestant’).
[,6]	Infant.Mortality	live births who live less than 1 year.

All variables but ‘Fertility’ give proportions of the population.

Details

(paraphrasing Mosteller and Tukey):

Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.

The data collected are for 47 French-speaking “provinces” at about 1888.

Here, all variables are scaled to [0, 100], where in the original, all but "Catholic" were scaled to [0, 1].

Note

Files for all 182 districts in 1888 and other years have been available at https://opr.princeton.edu/archive/pefp/switz.aspx.

They state that variables Examination and Education are averages for 1887, 1888 and 1889.

Source

Project “16P5”, pages 549–551 in

Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Reading Mass.

indicating their source as “Data used by permission of Franice van de Walle. Office of Population Research, Princeton University, 1976. Unpublished data assembled under NICHD contract number No 1-HD-O-2077.”

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

require(stats); require(graphics)
pairs(swiss, panel = panel.smooth, main = "swiss data",
      col = 3 + (swiss$Catholic > 50))
summary(lm(Fertility ~ . , data = swiss))

[Package datasets version 4.0.4 Index]
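
The Format section states that every variable is a percentage in [0, 100]. As a small sketch that is not part of the help page, this can be checked directly; the name in_range is illustrative.

```r
# Dimensions: 47 provinces by 6 variables
dim(swiss)

# Minimum and maximum of each variable
t(sapply(swiss, range))

# TRUE for each variable whose values all lie within [0, 100]
in_range <- sapply(swiss, function(x) all(x >= 0 & x <= 100))
in_range
```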
