Posts in the 'Dataset_info' category

faithful dataset examples

modernity4Rcmdr 2022. 7. 25. 15:11

require(stats); require(graphics)
f.tit <-  "faithful data: Eruptions of Old Faithful"

ne60 <- round(e60 <- 60 * faithful$eruptions)
all.equal(e60, ne60)             # relative diff. ~ 1/10000
table(zapsmall(abs(e60 - ne60))) # 0, 0.02 or 0.04
faithful$better.eruptions <- ne60 / 60
te <- table(ne60)
te[te >= 4]                      # (too) many multiples of 5 !
plot(names(te), te, type = "h", main = f.tit, xlab = "Eruption time (sec)")

plot(faithful[, -3], main = f.tit,
     xlab = "Eruption time (min)",
     ylab = "Waiting time to next eruption (min)")
lines(lowess(faithful$eruptions, faithful$waiting, f = 2/3, iter = 3),
      col = "red")


Old Faithful Geyser Data

Description

Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

Usage

faithful

Format

A data frame with 272 observations on 2 variables.

[,1]	eruptions	numeric	Eruption time in mins
[,2]	waiting	numeric	Waiting time to next eruption (in mins)

Details

A closer look at faithful$eruptions reveals that these are heavily rounded times originally in seconds, where multiples of 5 are more frequent than expected under non-human measurement. For a better version of the eruption times, see the example below.

There are many versions of this dataset around: Azzalini and Bowman (1990) use a more complete version.
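
The rounding remark above can be checked directly. The following is a minimal sketch and not part of the original help page; the names `sec` and `is_mult5` are illustrative. It converts the eruption times to whole seconds and measures how often they land on a multiple of 5, which would happen about one time in five if durations fell evenly across all whole seconds.

```r
# Convert eruption times (minutes) to whole seconds
sec <- round(60 * faithful$eruptions)

# Flag durations that are exact multiples of 5 seconds
is_mult5 <- sec %% 5 == 0

# Observed share of multiples of 5; about 0.2 would be expected
# if durations fell evenly across all whole seconds
mean(is_mult5)

# Counts by last digit of the duration in seconds
table(sec %% 10)
```

A share well above 0.2 is consistent with times having been rounded by a human observer.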

Source

W. Härdle.

References

Härdle, W. (1991). Smoothing Techniques with Implementation in S. New York: Springer.

Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics, 39, 357–365. doi: 10.2307/2347385.

Examples

require(stats); require(graphics)
f.tit <-  "faithful data: Eruptions of Old Faithful"

ne60 <- round(e60 <- 60 * faithful$eruptions)
all.equal(e60, ne60)             # relative diff. ~ 1/10000
table(zapsmall(abs(e60 - ne60))) # 0, 0.02 or 0.04
faithful$better.eruptions <- ne60 / 60
te <- table(ne60)
te[te >= 4]                      # (too) many multiples of 5 !
plot(names(te), te, type = "h", main = f.tit, xlab = "Eruption time (sec)")

plot(faithful[, -3], main = f.tit,
     xlab = "Eruption time (min)",
     ylab = "Waiting time to next eruption (min)")
lines(lowess(faithful$eruptions, faithful$waiting, f = 2/3, iter = 3),
      col = "red")

[Package datasets version 4.0.4 Index]


cbpp dataset examples

modernity4Rcmdr 2022. 7. 1. 10:22

lme4::cbpp

library(lme4)  # provides glmer(), fixef(), ranef() and the cbpp data

## response as a matrix
(m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp))
## response as a vector of probabilities and usage of argument "weights"
m1p <- glmer(incidence / size ~ period + (1 | herd), weights = size,
             family = binomial, data = cbpp)
## Confirm that these are equivalent:
stopifnot(all.equal(fixef(m1), fixef(m1p), tolerance = 1e-5),
          all.equal(ranef(m1), ranef(m1p), tolerance = 1e-5))


## GLMM with individual-level variability (accounting for overdispersion)
cbpp$obs <- 1:nrow(cbpp)
(m2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) +  (1|obs),
              family = binomial, data = cbpp))

m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp)
summary(m1)
library(car)  # provides Anova(); already loaded inside R Commander
Anova(m1)

m1p <- glmer(incidence / size ~ period + (1 | herd), weights = size,
             family = binomial, data = cbpp)
summary(m1p)
Anova(m1p)

## GLMM with individual-level variability (accounting for overdispersion)
cbpp$obs <- 1:nrow(cbpp)
m2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) +  (1|obs),
              family = binomial, data = cbpp)
summary(m2)
Anova(m2)

anova(m1, m2)

?cbpp  # view the help page for cbpp in the lme4 package

https://rcmdr.tistory.com/240


cbpp dataset

modernity4Rcmdr 2022. 7. 1. 09:44

lme4::cbpp

data(cbpp, package="lme4")

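Select the 'Tools > Load package(s)...' menu item, then find and select the lme4 package.

Next, select 'Data > Data in packages > Read data set from an attached package...' to open the selection dialog. Select the lme4 package, then find and select the cbpp dataset.

The cbpp dataset becomes active: the '<No active dataset>' label at the top of the R Commander window changes to 'cbpp'.

summary(cbpp)
str(cbpp)

Use the 'Statistics > Summaries > Active data set' menu item to view summary information for cbpp, and use the str() function to inspect its internal structure.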

cbpp {lme4}

R Documentation

Contagious bovine pleuropneumonia

Description

Contagious bovine pleuropneumonia (CBPP) is a major disease of cattle in Africa, caused by a mycoplasma. This dataset describes the serological incidence of CBPP in zebu cattle during a follow-up survey implemented in 15 commercial herds located in the Boji district of Ethiopia. The goal of the survey was to study the within-herd spread of CBPP in newly infected herds. Blood samples were collected quarterly from all animals in these herds to determine their CBPP status. These data were used to compute the serological incidence of CBPP (new cases occurring during a given time period). Some data are missing (lost to follow-up).

Format

A data frame with 56 observations on the following 4 variables.

herd

A factor identifying the herd (1 to 15).

incidence

The number of new serological cases for a given herd and time period.

size

A numeric vector describing herd size at the beginning of a given time period.

period

A factor with levels 1 to 4.
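
The variables above can be combined into an observed incidence proportion. The following is a minimal sketch and not part of the original help page; it assumes the lme4 package is installed, and the names `prop` and `by_period` are illustrative.

```r
# Load the data without attaching the whole package
data(cbpp, package = "lme4")

# Observed incidence proportion for each herd-period record
cbpp$prop <- cbpp$incidence / cbpp$size

# Pool new cases and animals at risk within each period
by_period <- aggregate(cbind(incidence, size) ~ period, data = cbpp, FUN = sum)
by_period$prop <- by_period$incidence / by_period$size
by_period
```

The pooled proportions ignore herd-to-herd variation, which is what the random herd effect in the glmer() examples is meant to capture.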

Details

Serological status was determined using a competitive enzyme-linked immuno-sorbent assay (cELISA).

Source

Lesnoff, M., Laval, G., Bonnet, P., Abdicho, S., Workalemahu, A., Kifle, D., Peyraud, A., Lancelot, R., Thiaucourt, F. (2004) Within-herd spread of contagious bovine pleuropneumonia in Ethiopian highlands. Preventive Veterinary Medicine 64, 27–40.

Examples

## response as a matrix
(m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp))
## response as a vector of probabilities and usage of argument "weights"
m1p <- glmer(incidence / size ~ period + (1 | herd), weights = size,
             family = binomial, data = cbpp)
## Confirm that these are equivalent:
stopifnot(all.equal(fixef(m1), fixef(m1p), tolerance = 1e-5),
          all.equal(ranef(m1), ranef(m1p), tolerance = 1e-5))


## GLMM with individual-level variability (accounting for overdispersion)
cbpp$obs <- 1:nrow(cbpp)
(m2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd) +  (1|obs),
              family = binomial, data = cbpp))

[Package lme4 version 1.1-26 Index]


Chile dataset

modernity4Rcmdr 2022. 6. 30. 11:52

carData::Chile

data(Chile, package="carData")

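Select 'Data > Data in packages > Read data set from an attached package...' to open the selection dialog. Select the carData package, then find and select the Chile dataset.

The Chile dataset becomes active: the '<No active dataset>' label at the top of the R Commander window changes to 'Chile'.

summary(Chile)
str(Chile)

Use the 'Statistics > Summaries > Active data set' menu item to view summary information for Chile, and use the str() function to inspect its internal structure.

The dataset is documented as follows: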

Chile {carData}

R Documentation

Voting Intentions in the 1988 Chilean Plebiscite

Description

The Chile data frame has 2700 rows and 8 columns. The data are from a national survey conducted in April and May of 1988 by FLACSO/Chile. There are some missing data.

Usage

Chile

Format

This data frame contains the following columns:

region

A factor with levels: C, Central; M, Metropolitan Santiago area; N, North; S, South; SA, city of Santiago.

population

Population size of respondent's community.

sex

A factor with levels: F, female; M, male.

age

Age in years.

education

A factor with levels (note: out of order): P, Primary; PS, Post-secondary; S, Secondary.

income

Monthly income, in Pesos.

statusquo

Scale of support for the status-quo.

vote

A factor with levels: A, will abstain; N, will vote no (against Pinochet); U, undecided; Y, will vote yes (for Pinochet).
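
Two points in the format description, the missing data and the out-of-order education levels, can be handled as in the following minimal sketch. It is not part of the original help page and assumes the carData package is installed; the level order P, S, PS is an assumption about the intended ranking (Primary, Secondary, Post-secondary).

```r
data(Chile, package = "carData")

# Count missing values per column
colSums(is.na(Chile))

# Reorder education from lowest to highest level (assumed ranking)
Chile$education <- factor(Chile$education, levels = c("P", "S", "PS"))
levels(Chile$education)

# Cross-tabulate voting intention by education, keeping NAs visible
with(Chile, table(vote, education, useNA = "ifany"))
```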

Source

Personal communication from FLACSO/Chile.

References

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.

[Package carData version 3.0-4 Index]

housing dataset examples

modernity4Rcmdr 2022. 6. 25. 21:18

MASS::housing

require(MASS)
?housing  # view the help page for the housing dataset

# The code below is example(housing).

options(contrasts = c("contr.treatment", "contr.poly"))

# Surrogate Poisson models
house.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family = poisson,
                  data = housing)
summary(house.glm0, cor = FALSE)

addterm(house.glm0, ~. + Sat:(Infl+Type+Cont), test = "Chisq")

house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))
summary(house.glm1, cor = FALSE)

1 - pchisq(deviance(house.glm1), house.glm1$df.residual)

dropterm(house.glm1, test = "Chisq")

addterm(house.glm1, ~. + Sat:(Infl+Type+Cont)^2, test  =  "Chisq")

hnames <- lapply(housing[, -5], levels) # omit Freq
newData <- expand.grid(hnames)
newData$Sat <- ordered(newData$Sat)
house.pm <- predict(house.glm1, newData,
                    type = "response")  # poisson means
house.pm <- matrix(house.pm, ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, hnames[[1]]))
house.pr <- house.pm/drop(house.pm %*% rep(1, 3))
cbind(expand.grid(hnames[-1]), round(house.pr, 2))

# Iterative proportional scaling
loglm(Freq ~ Infl*Type*Cont + Sat*(Infl+Type+Cont), data = housing)


# multinomial model
library(nnet)
(house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                       data = housing))
house.mult2 <- multinom(Sat ~ Infl*Type*Cont, weights = Freq,
                        data = housing)
anova(house.mult, house.mult2)

house.pm <- predict(house.mult, expand.grid(hnames[-1]), type = "probs")
cbind(expand.grid(hnames[-1]), round(house.pm, 2))

# proportional odds model
house.cpr <- apply(house.pr, 1, cumsum)
logit <- function(x) log(x/(1-x))
house.ld <- logit(house.cpr[2, ]) - logit(house.cpr[1, ])
(ratio <- sort(drop(house.ld)))
mean(ratio)

(house.plr <- polr(Sat ~ Infl + Type + Cont,
                   data = housing, weights = Freq))

house.pr1 <- predict(house.plr, expand.grid(hnames[-1]), type = "probs")
cbind(expand.grid(hnames[-1]), round(house.pr1, 2))

Fr <- matrix(housing$Freq, ncol  =  3, byrow = TRUE)
2*sum(Fr*log(house.pr/house.pr1))

house.plr2 <- stepAIC(house.plr, ~.^2)
house.plr2$anova

house.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family = poisson, data = housing)
summary(house.glm0, cor = FALSE)

house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))
summary(house.glm1, cor = FALSE)

# multinomial model
library(nnet)
house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                       data = housing)
house.mult2 <- multinom(Sat ~ Infl*Type*Cont, weights = Freq,
                        data = housing)
anova(house.mult, house.mult2)
library(car)  # provides Anova(); already loaded inside R Commander
Anova(house.mult2, type = "II")

house.plr <- polr(Sat ~ Infl + Type + Cont,
                   data = housing, weights = Freq)
house.plr2 <- stepAIC(house.plr, ~.^2)
house.plr2$anova


sleepstudy dataset examples

modernity4Rcmdr 2022. 6. 25. 20:15

lme4::sleepstudy

require(lme4)
?sleepstudy  # view the help page for the sleepstudy dataset

# The code below is example(sleepstudy).

str(sleepstudy)
require(lattice)
xyplot(Reaction ~ Days | Subject, sleepstudy, type = c("g","p","r"),
       index = function(x,y) coef(lm(y ~ x))[1],
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)", aspect = "xy")
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))

fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy)
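
The two fits above differ only in whether the random intercept and the random slope for each subject are allowed to correlate (fm1) or are treated as uncorrelated (fm2). One way to compare them, shown here as a minimal sketch rather than as part of the original example, is a likelihood-ratio comparison with anova(); the name `cmp` is illustrative.

```r
library(lme4)

# Correlated random intercept and slope
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Uncorrelated random intercept and slope
fm2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)

# anova() refits both models by maximum likelihood before comparing them
cmp <- anova(fm1, fm2)
print(cmp)
```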


swiss dataset examples

modernity4Rcmdr 2022. 6. 25. 17:26

datasets::swiss

?swiss  # view the help page for the swiss dataset

# The code below is example(swiss).

require(stats); require(graphics)
pairs(swiss, panel = panel.smooth, main = "swiss data",
      col = 3 + (swiss$Catholic > 50))
summary(lm(Fertility ~ . , data = swiss))

pairs(swiss, panel = panel.smooth, main = "swiss data",
      col = 3 + (swiss$Catholic > 50))

pairs(swiss, panel = panel.smooth, main = "swiss data")  # compare the two plots

LinearModel.1 <- lm(Fertility ~ . , data = swiss)
summary(LinearModel.1)
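
The `col = 3 + (swiss$Catholic > 50)` argument in the pairs() calls above relies on logical values being coerced to 0 and 1. The following minimal sketch, with illustrative names `catholic_majority` and `col_index`, shows what that expression produces.

```r
# TRUE/FALSE is coerced to 1/0, so each province gets colour index 3 or 4
catholic_majority <- swiss$Catholic > 50
col_index <- 3 + catholic_majority

# Number of provinces in each colour group
table(col_index)

# The colours those two indices refer to in the current palette
palette()[c(3, 4)]
```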


warpbreaks dataset examples

modernity4Rcmdr 2022. 6. 25. 17:08

datasets::warpbreaks

?warpbreaks  # view the help page for warpbreaks

# The code below is example(warpbreaks).

require(stats); require(graphics)
summary(warpbreaks)
opar <- par(mfrow = c(1, 2), oma = c(0, 0, 1.1, 0))
plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
     varwidth = TRUE, subset = wool == "A", main = "Wool A")
plot(breaks ~ tension, data = warpbreaks, col = "lightgray",
     varwidth = TRUE, subset = wool == "B", main = "Wool B")
mtext("warpbreaks data", side = 3, outer = TRUE)
par(opar)
summary(fm1 <- lm(breaks ~ wool*tension, data = warpbreaks))
anova(fm1)


USArrests dataset examples

modernity4Rcmdr 2022. 6. 25. 15:08

datasets::USArrests

?USArrests  # view the help page for the USArrests dataset

# The code below is example(USArrests).

summary(USArrests)

require(graphics)
pairs(USArrests, panel = panel.smooth, main = "USArrests data")

## Difference between 'USArrests' and its correction
USArrests["Maryland", "UrbanPop"] # 67 -- the transcription error
UA.C <- USArrests
UA.C["Maryland", "UrbanPop"] <- 76.6

## also +/- 0.5 to restore the original  <n>.5  percentages
s5u <- c("Colorado", "Florida", "Mississippi", "Wyoming")
s5d <- c("Nebraska", "Pennsylvania")
UA.C[s5u, "UrbanPop"] <- UA.C[s5u, "UrbanPop"] + 0.5
UA.C[s5d, "UrbanPop"] <- UA.C[s5d, "UrbanPop"] - 0.5

# UA.C is the corrected version of the original USArrests data

pairs(UA.C, panel = panel.smooth, main = "USArrests data (corrected)")
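
To see exactly which rows the correction touches, the original and corrected UrbanPop columns can be compared side by side. The following is a minimal sketch that repeats the correction steps so it runs on its own; the names `changed` and `diff_tab` are illustrative.

```r
# Rebuild the corrected copy, as in the example above
UA.C <- USArrests
UA.C["Maryland", "UrbanPop"] <- 76.6
s5u <- c("Colorado", "Florida", "Mississippi", "Wyoming")
s5d <- c("Nebraska", "Pennsylvania")
UA.C[s5u, "UrbanPop"] <- UA.C[s5u, "UrbanPop"] + 0.5
UA.C[s5d, "UrbanPop"] <- UA.C[s5d, "UrbanPop"] - 0.5

# Rows whose UrbanPop value differs between the two versions
changed <- UA.C$UrbanPop != USArrests$UrbanPop

# Side-by-side view of the original and corrected values
diff_tab <- data.frame(
  state     = rownames(USArrests)[changed],
  original  = USArrests$UrbanPop[changed],
  corrected = UA.C$UrbanPop[changed]
)
diff_tab
```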
