Posts in the 'Statistics' category (page 5)

Statistics

5. Table of Statistics... 2022.02.13
4. Count missing observations 2022.02.13
3. Frequency distributions... 2022.02.13
2. Numeric summaries... 2022.02.13
1. Active Data set 2022.02.13

5. Table of Statistics...

modernity4Rcmdr 2022. 2. 13. 14:24

Statistics > Summaries > Table of statistics... (Korean menu: 통계 > 요약 > 통계표...)

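Table of statistics computes and prints statistics of a numeric (numeric, integer) variable for each level of a factor variable. In the Prestige data set, select type (occupation type) as the factor and prestige as the numeric variable, keep the statistic that is selected by default (the mean), and click the OK (예) button.

The mean is computed and printed for each occupation type (bc, prof, wc). The output window shows that the Tapply() function is used.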

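library(car)  # Tapply() is in car; the Moore data come with it via carData
?Tapply       # open the help page for Tapply in the car package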

Tapply(conformity ~ partner.status + fcategory, mean, data=Moore)
Tapply(conformity ~ partner.status + fcategory, mean, data=Moore, 
    trim=0.2)

Moore[1, 2] <- NA
Tapply(conformity ~ partner.status + fcategory, mean, data=Moore)
Tapply(conformity ~ partner.status + fcategory, mean, data=Moore, 
  na.rm=TRUE)
Tapply(conformity ~ partner.status + fcategory, mean, data=Moore, 
  na.action=na.omit)  # equivalent
remove("Moore")
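
For readers without the car package, the same kind of table can be sketched in base R. This is an illustration, not the command R Commander itself writes: the built-in iris data stand in for Prestige, with Species playing the role of the factor type and Sepal.Length the role of the numeric variable prestige.

```r
# Mean of a numeric variable within each level of a factor (base R).
# iris is only a stand-in for Prestige: Species ~ type, Sepal.Length ~ prestige.
group_means <- tapply(iris$Sepal.Length, iris$Species, mean, na.rm = TRUE)
print(group_means)

# Several statistics at once: one row per factor level.
by_level <- split(iris$Sepal.Length, iris$Species)
group_stats <- t(sapply(by_level, function(v) {
  c(mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE), n = sum(!is.na(v)))
}))
print(group_stats)
```

Following the pattern of the help example, the Prestige table described above could be requested with something like Tapply(prestige ~ type, mean, na.action=na.omit, data=Prestige); the exact command R Commander writes may differ by version.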

4. Count missing observations

modernity4Rcmdr 2022. 2. 13. 14:22

Statistics > Summaries > Count missing observations (Korean menu: 통계 > 요약 > 관측 결측치 셈하기)

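Some cases in a data set may have no value entered for a variable, that is, a missing value. This menu item checks which variables contain missing observations.

For the Prestige data set, it shows that the type variable has 4 missing values.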
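
Counting missing values per variable can be sketched with sapply() and is.na(). The snippet is a minimal illustration on the built-in airquality data (an assumption made so that it runs without the car package). Replacing airquality with Prestige should give the count for type reported above, although the exact command R Commander writes may differ.

```r
# Number of missing values (NA) in each column of a data frame.
na_counts <- sapply(airquality, function(x) sum(is.na(x)))
print(na_counts)

# Equivalent vectorised form: is.na() on a data frame returns a logical matrix.
na_counts2 <- colSums(is.na(airquality))

# Names of the variables that have at least one missing value.
vars_with_na <- names(na_counts)[na_counts > 0]
print(vars_with_na)
```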

?sapply  # open the help page for sapply in the base package

require(stats); require(graphics)

x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the list mean for each list element
lapply(x, mean)
# median and quartiles for each list element
lapply(x, quantile, probs = 1:3/4)
sapply(x, quantile)
i39 <- sapply(3:9, seq) # list of vectors
sapply(i39, fivenum)
vapply(i39, fivenum,
       c(Min. = 0, "1st Qu." = 0, Median = 0, "3rd Qu." = 0, Max. = 0))

## sapply(*, "array") -- artificial example
(v <- structure(10*(5:8), names = LETTERS[1:4]))
f2 <- function(x, y) outer(rep(x, length.out = 3), y)
(a2 <- sapply(v, f2, y = 2*(1:5), simplify = "array"))
a.2 <- vapply(v, f2, outer(1:3, 1:5), y = 2*(1:5))
stopifnot(dim(a2) == c(3,5,4), all.equal(a2, a.2),
          identical(dimnames(a2), list(NULL,NULL,LETTERS[1:4])))

hist(replicate(100, mean(rexp(10))))

## use of replicate() with parameters:
foo <- function(x = 1, y = 2) c(x, y)
# does not work: bar <- function(n, ...) replicate(n, foo(...))
bar <- function(n, x) replicate(n, foo(x = x))
bar(5, x = 3)

3. Frequency distributions...

modernity4Rcmdr 2022. 2. 13. 14:21

Statistics > Summaries > Frequency distributions... (Korean menu: 통계 > 요약 > 빈도 분포...)

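Select the type variable and click OK (예). The command that tabulates the type variable of the Prestige data set is recorded in the input window, and the frequencies are printed in the output window.

Q1> Prestige has several variables. Why does only type appear in the selection box?

type is a factor. The dialog offers only factor variables, because a frequency distribution counts the cases in each category; the other Prestige variables are numeric.

str(Prestige)  # inspect the variable types in the Prestige data set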
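
A short sketch of both points, namely finding which variables are factors and tabulating one of them as counts and percentages. It uses the built-in warpbreaks data instead of Prestige so that it runs without the car package. The variable names are therefore stand-ins, and the layout only approximates what R Commander prints.

```r
# Which variables are factors? Only these would be offered for a frequency table.
factor_vars <- names(warpbreaks)[sapply(warpbreaks, is.factor)]
print(factor_vars)

# Frequency distribution of one factor: counts, then percentages.
counts <- table(warpbreaks$tension)
percentages <- round(100 * counts / sum(counts), 2)
print(counts)
print(percentages)
```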

?table  # open the help page for table in the base package

require(stats) # for rpois and xtabs
## Simple frequency distribution
table(rpois(100, 5))
## Check the design:
with(warpbreaks, table(wool, tension))
table(state.division, state.region)

# simple two-way contingency table
with(airquality, table(cut(Temp, quantile(Temp)), Month))

a <- letters[1:3]
table(a, sample(a))                    # dnn is c("a", "")
table(a, sample(a), deparse.level = 0) # dnn is c("", "")
table(a, sample(a), deparse.level = 2) # dnn is c("a", "sample(a)")

## xtabs() <-> as.data.frame.table() :
UCBAdmissions ## already a contingency table
DF <- as.data.frame(UCBAdmissions)
class(tab <- xtabs(Freq ~ ., DF)) # xtabs & table
## tab *is* "the same" as the original table:
all(tab == UCBAdmissions)
all.equal(dimnames(tab), dimnames(UCBAdmissions))

a <- rep(c(NA, 1/0:3), 10)
table(a)                 # does not report NA's
table(a, exclude = NULL) # reports NA's
b <- factor(rep(c("A","B","C"), 10))
table(b)
table(b, exclude = "B")
d <- factor(rep(c("A","B","C"), 10), levels = c("A","B","C","D","E"))
table(d, exclude = "B")
print(table(b, d), zero.print = ".")

## NA counting:
is.na(d) <- 3:4
d. <- addNA(d)
d.[1:7]
table(d.) # ", exclude = NULL" is not needed
## i.e., if you want to count the NA's of 'd', use
table(d, useNA = "ifany")

## "pathological" case:
d.patho <- addNA(c(1,NA,1:2,1:3))[-7]; is.na(d.patho) <- 3:4
d.patho
## just 3 consecutive NA's ? --- well, have *two* kinds of NAs here :
as.integer(d.patho) # 1 4 NA NA 1 2
##
## In R >= 3.4.0, table() allows to differentiate:
table(d.patho)                   # counts the "unusual" NA
table(d.patho, useNA = "ifany")  # counts all three
table(d.patho, exclude = NULL)   #  (ditto)
table(d.patho, exclude = NA)     # counts none

## Two-way tables with NA counts. The 3rd variant is absurd, but shows
## something that cannot be done using exclude or useNA.
with(airquality,
   table(OzHi = Ozone > 80, Month, useNA = "ifany"))
with(airquality,
   table(OzHi = Ozone > 80, Month, useNA = "always"))
with(airquality,
   table(OzHi = Ozone > 80, addNA(Month)))

2. Numeric summaries...

modernity4Rcmdr 2022. 2. 13. 14:15

Statistics > Summaries > Numeric summaries... (Korean menu: 통계 > 요약 > 수치적 요약...)

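Choosing the <Numeric summaries...> menu item opens a dialog.

The dialog has a Data tab and a Statistics tab. To see the statistics options, click the Statistics tab next to Data.

Return to the Data tab to request a numeric summary of the variable prestige: select prestige and click the OK (예) button at the lower right.

Now look at the input window and the output window. With the options on the Statistics tab left unchanged, the output window shows the numeric summary of the prestige variable in the Prestige data set.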

library(RcmdrMisc)  # provides numSummary()
?numSummary         # open the help page for numSummary in the RcmdrMisc package

if (require("car")){
    data(Prestige)
    Prestige[1, "income"] <- NA
    print(numSummary(Prestige[,c("income", "education")], 
        statistics=c("mean", "sd", "quantiles", "cv", "skewness", "kurtosis")))
    print(numSummary(Prestige[,c("income", "education")], groups=Prestige$type))
    remove(Prestige)
}
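
For comparison, the core of such a summary can be sketched in base R without RcmdrMisc. The helper name num_summary is hypothetical, and the built-in airquality data stand in for Prestige. The set of statistics shown (mean, standard deviation, IQR, quartiles, number of valid cases, number of missing values) is an assumption about a typical summary, not a reproduction of numSummary().

```r
# Hypothetical helper: basic numeric summary of one variable, ignoring NAs.
num_summary <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    IQR  = IQR(x, na.rm = TRUE),
    quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE),
    n         = sum(!is.na(x)),
    n_missing = sum(is.na(x)))
}

# One row per variable.
result <- t(sapply(airquality[, c("Ozone", "Temp")], num_summary))
print(round(result, 2))
```

Returning a named vector from the helper lets sapply() assemble one column per variable, which t() then turns into the one-row-per-variable layout.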

1. Active Data set

modernity4Rcmdr 2022. 2. 13. 13:48

Statistics > Summaries > Active Data set (Korean menu: 통계 > 요약 > 활성 데이터셋)

Suppose the Prestige data set has been loaded and made the active data set for processing and analysis. To see summary information about Prestige, choose the <Active data set> item:

data(Prestige)
summary(Prestige)

?summary  # open the help page for summary in the base package

summary(attenu, digits = 4) #-> summary.data.frame(...), default precision
summary(attenu $ station, maxsum = 20) #-> summary.factor(...)

lst <- unclass(attenu$station) > 20 # logical with NAs
## summary.default() for logicals -- different from *.factor:
summary(lst)
summary(as.factor(lst))

Rcmdr.kr: An R Commander User in Korea
