
Rcmdr_2.7-2.tar.gz

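Occasionally, an error occurs when you try to view the summary of a data set (this affects the Korean menus of an installed Rcmdr 2.7-2).

For example, the OBrienKaiser data set can be used as the active data set in R Commander, but the 'Statistics > Summaries > Active data set' function cannot be used on it. The following error message appears in the Rgui window: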

Error in sprintf(gettextRcmdr("There are %d variables in the data set %s.\nDo you want to proceed?"), :
invalid format '%d'; use format %s for character objects

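As the Korean translator of Rcmdr, I owe users an apology here, because this error was introduced during the Korean translation. The faulty Korean entry is part of the official Rcmdr source.

Once that entry is corrected, viewing the summary of the OBrienKaiser data set no longer produces an error. Because the fix changes the package source files, Rcmdr has to be reinstalled from source. The source archive at the top of this page contains the corrected Korean .po/.mo files. Download it and install it from source, for example with install.packages("Rcmdr_2.7-2.tar.gz", repos=NULL, type="source").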
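The error message points to a likely mechanism. This is an assumption, because the actual .po entries are not shown here. Suppose the Korean translation reorders the two placeholders to fit Korean word order, with the data set name first and then the count. The R code still passes the count first, so sprintf() ends up handing a character string to %d. The templates below are hypothetical English stand-ins, not the real translation strings. The fix uses R's numbered `%n$` specifiers, which sprintf() supports so that translated messages can reorder arguments.

```r
# Argument order fixed by the R code: first the count, then the data set name
nvars  <- 17L              # OBrienKaiser has 17 variables
dsname <- "OBrienKaiser"

# English template: %d (count) comes before %s (name)
template_en <- "There are %d variables in the data set %s."

# Hypothetical translated template that swaps the placeholders for word order
template_bad <- "Data set %s contains %d variables."

# Same word order, but numbered specifiers bind each slot to a fixed argument
template_fixed <- "Data set %2$s contains %1$d variables."

sprintf(template_en, nvars, dsname)

# %s happily formats the integer, but %d then receives the character name
err <- tryCatch(sprintf(template_bad, nvars, dsname),
                error = function(e) e)
conditionMessage(err)   # the same kind of error as in the Rgui window

sprintf(template_fixed, nvars, dsname)
```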

For another error in Rcmdr 2.7-x and how to deal with it, see the post on importing Excel files, "Data > Import data > from Excel file...: understanding the error", written from the point of view of the menu translator:
https://rcmdr.tistory.com/m/64


carData > OBrienKaiserLong

The OBrienKaiserLong data set is included in the carData package. Because carData is attached automatically when the Rcmdr package is loaded, OBrienKaiserLong can be loaded as the active data set through the R Commander menus.

https://rcmdr.kr/37


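The Statistics > Summaries > Active data set menu shows the summary of the OBrienKaiserLong data set.

The output shows that the summary() function is used.

You can also type str(OBrienKaiserLong) directly in the script window and run it to display the structure of the OBrienKaiserLong data set in the Output window.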
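The same checks can be run at the console. This sketch assumes the carData package is installed, as it is whenever Rcmdr is. The counts follow from the data's design: the 16 subjects of the wide OBrienKaiser data are each measured in 3 phases at 5 hours, giving 16 x 3 x 5 = 240 rows.

```r
# Assumes carData is installed; Rcmdr attaches it automatically
data("OBrienKaiserLong", package = "carData")

str(OBrienKaiserLong)      # structure of the data set
summary(OBrienKaiserLong)  # the function the Active data set summary uses

# Each subject contributes 3 phases x 5 hours = 15 rows
table(table(OBrienKaiserLong$id))
# Every phase-by-hour cell holds one score per subject (16 subjects)
xtabs(~ phase + hour, data = OBrienKaiserLong)
```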

Clicking the <View data set> button in R Commander displays the contents of the data set.

OBrienKaiserLong {carData}

R Documentation

O'Brien and Kaiser's Repeated-Measures Data in "Long" Format

Description

Contrived repeated-measures data from O'Brien and Kaiser (1985). For details see OBrienKaiser, which is for the "wide" form of the same data.

Usage

OBrienKaiserLong

Format

A data frame with 240 observations on the following 6 variables.

treatment

a between-subjects factor with levels control, A, B.

gender

a between-subjects factor with levels F, M.

score

the numeric response variable.

id

the subject id number.

phase

a within-subjects factor with levels pre, post, fup.

hour

a within-subjects factor with levels 1, 2, 3, 4, 5.

Source

O'Brien, R. G., and Kaiser, M. K. (1985) MANOVA method for analyzing repeated measures designs: An extensive primer. Psychological Bulletin 97, 316–333, Table 7.

Examples

head(OBrienKaiserLong, 15) # first subject

[Package carData version 3.0-5 Index]
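OBrienKaiserLong is described as the "long" form of OBrienKaiser, so the conversion can be sketched with base R's reshape(). The sketch builds a placeholder wide data frame with the same column layout as OBrienKaiser: treatment, gender, and pre.1 through fup.5. The scores are dummy integers, not the real data. Each column name such as "post.3" is then split into a phase and an hour.

```r
# Placeholder wide data with the same column layout as carData::OBrienKaiser.
# The scores are dummy integers, NOT the real O'Brien and Kaiser data.
phases <- c("pre", "post", "fup")
hours  <- 1:5
score_cols <- paste(rep(phases, each = length(hours)), hours, sep = ".")  # pre.1 ... fup.5

wide <- data.frame(
  treatment = factor(rep(c("control", "A", "B"), length.out = 16),
                     levels = c("control", "A", "B")),
  gender    = factor(rep(c("F", "M"), length.out = 16)),
  matrix(seq_len(16 * length(score_cols)), nrow = 16,
         dimnames = list(NULL, score_cols))
)
wide$id <- seq_len(nrow(wide))  # subject id

# Stack the 15 score columns into one "score" column, recording the source column
long <- reshape(wide, direction = "long",
                varying = score_cols, v.names = "score",
                timevar = "occasion", times = score_cols,
                idvar = "id")

# Split "post.3" into phase = "post" and hour = "3"
long$phase <- factor(sub("\\..*$", "", long$occasion), levels = phases)
long$hour  <- factor(sub("^.*\\.", "", long$occasion), levels = hours)
long$occasion <- NULL
rownames(long) <- NULL

dim(long)                      # 16 subjects x 3 phases x 5 hours = 240 rows
table(long$phase, long$hour)   # 16 rows in every phase-hour cell
```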


This is the OBrienKaiser data set from the carData package. Because carData is attached automatically when the Rcmdr package is loaded, the data sets in carData can be loaded freely in R Commander.

https://rcmdr.kr/37


The OBrienKaiser data set can be used as the active data set in R Commander, but the 'Statistics > Summaries > Active data set' function cannot be used on it. The following error message appears in the Rgui window:

Error in sprintf(gettextRcmdr("There are %d variables in the data set %s.\nDo you want to proceed?"), :
invalid format '%d'; use format %s for character objects

Instead, type str(OBrienKaiser) in the script window and run it to examine the structure of the OBrienKaiser data set.

Then type summary(OBrienKaiser) in the script window and run it to view the summary.

OBrienKaiser {carData}

R Documentation

O'Brien and Kaiser's Repeated-Measures Data

Description

These contrived repeated-measures data are taken from O'Brien and Kaiser (1985). The data are from an imaginary study in which 16 female and male subjects, who are divided into three treatments, are measured at a pretest, posttest, and a follow-up session; during each session, they are measured at five occasions at intervals of one hour. The design, therefore, has two between-subject and two within-subject factors.

The contrasts for the treatment factor are set to -2, 1, 1 and 0, -1, 1. The contrasts for the gender factor are set to contr.sum.

Usage

OBrienKaiser

Format

A data frame with 16 observations on the following 17 variables.

treatment

a factor with levels control A B

gender

a factor with levels F M

pre.1

pretest, hour 1

pre.2

pretest, hour 2

pre.3

pretest, hour 3

pre.4

pretest, hour 4

pre.5

pretest, hour 5

post.1

posttest, hour 1

post.2

posttest, hour 2

post.3

posttest, hour 3

post.4

posttest, hour 4

post.5

posttest, hour 5

fup.1

follow-up, hour 1

fup.2

follow-up, hour 2

fup.3

follow-up, hour 3

fup.4

follow-up, hour 4

fup.5

follow-up, hour 5

Source

O'Brien, R. G., and Kaiser, M. K. (1985) MANOVA method for analyzing repeated measures designs: An extensive primer. Psychological Bulletin 97, 316–333, Table 7.

Examples

OBrienKaiser
contrasts(OBrienKaiser$treatment)
contrasts(OBrienKaiser$gender)

[Package carData version 3.0-4 Index]
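The contrast coding described above can be reproduced on placeholder factors with the same levels. This is only a sketch: in carData the contrasts are already stored on OBrienKaiser, so contrasts(OBrienKaiser$treatment) shows them directly. It also confirms that the two treatment contrasts each sum to zero and are orthogonal.

```r
# Placeholder factors with the same levels as OBrienKaiser$treatment and $gender
treatment <- factor(c("control", "A", "B"), levels = c("control", "A", "B"))
gender    <- factor(c("F", "M"))

# Treatment contrasts as documented: (-2, 1, 1) and (0, -1, 1)
contrasts(treatment) <- matrix(c(-2,  1, 1,
                                  0, -1, 1), ncol = 2)
contrasts(gender) <- contr.sum(2)  # sum-to-zero coding: F = 1, M = -1

contrasts(treatment)
contrasts(gender)

# Each column sums to zero, and the off-diagonal cross-product is zero
colSums(contrasts(treatment))
crossprod(contrasts(treatment))
```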


Graphs > Color palette... (Korean menu: 그래프 > 색 팔레트...)

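Selecting <Color palette...> displays the eight main colors, each with its hexadecimal value and name.

Let's change the fifth color, cyan, to gold and the seventh color, yellow, to orange.

Note the hexadecimal values: #fad800 for gold and #ffa500 for orange; #00ffff for cyan and #ffff00 for yellow. Now change gold back to cyan and orange back to yellow. In the Selection field, replace gold's #fad800 with cyan's #00ffff and press Enter.

To restore yellow, enter the hexadecimal value #ffff00 in the same way and press Enter. The palette is then back to its original colors.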

?palette  # help page for palette() in the grDevices package

require(graphics)

palette()               # obtain the current palette
palette("R3");palette() # old default palette
palette("ggplot2")      # ggplot2-style palette
palette()

palette(hcl.colors(8, "viridis"))

(palette(gray(seq(0,.9,len = 25)))) # gray scales; print old palette
matplot(outer(1:100, 1:30), type = "l", lty = 1,lwd = 2, col = 1:30,
        main = "Gray Scales Palette",
        sub = "palette(gray(seq(0, .9, len=25)))")
palette("default")      # reset back to the default

## on a device where alpha transparency is supported,
##  use 'alpha = 0.3' transparency with the default palette :
mycols <- adjustcolor(palette(), alpha.f = 0.3)
opal <- palette(mycols)
x <- rnorm(1000); xy <- cbind(x, 3*x + rnorm(1000))
plot (xy, lwd = 2,
       main = "Alpha-Transparency Palette\n alpha = 0.3")
xy[,1] <- -xy[,1]
points(xy, col = 8, pch = 16, cex = 1.5)
palette("default")

## List available built-in palettes
palette.pals()

## Demonstrate the colors 1:8 in different palettes using a custom matplot()
sinplot <- function(main=NULL) {
    x <- outer(
	seq(-pi, pi, length.out = 50),
	seq(0, pi, length.out = 8),
	function(x, y) sin(x - y)
    )
    matplot(x, type = "l", lwd = 4, lty = 1, col = 1:8, ylab = "", main=main)
}
sinplot("default palette")

palette("R3");        sinplot("R3")
palette("Okabe-Ito"); sinplot("Okabe-Ito")
palette("Tableau")  ; sinplot("Tableau")
palette("default") # reset

## color swatches for palette.colors()
palette.swatch <- function(palette = palette.pals(), n = 8, nrow = 8,
                           border = "black", cex = 1, ...)
{
     cols <- sapply(palette, palette.colors, n = n, recycle = TRUE)
     ncol <- ncol(cols)
     nswatch <- min(ncol, nrow)
     op <- par(mar = rep(0.1, 4),
               mfrow = c(1, min(5, ceiling(ncol/nrow))),
     	       cex = cex, ...)
     on.exit(par(op))
     while (length(palette)) {
 	subset <- seq_len(min(nrow, ncol(cols)))
 	plot.new()
 	plot.window(c(0, n), c(0.25, nrow + 0.25))
 	y <- rev(subset)
 	text(0, y + 0.1, palette[subset], adj = c(0, 0))
 	y <- rep(y, each = n)
 	rect(rep(0:(n-1), n), y, rep(1:n, n), y - 0.5,
 	     col = cols[, subset], border = border)
 	palette <- palette[-subset]
 	cols    <- cols [, -subset, drop = FALSE]
     }
}

palette.swatch()

palette.swatch(n = 26) # show full "Alphabet"; recycle most others
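The change made in the dialog can also be made at the console. The dialog lists cyan and yellow in slots 5 and 7, which matches R's "R3" palette (black, red, green3, blue, cyan, magenta, yellow, gray). Whether Rcmdr sets exactly this palette is an assumption here. R's named color "gold" is #FFD700, so the #fad800 shown in the dialog is presumably a nearby color labeled gold rather than R's exact gold. The helper to_hex() is introduced only for this sketch.

```r
old <- palette("R3")   # switch to the R3 palette; the previous one is returned
r3  <- palette()       # slots 5 and 7 are cyan and yellow

pal <- r3
pal[c(5, 7)] <- c("gold", "orange")   # the change made in the dialog
palette(pal)

# Hypothetical helper: convert color names to hex strings via col2rgb() and rgb()
to_hex <- function(col) unname(rgb(t(col2rgb(col)), maxColorValue = 255))
to_hex(c("cyan", "yellow", "gold", "orange"))

palette(old)           # restore the palette that was active before
```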

Active data set (활성 데이터셋)

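A key feature of R Commander's menu-driven workflow is the active data set. When you type commands into the console, you can work with any number of data sets. The menu-driven R Commander, by contrast, works on only one active data set at a time (merging data sets, of course, requires two or more).

In the R Commander toolbar, next to the R icon on the left, you will see 'Data set: Prestige'. This means the Prestige data set is active and ready to be used in R Commander.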


Statistics > Contingency tables (Korean menu: 통계 > 분할표)

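If the data set being analyzed has only one factor variable or none, the <Two-way table...> and <Multi-way table...> items in the Contingency tables submenu are grayed out (disabled). With two or more factors, <Two-way table...> is no longer grayed out. This happens, for example, when the Moore data set from the carData package is the active data set.

With three or more factors, <Multi-way table...> is enabled as well. In Moore, only two variables, partner.status and fcategory, are factors. If <Two-way table...> is enabled but <Multi-way table...> is still disabled, that is an indirect sign that the data set has exactly two factor variables.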
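The enabling rule described above can be restated as a small check on the number of factor columns. The helpers count_factors() and table_menu_state() are hypothetical and only mirror the rule as described; this is not Rcmdr's own code. Built-in data sets stand in for Moore.

```r
# Hypothetical helpers mirroring the rule described above (NOT Rcmdr's code):
# Two-way table needs >= 2 factors, Multi-way table needs >= 3.
count_factors <- function(df) sum(vapply(df, is.factor, logical(1)))

table_menu_state <- function(df) {
  k <- count_factors(df)
  c(two_way = k >= 2, multi_way = k >= 3)
}

table_menu_state(mtcars)      # all numeric: both items disabled
table_menu_state(warpbreaks)  # wool, tension: two-way only
table_menu_state(esoph)       # agegp, alcgp, tobgp (ordered factors): both
```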


Statistics > Contingency tables > Two-way table... (Korean menu: 통계 > 분할표 > 이원표...)

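If a data set with two or more factor variables is active, you can use the Statistics > Contingency tables > Two-way table... menu item.

When the Moore data set, which has two factor variables, is made active, <Two-way table...> is no longer grayed out and can be used.

Select one factor as the row variable and another as the column variable.

Besides the Data tab, the dialog has a Statistics tab, which offers a variety of options. If you do not want to change the output with any of these options, return to the Data tab and click OK.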

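Run the dialog once with partner.status as the row variable and fcategory as the column variable. Then run it again with fcategory as the row variable and partner.status as the column variable. Both runs give the same counts, with rows and columns transposed.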

The generated script shows that the xtabs() function is used.
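A base-R sketch of the same idea uses the built-in warpbreaks data (factors wool and tension) in place of Moore, which is not part of base R. Swapping the row and column variables gives the transposed table. chisq.test() runs a chi-square test of independence, similar to the one offered on the Statistics tab. This is not the exact script R Commander generates.

```r
# Base-R stand-in for Moore: warpbreaks has two factors, wool and tension
tab_wt <- xtabs(~ wool + tension, data = warpbreaks)  # rows: wool
tab_tw <- xtabs(~ tension + wool, data = warpbreaks)  # rows: tension
tab_wt
tab_tw

# Swapping the row and column variables only transposes the table
all(t(tab_wt) == tab_tw)

# Chi-square test of independence for the two-way table
chisq.test(tab_wt, correct = FALSE)
```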

?xtabs  # help page for xtabs() in the stats package

## 'esoph' has the frequencies of cases and controls for all levels of
## the variables 'agegp', 'alcgp', and 'tobgp'.
xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
## Output is not really helpful ... flat tables are better:
ftable(xtabs(cbind(ncases, ncontrols) ~ ., data = esoph))
## In particular if we have fewer factors ...
ftable(xtabs(cbind(ncases, ncontrols) ~ agegp, data = esoph))

## This is already a contingency table in array form.
DF <- as.data.frame(UCBAdmissions)
## Now 'DF' is a data frame with a grid of the factors and the counts
## in variable 'Freq'.
DF
## Nice for taking margins ...
xtabs(Freq ~ Gender + Admit, DF)
## And for testing independence ...
summary(xtabs(Freq ~ ., DF))

## with NA's
DN <- DF; DN[cbind(6:9, c(1:2,4,1))] <- NA
DN # 'Freq' is missing only for (Rejected, Female, B)
tools::assertError(# 'na.fail' should fail :
     xtabs(Freq ~ Gender + Admit, DN, na.action=na.fail), verbose=TRUE)
op <- options(na.action = "na.omit") # the "factory" default
(xtabs(Freq ~ Gender + Admit, DN) -> xtD)
noC <- function(O) `attr<-`(O, "call", NULL)
ident_noC <- function(x,y) identical(noC(x), noC(y))
stopifnot(exprs = {
  ident_noC(xtD, xtabs(Freq ~ Gender + Admit, DN, na.action = na.omit))
  ident_noC(xtD, xtabs(Freq ~ Gender + Admit, DN, na.action = NULL))
})

xtabs(Freq ~ Gender + Admit, DN, na.action = na.pass)
## The Female:Rejected combination has NA 'Freq' (and NA prints 'invisibly' as "")
(xtNA <- xtabs(Freq ~ Gender + Admit, DN, addNA = TRUE)) # ==> count NAs
## show NA's better via  na.print = ".." :
print(xtNA, na.print= "NA")


## Create a nice display for the warp break data.
warpbreaks$replicate <- rep_len(1:9, 54)
ftable(xtabs(breaks ~ wool + tension + replicate, data = warpbreaks))

### ---- Sparse Examples ----

if(require("Matrix")) withAutoprint({
 ## similar to "nlme"s  'ergoStool' :
 d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9*4)),
                      Subj = gl(9, 4, 36*4))
 xtabs(~ Type + Subj, data = d.ergo) # 4 replicates each
 set.seed(15) # a subset of cases:
 xtabs(~ Type + Subj, data = d.ergo[sample(36, 10), ], sparse = TRUE)

 ## Hypothetical two-level setup:
 inner <- factor(sample(letters[1:25], 100, replace = TRUE))
 inout <- factor(sample(LETTERS[1:5], 25, replace = TRUE))
 fr <- data.frame(inner = inner, outer = inout[as.integer(inner)])
 xtabs(~ inner + outer, fr, sparse = TRUE)
})


Statistics > Summaries > Test of normality... (Korean menu: 통계 > 요약 > 정규성 검정...)

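Select one of the numeric (numeric or integer) variables. The Shapiro-Wilk test of normality is selected by default. To check whether the income values follow a normal distribution, select the income variable and click OK.

The generated command uses normalityTest(). The same call for the education variable looks like this:

normalityTest(~education, test="shapiro.test", data=Prestige)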
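Here is a base-R sketch of the Shapiro-Wilk test, the dialog's default. It assumes that normalityTest(..., test = "shapiro.test") delegates to stats::shapiro.test(). Built-in data sets are used because Prestige lives in carData. The per-group part only imitates the formula form normalityTest(y ~ group, ...) and is not its implementation.

```r
# Shapiro-Wilk test on a built-in variable
sw <- shapiro.test(faithful$eruptions)  # eruption durations are clearly bimodal
sw
sw$p.value < 0.05  # small p-value: normality is rejected at the 5% level

# One test per group, similar in spirit to normalityTest(y ~ group, ...)
p_by_species <- sapply(split(iris$Sepal.Length, iris$Species),
                       function(v) shapiro.test(v)$p.value)
round(p_by_species, 3)
```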

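?normalityTest  # help page for normalityTest() in the RcmdrMisc package

library(RcmdrMisc)  # provides normalityTest()
data(Prestige, package="carData")  # Prestige now lives in carData, not car
with(Prestige, normalityTest(income))
normalityTest(income ~ type, data=Prestige, test="ad.test")  # ad.test: nortest package
normalityTest(~income, data=Prestige, test="pearson.test", n.classes=5)  # pearson.test: nortest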


Statistics > Summaries > Correlation test... (Korean menu: 통계 > 요약 > 상관 검정...)

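A correlation test checks statistically whether the values of two variables are related, and in which direction. Here we examine whether education level and income are related in the Prestige data set. Select the education and income variables and click OK.

By default, the type of correlation is Pearson product-moment and the alternative hypothesis is two-sided. With these settings, the result of the correlation test is printed in the Output window.

The cor.test() function is used.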
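The default Pearson, two-sided test can be checked by hand. The t statistic that cor.test() reports is $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. The sketch below uses the built-in mtcars data instead of Prestige, which would require carData.

```r
# Pearson correlation test with the defaults used by the dialog
ct <- cor.test(mtcars$wt, mtcars$mpg)  # method = "pearson", alternative = "two.sided"
ct

# Recompute the t statistic from r and n
r <- cor(mtcars$wt, mtcars$mpg)
n <- nrow(mtcars)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(unname(ct$statistic), t_manual)  # TRUE
unname(ct$parameter)                       # degrees of freedom: n - 2
```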

?cor.test  # help page for cor.test() in the stats package

## Hollander & Wolfe (1973), p. 187f.
## Assessment of tuna quality.  We compare the Hunter L measure of
##  lightness to the averages of consumer panel scores (recoded as
##  integer values from 1 to 6 and averaged over 80 such values) in
##  9 lots of canned tuna.

x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

##  The alternative hypothesis of interest is that the
##  Hunter L value is positively associated with the panel score.

cor.test(x, y, method = "kendall", alternative = "greater")
## => p=0.05972

cor.test(x, y, method = "kendall", alternative = "greater",
         exact = FALSE) # using large sample approximation
## => p=0.04765

## Compare this to
cor.test(x, y, method = "spearm", alternative = "g")
cor.test(x, y,                    alternative = "g")

## Formula interface.
require(graphics)
pairs(USJudgeRatings)
cor.test(~ CONT + INTG, data = USJudgeRatings)


Statistics > Summaries > Correlation matrix... (Korean menu: 통계 > 요약 > 상관 행렬...)

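A correlation matrix requires selecting two or more variables. To study the relationship between education and income in the Prestige data set, select these two variables and click OK.

The Output window shows that the cor() function was used.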
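Here is a minimal console equivalent on built-in data. The post does not show the exact arguments R Commander passes to cor(), so use = "complete.obs" is an assumption. The sketch also shows why a correlation matrix is read from one triangle: it is symmetric with ones on the diagonal, and each off-diagonal entry is the pairwise Pearson correlation.

```r
vars <- c("mpg", "wt", "hp")
R <- cor(mtcars[, vars], use = "complete.obs")  # use= is an assumed setting
round(R, 3)

isSymmetric(R)                                     # TRUE
all.equal(unname(diag(R)), rep(1, length(vars)))   # ones on the diagonal
all.equal(R["mpg", "wt"], cor(mtcars$mpg, mtcars$wt))
```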

?rcorr.adjust  # help page for rcorr.adjust() in the RcmdrMisc package

library(RcmdrMisc)  # provides rcorr.adjust()
if (require(car)) {
    data(Mroz)  # Mroz is in carData, which car attaches
    print(rcorr.adjust(Mroz[, c("k5", "k618", "age", "lwg", "inc")]))
    print(rcorr.adjust(Mroz[, c("k5", "k618", "age", "lwg", "inc")], type="spearman"))
}


Rcmdr.kr: An R Commander User in Korea