
그래프 > 조각 도표...
Graphs > Strip chart...

Make the Prestige data set from the carData package the active data set. It contains only one factor, type. Under <Response Variable (pick one)>, select the income variable.

On the <Options> tab, choose 'Jitter' under <Duplicate values>. Then, under <Plot Labels>, enter axis labels and a title (variable names, short descriptions) that help readers interpret the plot.

stripchart(income ~ type, vertical=TRUE, method="jitter",
    xlab="type (occupation type)", ylab="income (annual income)",
    main="Strip chart of annual income by occupation type, Canada 1971", data=Prestige)

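A strip chart appears in the graphics device window below. For each occupation type, every occupation's income is plotted as a point at its value, so you can compare how incomes are spread within and across types.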
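To make method="jitter" concrete, here is a small base-R sketch. It uses the built-in OrchardSprays data instead of Prestige, so it runs without carData. The sketch draws the same groups with "overplot" and with "jitter" side by side. It then reproduces the idea by hand with jitter(): only the sideways plotting position receives random noise, and the plotted values stay exactly as recorded. The jitter width of 0.2 is an arbitrary choice for illustration.

```r
# Built-in data: decrease (numeric) measured under 8 treatments (factor)
data(OrchardSprays)

# Same groups drawn two ways: "overplot" puts tied values on one spot,
# "jitter" shifts each point sideways by a small random offset
op <- par(mfrow = c(1, 2))
stripchart(decrease ~ treatment, data = OrchardSprays, vertical = TRUE,
           method = "overplot", main = "overplot")
stripchart(decrease ~ treatment, data = OrchardSprays, vertical = TRUE,
           method = "jitter", jitter = 0.2, main = "jitter")
par(op)

# The same idea by hand: noise is added to the group position only
set.seed(1)
group_pos <- as.integer(OrchardSprays$treatment)  # 1..8, one column per group
jittered  <- jitter(group_pos, amount = 0.2)      # uniform noise in [-0.2, 0.2]
max_shift <- max(abs(jittered - group_pos))
print(max_shift)
```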

?stripchart  # help page for stripchart (graphics package)

x <- stats::rnorm(50)
xr <- round(x, 1)
stripchart(x) ; m <- mean(par("usr")[1:2])
text(m, 1.04, "stripchart(x, \"overplot\")")
stripchart(xr, method = "stack", add = TRUE, at = 1.2)
text(m, 1.35, "stripchart(round(x,1), \"stack\")")
stripchart(xr, method = "jitter", add = TRUE, at = 0.7)
text(m, 0.85, "stripchart(round(x,1), \"jitter\")")

stripchart(decrease ~ treatment,
    main = "stripchart(OrchardSprays)",
    vertical = TRUE, log = "y", data = OrchardSprays)

stripchart(decrease ~ treatment, at = c(1:8)^2,
    main = "stripchart(OrchardSprays)",
    vertical = TRUE, log = "y", data = OrchardSprays)

그래프 > 평균 그림...
Graphs > Plot of means...

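Make the Prestige data set from carData the active data set and choose <Plot of means...> from the Graphs menu. The dialog shown below opens. Prestige has only one factor, type, so it is selected automatically. Under <Response Variable (pick one)>, select income (annual income).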

On the <Options> tab, enter descriptive axis labels and a plot title under <Plot Labels>.

with(Prestige, plotMeans(income, type, error.bars="se", xlab="type (occupation type)",
  ylab="income (annual income)", main="Mean annual income by occupation type, Canada 1971", connect=TRUE))

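The factor type in Prestige has three levels: bc (blue collar), prof (professional), and wc (white collar). The graphics device window below shows the mean income of the occupations in each type. Each mean has standard-error bars around it (error.bars="se").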
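The numbers behind a plot of means with error.bars="se" can be computed by hand. The sketch below uses the built-in warpbreaks data (number of warp breaks by tension level L, M, H) as a stand-in for Prestige. It assumes the bars span the mean plus or minus one standard error, which is our reading of how plotMeans draws them.

```r
data(warpbreaks)   # breaks (numeric) by tension (factor: L, M, H)

# Per-group mean, standard deviation and size
grp_mean <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
grp_sd   <- tapply(warpbreaks$breaks, warpbreaks$tension, sd)
grp_n    <- tapply(warpbreaks$breaks, warpbreaks$tension, length)

# Standard error of each mean: sd / sqrt(n)
grp_se <- grp_sd / sqrt(grp_n)

# Ends of +/- 1 SE error bars around each mean
summary_tab <- data.frame(mean  = as.numeric(grp_mean),
                          se    = as.numeric(grp_se),
                          lower = as.numeric(grp_mean - grp_se),
                          upper = as.numeric(grp_mean + grp_se),
                          row.names = names(grp_mean))
print(round(summary_tab, 2))
```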

Now select prestige instead of income. The prestige variable measures the social prestige of each occupation. Draw its plot of means for the bc, prof, and wc types.

with(Prestige, plotMeans(prestige, type, error.bars="se",
    xlab="type (occupation type)", ylab="prestige (occupational prestige)",
    main="Mean occupational prestige by occupation type, Canada 1971", connect=TRUE))

with(Prestige, plotMeans(prestige, type, error.bars="se", xlab="type (occupation type)",
  ylab="prestige (occupational prestige)", main="Mean occupational prestige by occupation type, Canada 1971", connect=TRUE))
with(Prestige, plotMeans(prestige, type, error.bars="se", xlab="type (occupation type)",
  ylab="prestige (occupational prestige)", main="Mean occupational prestige by occupation type, Canada 1971", connect=FALSE))

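Compare the two commands: they differ only in the last argument, connect=TRUE versus connect=FALSE. In the dialog, connect=FALSE corresponds to unchecking <Connect profiles of means> at the bottom of the options. The means are then drawn without connecting lines.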
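The effect of connect= can be imitated in base R. This is a rough sketch, not the plotMeans implementation. draw_means() below is a hypothetical helper that plots group means with plus-or-minus one standard-error bars. It either joins the means with lines (like connect=TRUE) or leaves them as separate points (like connect=FALSE). The built-in warpbreaks data stands in for Prestige.

```r
data(warpbreaks)

# Hypothetical helper: group means with +/- 1 SE bars
draw_means <- function(y, g, connect = TRUE, ...) {
  m  <- tapply(y, g, mean)
  se <- tapply(y, g, sd) / sqrt(tapply(y, g, length))
  x  <- seq_along(m)
  # type = "b" joins the means with lines (like connect = TRUE);
  # type = "p" draws the points only (like connect = FALSE)
  plot(x, m, type = if (connect) "b" else "p", pch = 16, xaxt = "n",
       xlim = c(0.5, length(m) + 0.5),
       ylim = range(m - se, m + se), xlab = "", ylab = "mean", ...)
  axis(1, at = x, labels = names(m))
  arrows(x, m - se, x, m + se, angle = 90, code = 3, length = 0.05)
  invisible(m)
}

op <- par(mfrow = c(1, 2))
m1 <- draw_means(warpbreaks$breaks, warpbreaks$tension, connect = TRUE,
                 main = "connect = TRUE")
m2 <- draw_means(warpbreaks$breaks, warpbreaks$tension, connect = FALSE,
                 main = "connect = FALSE")
par(op)
```

Only the drawing changes between the two panels. The means and error bars are identical.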

?plotMeans  # help page for plotMeans (RcmdrMisc package)

if (require(car)){
    data(Moore)
    with(Moore, plotMeans(conformity, fcategory, partner.status, ylim=c(0, 25)))
}

그래프 > XY 조건 그림...
Graphs > XY conditioning plot...

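Make the Prestige data set from carData the active data set. Suppose we want to understand annual income and the social prestige of occupations better, so we examine the relationship between income and prestige visually. We also include the factor type, with levels bc, prof, and wc, as a grouping variable.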

On the <Options> tab, leave most settings at their defaults. Fill in <Plot Labels> on the right to make the plot easier to interpret.

xyplot(prestige ~ income, groups=type, type="p", pch=16,
  auto.key=list(border=TRUE), par.settings=simpleTheme(pch=16),
  scales=list(x=list(relation='same'), y=list(relation='same')), data=Prestige,
  xlab="income (annual income)", ylab="prestige (occupational prestige)", main="Occupational prestige by annual income")

The plot below appears in the graphics device window. The legend lists the levels of type (bc, prof, wc), and the points are colored by level. The colors add information about occupation type.

For the next plot, deselect type under Groups ('groups=') and select it under Conditions ('|') instead.

Again, enter descriptive labels and a title under <Plot Labels> on the right side of the <Options> tab.

xyplot(prestige ~ income | type, type="p", pch=16, auto.key=list(border=TRUE),
  par.settings=simpleTheme(pch=16), scales=list(x=list(relation='same'),
  y=list(relation='same')), data=Prestige, xlab="income (annual income)",
  ylab="prestige (occupational prestige)", main="Occupational prestige by annual income")

Unlike the previous plot, the graphics device window below shows a separate scatterplot panel for each occupation type (bc, prof, wc).
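The difference between groups= and the conditioning bar | can be seen on any data set. This sketch uses lattice, which ships with R, and the built-in iris data instead of Prestige. With groups = Species there is one panel, and species are distinguished by color. With | Species there is one panel per species. The layout = c(3, 1) setting is our own choice.

```r
library(lattice)  # recommended package bundled with R

# groups = : one panel, colors/symbols distinguish the levels of Species
p_groups <- xyplot(Sepal.Length ~ Petal.Length, data = iris,
                   groups = Species, auto.key = list(border = TRUE))

# | Species : one panel per level of the conditioning factor
p_panels <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
                   layout = c(3, 1))

print(p_groups)
print(p_panels)

# Number of levels of the (single) conditioning variable = number of panels
n_panels <- length(p_panels$condlevels[[1]])
print(n_panels)
```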

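For time-ordered numeric variables, xyplot() can produce a graph similar to lineplot(). First convert the year row names of the Bfox data set (carData) into a numeric variable time, as in the Line graph section: Bfox$time <- as.numeric(rownames(Bfox)). Then draw the graph.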

On the <Options> tab, select both points and lines under <Plot type (one or both)>. You can also fill in <Plot Labels>.

xyplot(menwage + womwage ~ time, type=c("p", "l"), pch=16, auto.key=list(border=TRUE), 
	par.settings=simpleTheme(pch=16), scales=list(x=list(relation='same'), 
	y=list(relation='same')), data=Bfox)

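This plot shows how the numeric variables change over time. One caution applies when you put two or more numeric variables in one graph. They should be on the same scale (same units, same kind of proportion, and so on). Otherwise the graph does a poor job of revealing their features.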
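A quick way to see the scale problem is to superpose two series that take very different values. This sketch uses the built-in longley data. GNP takes values in the hundreds, while Employed stays between about 60 and 71, so in the raw plot one line looks nearly flat.

Standardizing each series with scale() is one common remedy, chosen here for illustration. It is not an option of the R Commander dialog. It also changes what the y axis means: values become standard deviations from each series' own mean.

```r
library(lattice)
data(longley)

# Raw values: the two series differ greatly in magnitude
p_raw <- xyplot(GNP + Employed ~ Year, data = longley, type = c("p", "l"),
                auto.key = list(border = TRUE), ylab = "raw value")

# Standardize each series to mean 0 and standard deviation 1
longley_z <- data.frame(Year     = longley$Year,
                        GNP      = as.numeric(scale(longley$GNP)),
                        Employed = as.numeric(scale(longley$Employed)))
p_z <- xyplot(GNP + Employed ~ Year, data = longley_z, type = c("p", "l"),
              auto.key = list(border = TRUE), ylab = "standardized value")

print(p_raw)
print(p_z)
```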

?xyplot  # help page for xyplot (lattice package)

require(stats)

## Tonga Trench Earthquakes

Depth <- equal.count(quakes$depth, number=8, overlap=.1)
xyplot(lat ~ long | Depth, data = quakes)
update(trellis.last.object(),
       strip = strip.custom(strip.names = TRUE, strip.levels = TRUE),
       par.strip.text = list(cex = 0.75),
       aspect = "iso")

## Examples with data from `Visualizing Data' (Cleveland, 1993) obtained
## from http://cm.bell-labs.com/cm/ms/departments/sia/wsc/

EE <- equal.count(ethanol$E, number=9, overlap=1/4)

## Constructing panel functions on the fly; prepanel
xyplot(NOx ~ C | EE, data = ethanol,
       prepanel = function(x, y) prepanel.loess(x, y, span = 1),
       xlab = "Compression Ratio", ylab = "NOx (micrograms/J)",
       panel = function(x, y) {
           panel.grid(h = -1, v = 2)
           panel.xyplot(x, y)
           panel.loess(x, y, span=1)
       },
       aspect = "xy")

## Extended formula interface 

xyplot(Sepal.Length + Sepal.Width ~ Petal.Length + Petal.Width | Species,
       data = iris, scales = "free", layout = c(2, 2),
       auto.key = list(x = .6, y = .7, corner = c(0, 0)))


## user defined panel functions

states <- data.frame(state.x77,
                     state.name = dimnames(state.x77)[[1]],
                     state.region = state.region)
xyplot(Murder ~ Population | state.region, data = states,
       groups = state.name,
       panel = function(x, y, subscripts, groups) {
           ltext(x = x, y = y, labels = groups[subscripts], cex=1,
                 fontfamily = "HersheySans")
       })

## Stacked bar chart

barchart(yield ~ variety | site, data = barley,
         groups = year, layout = c(1,6), stack = TRUE,
         auto.key = list(space = "right"),
         ylab = "Barley Yield (bushels/acre)",
         scales = list(x = list(rot = 45)))

bwplot(voice.part ~ height, data=singer, xlab="Height (inches)")

dotplot(variety ~ yield | year * site, data=barley)

## Grouped dot plot showing anomaly at Morris

dotplot(variety ~ yield | site, data = barley, groups = year,
        key = simpleKey(levels(barley$year), space = "right"),
        xlab = "Barley Yield (bushels/acre) ",
        aspect=0.5, layout = c(1,6), ylab=NULL)

stripplot(voice.part ~ jitter(height), data = singer, aspect = 1,
          jitter.data = TRUE, xlab = "Height (inches)")

## Interaction Plot

xyplot(decrease ~ treatment, OrchardSprays, groups = rowpos,
       type = "a",
       auto.key =
       list(space = "right", points = FALSE, lines = TRUE))

## longer version with no x-ticks

## Not run: 
bwplot(decrease ~ treatment, OrchardSprays, groups = rowpos,
       panel = "panel.superpose",
       panel.groups = "panel.linejoin",
       xlab = "treatment",
       key = list(lines = Rows(trellis.par.get("superpose.line"),
                  c(1:7, 1)),
                  text = list(lab = as.character(unique(OrchardSprays$rowpos))),
                  columns = 4, title = "Row position"))

## End(Not run)

carData > Bfox

data(Bfox, package="carData")

Once the Bfox data set is loaded, you can read its description through the help system (?Bfox):

Bfox {carData}

R Documentation

Canadian Women's Labour-Force Participation

Description

The Bfox data frame has 30 rows and 7 columns. Time-series data on Canadian women's labor-force participation, 1946–1975.

Usage

Bfox

Format

This data frame contains the following columns:

partic

Percent of adult women in the workforce.

tfr

Total fertility rate: expected births to a cohort of 1000 women at current age-specific fertility rates.

menwage

Men's average weekly wages, in constant 1935 dollars and adjusted for current tax rates.

womwage

Women's average weekly wages.

debt

Per-capita consumer debt, in constant dollars.

parttime

Percent of the active workforce working 34 hours per week or less.

Warning

The value of tfr for 1973 is misrecorded as 2931; it should be 1931.

Source

Fox, B. (1980) Women's Domestic Labour and their Involvement in Wage Work. Unpublished doctoral dissertation, p. 449.

References

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

[Package carData version 3.0-5 Index]
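The Warning section of the help page above notes that tfr for 1973 is recorded as 2931 instead of 1931. If you use tfr, the code below shows a minimal way to correct your own copy of the data. It assumes, as in the Line graph section, that the row names of Bfox are the years. It runs only when carData is installed.

```r
# Correct the documented entry error in a local copy of Bfox
if (requireNamespace("carData", quietly = TRUE)) {
  Bfox <- carData::Bfox               # local copy; the package data is untouched
  Bfox["1973", "tfr"] <- 1931         # documented as misrecorded 2931
  print(Bfox["1973", "tfr"])
}
```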

그래프 > 선 그래프...
Graphs > Line graph...

A line graph (lineplot) is used mainly to examine how a numeric variable changes over time.

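1. Make the Bfox data set from carData the active data set. Then click the <View data set> button in the R Commander window. The row names are years, that is, consecutive integers.
2. Convert the year row names into a variable named time. See the <Expression to compute> field in the <Compute New Variable> dialog below.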

data(Bfox, package="carData")
Bfox$time <- with(Bfox, as.numeric(rownames(Bfox)))

The data set should then look like the view below.

Select the new variable time under <x variable (pick one)>. Under <y variables (pick one or more)>, select menwage and womwage (men's and women's average weekly wages). Choosing two y variables means that two lines will be drawn against time.

with(Bfox, lineplot(time, menwage, womwage))

The line graph below appears in the graphics device window. It shows how men's and women's average weekly wages changed from 1946 to 1975.

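Let's make another line graph. partic is the percentage of adult women in the workforce, and parttime is the percentage of the active workforce working 34 hours per week or less. A line graph of the yearly trends in these two variables would look like the one below.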
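The partic and parttime graph can presumably be drawn with a call that follows the menwage/womwage command above. The code below is our sketch, not the command captured from R Commander. It is guarded so it runs only when carData and RcmdrMisc are installed.

```r
if (requireNamespace("carData", quietly = TRUE) &&
    requireNamespace("RcmdrMisc", quietly = TRUE)) {
  Bfox <- carData::Bfox
  Bfox$time <- as.numeric(rownames(Bfox))   # years 1946 to 1975 as numbers
  # One line per y variable, drawn against time
  with(Bfox, RcmdrMisc::lineplot(time, partic, parttime))
}
```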

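The basic starting point for a line graph is to choose a time-ordered variable as the x variable. If x is not a time-ordered numeric variable, the line graph becomes an irregular tangle with no useful message. For example, the Prestige data set in carData has no numeric variable that carries time information. Put prestige as the x variable and education and income as the y variables, and draw the graph.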

You will get a warning like the one below. It states that the values of the x variable, prestige, are not in order.
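The warning matters because line segments connect points in the order the data are stored, so an unsorted x produces a zigzag. The base-R sketch below uses simulated data, not Prestige, and matplot() rather than lineplot(). It draws the same two series before and after sorting by x with order().

```r
set.seed(2)
x  <- runif(20, 0, 10)          # deliberately unsorted x values
y1 <- 2 * x + rnorm(20)
y2 <- 20 - x + rnorm(20)

op <- par(mfrow = c(1, 2))
# Lines follow storage order: a tangle
matplot(x, cbind(y1, y2), type = "l", lty = 1, main = "unsorted x")

# Sort every series by x before drawing
ord <- order(x)
matplot(x[ord], cbind(y1, y2)[ord, ], type = "l", lty = 1, main = "sorted x")
par(op)
```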

?lineplot  # help page for lineplot (RcmdrMisc package)

 if (require("car")){
    data(Bfox)
    Bfox$time <- as.numeric(rownames(Bfox))
    with(Bfox, lineplot(time, menwage, womwage))
}

그래프 > 산점도 행렬...
Graphs > Scatterplot matrix...

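A scatterplot is a way to gain insight by looking at the relationship between two numeric variables. When we want to examine three or more numeric variables together, separate scatterplots become inconvenient. In that case we use a scatterplot matrix.
Suppose we examine four numeric variables in the Prestige data set: years of education (education), income, occupational prestige (prestige), and the percentage of women in the occupation (women). Select these four variables.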

On the <Options> tab, choose <Density plots> under <On Diagonal>. Check <Least-squares lines> and <Smooth lines> under <Other Options>, and add a graph title.

scatterplotMatrix(~education+income+prestige+women, regLine=TRUE,
  smooth=list(span=0.5, spread=FALSE), diagonal=list(method="density"),
  data=Prestige, main="Scatterplot matrix of key variables in the Prestige data set")

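A scatterplot matrix like the one below appears in the graphics device window. The diagonal shows the estimated density of each of the four variables (education, income, prestige, women) together with its name. Each off-diagonal cell shows the scatterplot of one pair of variables. The numbers around the outside of the matrix are axis tick labels that show each variable's range of values.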

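A scatterplot matrix carries a lot of information. It is worth simplifying it so that only what you want to check stands out, for example by selecting fewer variables or turning off some of the fitted lines.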
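If car is not at hand, base R's pairs() produces a comparable grid. This sketch uses the built-in mtcars data instead of Prestige. It adds lowess smooths with panel.smooth, which may differ from the smoother scatterplotMatrix uses. It also prints the correlation matrix as a compact numeric summary of the same pairwise relationships.

```r
data(mtcars)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Every pairwise scatterplot, each with a lowess smooth
pairs(vars, panel = panel.smooth, main = "Scatterplot matrix (mtcars)")

# Numeric companion: correlation matrix of the same variables
r <- round(cor(vars), 2)
print(r)
```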

?scatterplotMatrix  # help page for scatterplotMatrix (car package)

scatterplotMatrix(~ income + education + prestige | type, data=Duncan)
scatterplotMatrix(~ income + education + prestige | type, data=Duncan,
    regLine=FALSE, smooth=list(span=1))
scatterplotMatrix(~ income + education + prestige,
    data=Duncan, id=TRUE, smooth=list(method=gamLine))

그래프 > 산점도...
Graphs > Scatterplot...

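A scatterplot visualizes the numerical relationship between two numeric variables. In the dialog below, select one variable as the x-variable and one as the y-variable. Select education (years of education) and income (annual income) from the Prestige data set.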

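The <Options> tab has many additional features and settings. First, under <Plot Options>, check <Least-squares line> and <Smooth line>. Then, under <Plot Labels and Points>, enter text that helps readers interpret the variables and the plot. You can also adjust <Point size>, <Axis text size>, and <Axis-labels text size> slightly.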

scatterplot(income~education, regLine=TRUE, smooth=list(span=0.5,
  spread=FALSE), boxplots=FALSE, xlab="education (years of education)",
  ylab="income (annual income)", main="Education and annual income", cex.axis=1.5, cex.lab=1.5,
  data=Prestige)

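A new graphics device window shows the scatterplot. To examine the relationship between years of education and annual income, look at the distribution of the points together with the added least-squares line and smooth line. Together they show the direction, strength, and shape of the relationship.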
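The two fitted lines added by regLine=TRUE and smooth= can be reproduced in base R with lm() and loess(). The sketch below uses the built-in cars data (stopping distance against speed) rather than Prestige. span = 0.5 mirrors the smooth=list(span=0.5) setting in the command above, but car's own smoother functions may differ in detail from a plain loess() fit.

```r
data(cars)   # speed and stopping distance (dist)

fit_ls <- lm(dist ~ speed, data = cars)                   # least-squares line
fit_lo <- loess(dist ~ speed, data = cars, span = 0.5)    # local smoother

plot(dist ~ speed, data = cars, pch = 16,
     main = "Least-squares line and loess smooth")
abline(fit_ls, lwd = 2)

# Evaluate the smoother on a grid inside the observed speed range
grid_x <- seq(min(cars$speed), max(cars$speed), length.out = 100)
lines(grid_x, predict(fit_lo, newdata = data.frame(speed = grid_x)),
      lwd = 2, lty = 2)
legend("topleft", legend = c("least squares", "loess (span = 0.5)"),
       lty = c(1, 2), lwd = 2)

slope <- coef(fit_ls)[["speed"]]   # direction of the linear relationship
print(slope)
```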

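A scatterplot can also be split by the levels of a factor. Prestige has the factor type, so we can look more closely at the relationship between education and income within each occupation type. Adding <Marginal boxplots> along the x and y axes also summarizes the distribution of each variable.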

scatterplot(income~education | type, regLine=TRUE, smooth=list(span=0.5,
  spread=FALSE), boxplots='xy', xlab="education (years of education)",
  ylab="income (annual income)", main="Education and annual income", cex.axis=1.5, cex.lab=1.5,
  by.groups=TRUE, data=Prestige)
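Fitting separate lines by group can change the picture. In the built-in iris data, used here instead of Prestige, the pooled least-squares slope of Sepal.Width on Sepal.Length is slightly negative, while the slope within each species is positive. This base-R sketch fits one lm() per species, in the spirit of by.groups=TRUE. Unlike car's scatterplot, abline() extends each line across the whole plot rather than only over the group's own x range.

```r
data(iris)

# One least-squares fit per level of the grouping factor
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Sepal.Width ~ Sepal.Length, data = d))
slopes <- sapply(fits, function(f) coef(f)[["Sepal.Length"]])
print(round(slopes, 3))

# Pooled fit that ignores the grouping
pooled_slope <- coef(lm(Sepal.Width ~ Sepal.Length, data = iris))[["Sepal.Length"]]

cols <- c("black", "red", "blue")
plot(Sepal.Width ~ Sepal.Length, data = iris, pch = 16,
     col = cols[as.integer(iris$Species)])
for (i in seq_along(fits)) abline(fits[[i]], col = cols[i], lwd = 2)
legend("topright", legend = names(fits), col = cols, pch = 16, lty = 1)
```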

?scatterplot  # help page for scatterplot (car package)

scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE)

scatterplot(prestige ~ income, data=Prestige, smooth=list(smoother=quantregLine))

# use quantile regression for median and quartile fits
scatterplot(prestige ~ income | type, data=Prestige,
            smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2))

scatterplot(prestige ~ income | type, data=Prestige, legend=list(coords="topleft"))

scatterplot(vocabulary ~ education, jitter=list(x=1, y=1),
            data=Vocab, smooth=FALSE, lwd=3)

scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5))

scatterplot(income ~ type, data=Prestige)

## Not run: 
    # remember to exit from point-identification mode
    scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN)

## End(Not run)

그래프 > 대칭 상자그림...
Graphs > Symmetry boxplot...

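Select the income variable in the Prestige data set. A symmetry boxplot draws parallel boxplots of the variable under several power transformations. This helps you choose a transformation that makes the distribution symmetric.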

?symbox  # help page for symbox (car package)

symbox(~ income, data=Prestige, trans=bcPower, powers=c(-1,-0.5,0,0.5,1))

The result appears in the graphics device window, as shown below.
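The transformations compared by symbox(..., trans=bcPower) belong to the Box-Cox family, defined for positive x:

- (x^λ - 1)/λ for λ ≠ 0
- log(x) for λ = 0

The sketch below applies the powers from the symbox command to simulated right-skewed data (not Prestige), using a hypothetical helper box_cox(). It standardizes the columns so the boxes are comparable and computes a crude symmetry index. It illustrates the idea and is not car's implementation.

```r
# Hypothetical helper: textbook Box-Cox transformation for positive x
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

set.seed(3)
income_like <- rlnorm(1000, meanlog = 9, sdlog = 0.6)  # simulated, right-skewed

powers <- c(-1, -0.5, 0, 0.5, 1)
transformed <- sapply(powers, function(p) box_cox(income_like, p))
colnames(transformed) <- paste0("lambda=", powers)

# Parallel boxplots, standardized so the shapes are comparable
boxplot(scale(transformed), main = "Box-Cox transformations")

# Crude symmetry index: (mean - median) / sd, near 0 for symmetric data
skew_index <- apply(transformed, 2, function(v) (mean(v) - median(v)) / sd(v))
print(round(skew_index, 3))
```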

그래프 > 분위수-비교 그림...
Graphs > Quantile-comparison plot...
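A quantile-comparison plot is used to check the distributional shape of a numeric variable's values. It can also be used to check the distribution of the residuals produced when you model the relationships between variables.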

Select income from the numeric variables in the Prestige data set.

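The <Options> tab lets you choose the reference distribution against which income is compared. The default, the normal distribution, is the one used most often. Under <Plot Labels> on the right, you can enter text that describes the plot.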

The plot shows how the values of income compare with what a normal distribution would produce. Two outliers appear at the upper right and are labeled general.managers and physicians.

with(Prestige, qqPlot(income, dist="norm", id=list(method="y", n=2,
    labels=rownames(Prestige)), ylab="income (annual income)",
    main="Quantile-comparison plot of income"))
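Labeling "the two most extreme points" can be sketched as ranking observations by their distance from the centre of y. The helper most_extreme() below is hypothetical. Its rule, absolute distance from the mean, is our assumption about what id=list(method="y") does; car's showLabels documentation gives the exact rule.

```r
# Hypothetical helper: names of the n values farthest from the mean of y
most_extreme <- function(y, n = 2) {
  stopifnot(!is.null(names(y)))
  d <- abs(y - mean(y))
  names(sort(d, decreasing = TRUE))[seq_len(n)]
}

# Small named example vector (made-up values)
y <- c(a = 1, b = 2, c = 3, d = 50, e = 40)
most_extreme(y, n = 2)          # returns c("d", "e")

# Built-in named data: average annual precipitation of US cities
most_extreme(precip, n = 2)
```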

The next plot uses education instead of income, again compared with the normal distribution. It shows fewer outliers than income does.

The next plot is the quantile-comparison plot of the prestige variable in the Prestige data set. The values of prestige stay close to what a normal distribution would produce.

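For comparison, draw a quantile-comparison plot after taking the log of income (for example, qqPlot(log(Prestige$income))). The log-transformed values look closer to normal than the original income values do.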
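One rough way to quantify the improvement is the correlation between the sample quantiles and the normal quantiles returned by qqnorm(). The closer this correlation is to 1, the straighter the plot. This base-R sketch uses simulated log-normal data as a stand-in for a right-skewed variable like income, not the Prestige data.

```r
set.seed(4)
x <- rlnorm(200, meanlog = 0, sdlog = 1)   # simulated right-skewed values

op <- par(mfrow = c(1, 2))
qq_raw <- qqnorm(x, main = "raw");      qqline(x)
qq_log <- qqnorm(log(x), main = "log"); qqline(log(x))
par(op)

# Correlation of sample vs. theoretical quantiles (1 = perfectly straight)
r_raw <- cor(qq_raw$x, qq_raw$y)
r_log <- cor(qq_log$x, qq_log$y)
print(c(raw = r_raw, log = r_log))
```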

?qqPlot  # help page for qqPlot (car package)

x<-rchisq(100, df=2)
qqPlot(x)
qqPlot(x, dist="chisq", df=2)

qqPlot(~ income, data=Prestige, subset = type == "prof")
qqPlot(income ~ type, data=Prestige, layout=c(1, 3))

qqPlot(lm(prestige ~ income + education + type, data=Duncan),
	envelope=.99)

Related post: 2. Residual quantile-comparison plot... (Models > Graphs > Residual quantile-comparison plot...), for use once a data set is active and a model has been fitted: https://rcmdr.kr/m/205

그래프 > 상자그림...
Graphs > Boxplot...

Draw boxplots of the numeric variable income in Prestige for each level of the factor type, and compare the types.

Move from the <Data> tab to the <Options> tab. Leave <Identify outliers> at its default, 'Automatically'. Under <Plot Labels>, enter meaningful axis labels and a title.

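A boxplot appears in the graphics device window, with one box for each occupation type: bc, prof, and wc (blue collar, professional, white collar). Each box runs from the lower to the upper hinge (approximately the 25th and 75th percentiles), with a line at the median. The whiskers extend to the most extreme values within 1.5 times the box length of the box. Points beyond the whiskers are drawn individually and labeled as outliers.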

The following command is generated:

Boxplot(income~type, data=Prestige, id=list(method="y"), xlab="occupation type (type)",
  ylab="income (annual income)", main="Income by occupation type, Canada 1971")
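The box and whisker positions can be checked with base R's boxplot.stats(). It computes the statistics that boxplot() draws with its default coefficient of 1.5; we assume car's Boxplot follows the same rule. The made-up vector below has one value far above the rest. That value is drawn as an outlier, so the upper whisker stops at 10 rather than at the maximum.

```r
x <- c(1:10, 50)   # made-up data with one unusually large value

s <- boxplot.stats(x)   # statistics used by boxplot() (coef = 1.5)
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
          # here: 1.0 3.5 6.0 8.5 10.0
s$out     # values beyond 1.5 box lengths from the hinges: 50

boxplot(x, main = "50 is drawn as an outlier")

# Grouped version on built-in data: one box per level of supp
boxplot(len ~ supp, data = ToothGrowth)
```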

?Boxplot  # help page for Boxplot (car package)

Boxplot(~income, data=Prestige, id=list(n=Inf)) # identify all outliers
Boxplot(income ~ type, data=Prestige)
Boxplot(income ~ type, data=Prestige, at=c(1, 3, 2))
Boxplot(k5 + k618 ~ lfp*wc, data=Mroz)
with(Prestige, Boxplot(income, id=list(labels=rownames(Prestige))))
with(Prestige, Boxplot(income, type, id=list(labels=rownames(Prestige))))
Boxplot(scale(Prestige[, 1:4]))

Rcmdr.kr: An R Commander User in Korea