Rcmdr.kr: An R Commander User in Korea


The swiss dataset

modernity4Rcmdr 2022. 6. 13. 09:12


datasets::swiss

data(swiss, package="datasets") # load the swiss dataset
summary(swiss)                  # summary statistics for swiss
str(swiss)                      # inspect the structure of swiss

The dataset is documented as follows:

swiss {datasets}

R Documentation

Swiss Fertility and Socioeconomic Indicators (1888) Data

Description

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.

Usage

swiss

Format

A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].

[,1]	Fertility	Ig, ‘common standardized fertility measure’
[,2]	Agriculture	% of males involved in agriculture as occupation
[,3]	Examination	% draftees receiving highest mark on army examination
[,4]	Education	% education beyond primary school for draftees.
[,5]	Catholic	% ‘catholic’ (as opposed to ‘protestant’).
[,6]	Infant.Mortality	live births who live less than 1 year.

All variables but ‘Fertility’ give proportions of the population.

Details

(paraphrasing Mosteller and Tukey):

Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.

The data collected are for 47 French-speaking “provinces” at about 1888.

Here, all variables are scaled to [0, 100], where in the original, all but "Catholic" were scaled to [0, 1].

Note

Files for all 182 districts in 1888 and other years have been available at https://opr.princeton.edu/archive/pefp/switz.aspx.

They state that variables Examination and Education are averages for 1887, 1888 and 1889.

Source

Project “16P5”, pages 549–551 in

Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Reading Mass.

indicating their source as “Data used by permission of Franice van de Walle. Office of Population Research, Princeton University, 1976. Unpublished data assembled under NICHD contract number No 1-HD-O-2077.”

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

require(stats); require(graphics)
pairs(swiss, panel = panel.smooth, main = "swiss data",
      col = 3 + (swiss$Catholic > 50))
summary(lm(Fertility ~ . , data = swiss))

[Package datasets version 4.0.4 Index]
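The figures quoted in the help page above can be checked against the data itself. The following is a minimal sketch using only base R; the helper name in_percent_range is made up for illustration.

```r
# Check two claims from the swiss help page (base R only).
data(swiss, package = "datasets")

dims <- dim(swiss)   # the help page describes 47 observations on 6 variables

# Hypothetical helper: TRUE if every value of a column lies in [0, 100].
in_percent_range <- function(x) all(x >= 0 & x <= 100)
percent_ok <- vapply(swiss, in_percent_range, logical(1))

print(dims)
print(percent_ok)
```

If any column fell outside [0, 100], the corresponding entry of percent_ok would be FALSE.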


2. Compare two models...

modernity4Rcmdr 2022. 6. 11. 10:58


Models > Hypothesis test > Compare two models... (Korean menu: 모델 > 가설 검정 > 두 모델 비교하기...)

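Use this feature when looking for the best-fitting model for a single dataset, or when comparing models that were built to give a more refined explanation.

As an example, let's practice with the Prestige dataset included in carData. Suppose that two independent (explanatory) variables, years of education (education) and income (income), affect the social prestige of an occupation (prestige). Now think more carefully about how education and income relate linearly to prestige. Education and income might each affect prestige through separate, additive linear effects. Alternatively, one could argue that in addition to those separate effects, the two interact and add a further effect on prestige. With this question in mind, we build the two models below and ask which of them is the better specification.

For reference, in a model formula the + operator specifies the separate linear effects of the explanatory variables, while * specifies both the separate effects and their joint (interaction) effect.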
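The difference between the two operators can be seen by asking R how it expands each formula. This is a small sketch using terms() from base R; it only parses the formulas, so no dataset is needed. The object names additive and with_interaction are chosen here for illustration.

```r
# How R expands + and * in a model formula.
# terms() parses the formula symbolically; the variables need not exist.
additive         <- attr(terms(prestige ~ education + income), "term.labels")
with_interaction <- attr(terms(prestige ~ education * income), "term.labels")

print(additive)          # main effects only
print(with_interaction)  # main effects plus the education:income term
```

In other words, education*income is shorthand for education + income + education:income.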

data(Prestige, package="carData")   # load the Prestige dataset from carData
LinearModel.1 <- lm(prestige ~ education + income, data=Prestige)  # separate (main) effects only
summary(LinearModel.1)
LinearModel.2 <- lm(prestige ~ education*income, data=Prestige)    # main effects + interaction
summary(LinearModel.2)
anova(LinearModel.1, LinearModel.2)  # compare LinearModel.1 with LinearModel.2

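This creates two models, LinearModel.1 and LinearModel.2, and compares them. Choosing Models > Hypothesis test > Compare two models... from the menu opens a dialog for comparing the two models you have built. Select the two models in order, then click the OK button.

The result is printed in the R Commander output window. It can be read as follows: the difference between Model 1 and Model 2 is significant (Pr(>F)), and Model 2 has more explanatory power (Sum of Sq > 0, that is, the RSS of Model 2 is smaller than the RSS of Model 1).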
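To see where the numbers in the anova table come from, the F statistic can be recomputed by hand from the two residual sums of squares. The sketch below is illustrative only: so that it runs without carData, it uses the built-in swiss dataset with Education and Catholic as stand-in predictors (an assumption of this example, not the models from the post). The same steps would apply to LinearModel.1 and LinearModel.2.

```r
# Nested-model F test computed by hand and compared with anova().
data(swiss, package = "datasets")
m1 <- lm(Fertility ~ Education + Catholic, data = swiss)  # main effects only
m2 <- lm(Fertility ~ Education * Catholic, data = swiss)  # adds the interaction

rss1 <- sum(resid(m1)^2); df1 <- df.residual(m1)
rss2 <- sum(resid(m2)^2); df2 <- df.residual(m2)

sum_of_sq <- rss1 - rss2                              # "Sum of Sq" in the table
f_manual  <- (sum_of_sq / (df1 - df2)) / (rss2 / df2) # F statistic
p_manual  <- pf(f_manual, df1 - df2, df2, lower.tail = FALSE)  # Pr(>F)

tab <- anova(m1, m2)
print(tab)
print(c(F = f_manual, p = p_manual))
```

Because the larger model contains the smaller one, its RSS cannot exceed that of the smaller model, so Sum of Sq is never negative; whether the improvement matters is what Pr(>F) assesses.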


4. Confirmatory factor analysis...

modernity4Rcmdr 2022. 4. 30. 21:44


Statistics > Dimensional analysis > Confirmatory factor analysis... (Korean menu: 통계 > 차원 분석 > 확인적 요인 분석...)

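The sem package is installed as a dependency when R Commander is installed; the installer may also prompt you to confirm the additional installation. As the screen above shows, the <Confirmatory factor analysis...> feature, which uses functions from the sem package, is disabled at first.

If sem were attached automatically whenever Rcmdr is loaded, it would appear in the package list of the 'Data > Data in packages > Read data set from an attached package...' dialog, alongside carData and sandwich. In the screen above it does not, which means sem has to be attached separately.

library(sem) # attach the installed sem package

Once sem is attached, its datasets can be selected in 'Data > Data in packages > Read data set from an attached package...'. See the screen below.

Select HS.data. Once HS.data is the active dataset, the <Confirmatory factor analysis...> menu item, which was disabled at first, becomes enabled. Choosing it opens a dialog with two panes: <Data>, where you select variables and group them into factors, and <Options>, which lists the statistical indices to compute.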
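Whether a package is attached can also be checked from the console rather than from the menu. This is a minimal sketch using base R only; is_attached is a hypothetical helper name, not an Rcmdr or sem function.

```r
# Hypothetical helper: TRUE if a package is on the search path (attached).
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

print(is_attached("base"))  # base is always attached
print(is_attached("sem"))   # depends on whether library(sem) has been run

# Attach sem only if it is installed, so the sketch also runs without it.
if (requireNamespace("sem", quietly = TRUE)) {
  library(sem)
  print(is_attached("sem"))
}
```

Note that an installed package is not necessarily attached: requireNamespace() only checks that it can be loaded, while library() puts it on the search path.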

local({
  .model <- c('spatial: cubes, flags, paper, visual', 'verbal: general, paragrap, sentence, wordc, wordm',
   'memory: figurer, figurew, numberf, numberr, object, wordr', 
  'math: arithmet, deduct, numeric, problemr, series')
  .model <- cfa(file=textConnection(.model), reference.indicators=FALSE)
  .Data <- HS.data[, c('cubes', 'flags', 'paper', 'visual', 'general', 'paragrap', 'sentence', 'wordc', 
  'wordm', 'figurer', 'figurew', 'numberf', 'numberr', 'object', 'wordr', 'arithmet', 'deduct', 'numeric', 
  'problemr', 'series')]
  summary(sem(.model, data=.Data), robust=FALSE, fit.indices=c("AIC","BIC"))
})

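What does the script above do? Some explanation follows.
1. The selected HS.data contains a number of variables.
2. The original research design defines a few conceptual factors, and each factor is made up of more specific variables.
3. <Confirmatory factor analysis> groups sets of variables into factors. From the researcher's point of view, it checks whether the observed values of the detailed variables, which are assumed to make up each factor, actually do so.
4. Here the variables in HS.data are regrouped into four conceptual factors: spatial, verbal, memory, and math.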
'spatial: cubes, flags, paper, visual'
'verbal: general, paragrap, sentence, wordc, wordm'
'memory: figurer, figurew, numberf, numberr, object, wordr'
'math: arithmet, deduct, numeric, problemr, series'
5. The factor names spatial, verbal, memory, and math refer to the conceptual factors in the original design. For convenience, they could just as well be named factor.1, factor.2, factor.3, and factor.4.
6. The fit.indices option contains two indices, AIC and BIC. This is the default in the <Options> pane, and further indices such as CFI and RMSEA can be selected as well.

Model Chisquare = 288.2654 Df = 164 Pr(>Chisq) = 0.000000007093121
AIC = 380.2654
BIC = -647.7007

Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.9135038 -0.7194881 0.0000003 -0.0040765 0.6636815 3.0180220

R-square for Endogenous Variables
cubes flags paper visual general paragrap sentence wordc wordm figurer figurew numberf
0.2226 0.3941 0.2223 0.5230 0.7003 0.6720 0.7473 0.5482 0.7279 0.4048 0.2281 0.2666
numberr object wordr arithmet deduct numeric problemr series
0.2637 0.2620 0.3410 0.3701 0.3716 0.3703 0.4514 0.5677

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
lam[cubes:spatial] 2.2223150 0.29346740 7.572613 3.657918e-14 cubes <--- spatial
lam[flags:spatial] 5.6800079 0.54187523 10.482132 1.043637e-25 flags <--- spatial
lam[paper:spatial] 1.3343163 0.17637296 7.565311 3.869370e-14 paper <--- spatial
lam[visual:spatial] 5.0654142 0.41217065 12.289604 1.030090e-34 visual <--- spatial
lam[general:verbal] 10.3704348 0.59270550 17.496775 1.516142e-68 general <--- verbal
lam[paragrap:verbal] 2.8629032 0.16891656 16.948624 1.970114e-64 paragrap <--- verbal
lam[sentence:verbal] 4.4622524 0.24224226 18.420619 8.977035e-76 sentence <--- verbal
lam[wordc:verbal] 4.2021424 0.28775953 14.602965 2.688910e-48 wordc <--- verbal
lam[wordm:verbal] 6.5431707 0.36275031 18.037671 9.861300e-73 wordm <--- verbal
lam[figurer:memory] 4.8631276 0.45983127 10.575896 3.854762e-26 figurer <--- memory
lam[figurew:memory] 1.9563212 0.25677249 7.618889 2.558674e-14 figurew <--- memory
lam[numberf:memory] 2.3250383 0.27958224 8.316116 9.089292e-17 numberf <--- memory
lam[numberr:memory] 3.9599443 0.47913815 8.264723 1.400191e-16 numberr <--- memory
lam[object:memory] 2.5141011 0.30532432 8.234199 1.807636e-16 object <--- memory
lam[wordr:memory] 6.7199425 0.70198380 9.572789 1.040602e-21 wordr <--- memory
lam[arithmet:math] 2.9160802 0.26684706 10.927908 8.478019e-28 arithmet <--- math
lam[deduct:math] 11.5046085 1.05018494 10.954840 6.298925e-28 deduct <--- math
lam[numeric:math] 2.8136309 0.25736242 10.932563 8.054079e-28 numeric <--- math
lam[problemr:math] 6.2092835 0.50114800 12.390119 2.956038e-35 problemr <--- math
lam[series:math] 6.8583564 0.47520864 14.432306 3.240628e-47 series <--- math
C[spatial,verbal] 0.4489025 0.06117871 7.337560 2.175225e-13 verbal <--> spatial
C[spatial,memory] 0.5108162 0.06792364 7.520449 5.458862e-14 memory <--> spatial
C[spatial,math] 0.7790230 0.04665441 16.697736 1.361327e-62 math <--> spatial
C[verbal,memory] 0.3463073 0.06433754 5.382662 7.339215e-08 memory <--> verbal
C[verbal,math] 0.7149260 0.03895329 18.353416 3.099934e-75 math <--> verbal
C[memory,math] 0.6462679 0.05390219 11.989642 4.026422e-33 math <--> memory
V[cubes] 17.2435545 1.54451622 11.164373 6.091980e-29 cubes <--> cubes
V[flags] 49.6008557 5.10084929 9.724039 2.381442e-22 flags <--> flags
V[paper] 6.2302089 0.55792080 11.166834 5.925551e-29 paper <--> paper
V[visual] 23.4058957 2.94816071 7.939152 2.035683e-15 visual <--> visual
V[general] 46.0293821 4.70499846 9.783081 1.330957e-22 general <--> general
V[paragrap] 4.0002855 0.39637139 10.092266 5.977355e-24 paragrap <--> paragrap
V[sentence] 6.7333966 0.73838990 9.119026 7.580260e-20 sentence <--> sentence
V[wordc] 14.5526746 1.32499385 10.983202 4.603146e-28 wordc <--> wordc
V[wordm] 16.0038823 1.69905922 9.419261 4.542739e-21 wordm <--> wordm
V[figurer] 34.7745483 3.65651230 9.510305 1.901038e-21 figurer <--> figurer
V[figurew] 12.9489304 1.16993513 11.068075 1.792078e-28 figurew <--> figurew
V[numberf] 14.8673713 1.37745320 10.793377 3.699420e-27 numberf <--> numberf
V[numberr] 43.7757571 4.04762882 10.815161 2.917801e-27 numberr <--> numberr
V[object] 17.8024992 1.64412044 10.827978 2.536967e-27 object <--> object
V[wordr] 87.2656598 8.58151618 10.169026 2.726215e-24 wordr <--> wordr
V[arithmet] 14.4741292 1.30214271 11.115624 1.053114e-28 arithmet <--> arithmet
V[deduct] 223.8658487 20.15310880 11.108254 1.143741e-28 deduct <--> deduct
V[numeric] 13.4602271 1.21106704 11.114353 1.068216e-28 numeric <--> numeric
V[problemr] 46.8548686 4.40132174 10.645636 1.827323e-26 problemr <--> problemr
V[series] 35.8131070 3.71864165 9.630696 5.932631e-22 series <--> series

Iterations = 319

7. The output above is the summary of the analysis results as it appears in the R Commander output window.
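The counts in the output can be cross-checked from the factor specification alone. The following base R sketch rests on several assumptions that are not stated in the post: that with reference.indicators=FALSE every loading is free and the factor variances are fixed (consistent with the 20 lam, 6 C, and 20 V rows above); that AIC is chi-square plus twice the number of free parameters and BIC is chi-square minus df times log(N), which is what the reported values appear to follow; and that HS.data has N = 301 cases.

```r
# Factor specification copied from the script in the post.
spec <- c('spatial: cubes, flags, paper, visual',
          'verbal: general, paragrap, sentence, wordc, wordm',
          'memory: figurer, figurew, numberf, numberr, object, wordr',
          'math: arithmet, deduct, numeric, problemr, series')

# Split "factor: v1, v2, ..." into a named list of indicator vectors.
parts   <- strsplit(spec, ":", fixed = TRUE)
factors <- lapply(parts, function(x) trimws(strsplit(x[2], ",", fixed = TRUE)[[1]]))
names(factors) <- vapply(parts, function(x) trimws(x[1]), character(1))

p <- length(unlist(factors))  # observed variables
k <- length(factors)          # latent factors

# ASSUMPTION: free parameters = loadings + error variances + factor covariances.
n_free   <- p + p + k * (k - 1) / 2
model_df <- p * (p + 1) / 2 - n_free  # distinct (co)variances minus free parameters

chisq <- 288.2654  # Model Chisquare from the output above
n_obs <- 301       # ASSUMPTION: number of cases in HS.data
aic <- chisq + 2 * n_free              # ASSUMED formula for the reported AIC
bic <- chisq - model_df * log(n_obs)   # ASSUMED formula for the reported BIC

print(c(p = p, k = k, n_free = n_free, df = model_df))
print(c(AIC = aic, BIC = bic))
```

Under these assumptions the degrees of freedom and AIC agree with the reported Df = 164 and AIC = 380.2654, and the BIC comes out close to the reported -647.7007. This also explains why BIC is negative here: in this form it is measured relative to the chi-square's degrees of freedom rather than to the number of parameters.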
