[問題] 使用R輔助基礎機率論教學

看板R_Language作者 ( )時間11年前 (2013/12/14 22:12), 編輯推噓1(108)
留言9則, 4人參與, 最新討論串1/1
[問題類型]: 程式諮詢(我想用R 做某件事情,但是我不知道要怎麼用R 寫出來) [軟體熟悉度]: 入門(寫過其他程式,只是對語法不熟悉) [問題敘述]: 弱大數法則我還勉強能用丟銅板來"實驗"給學生看... 但是CLT要怎麼做呢= = 所以想用R來做電腦輔助教學... 而且我想示範: 1.不是n>=30就叫做大樣本。 所以除了常態分配和長得比較對稱的均勻分配以外, 其他都選取很極端的機率分佈, 結果n=30之下作Jarque Bera test都拒絕虛無假設。 2.期望值和變異數存在且有限才可適用CLT。 所以我故意選了Cauchy分配來示範。 (抽自柯西分配母體的樣本其平均仍然是柯西分配) 不過,我最後用ks.test的結果怪怪的, 是不是參數沒設定好?(難道沒辦法讓他自己抓常態分配的參數嗎= =) (用SAS和SPSS都不用調參數....是不是用R作KS檢定一定要指定參數?) [程式範例]: #請記得先安裝TSA package library(TSA) #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) mean(popnorm) var(popnorm) plot(density(popnorm)) meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=30,replace=TRUE)) } mean(meannorm) var(meannorm) plot(density(meannorm)) ks.test(meannorm,"pnorm") jarque.bera.test(meannorm) 結果: > ks.test(meannorm,"pnorm") One-sample Kolmogorov-Smirnov test data: meannorm D = 0.3406, p-value < 2.2e-16 alternative hypothesis: two-sided > jarque.bera.test(meannorm) Jarque Bera Test data: meannorm X-squared = 4.9045, df = 2, p-value = 0.0861 最後改用這個: mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ks.test(meannorm,"pnorm",mean,sd) 結果: One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 沒有違背預期的結果。 [關鍵字]: 電腦輔助教學, 機率, 中央極限定理 附註:以下試驗其他機率分佈 http://ideone.com/UkhaJO 為節省版面,貼到網路上供參考。 -- ※ 發信站: 批踢踢實業坊(ptt.cc) ◆ From: 111.255.26.132

12/15 08:05, , 1F
忘了 sqrt(30) ?
12/15 08:05, 1F

12/15 08:07, , 2F
再把 CLT 看一遍.
12/15 08:07, 2F

12/15 11:50, , 3F
ks.test()若要檢驗分配是要指定參數沒錯。
12/15 11:50, 3F

12/15 11:50, , 4F
至於其它軟體不用指定參數,我猜是自動用樣本估的。
12/15 11:50, 4F

12/15 12:02, , 5F
no. sqrt(n) 才會給你正確的 CLT. n=30是猜不到的.
12/15 12:02, 5F
mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ↑這裡的SD=sqrt(樣本變異數)=sqrt(母體變異數/30)=母體標準差/sqrt(30) ks.test(meannorm,"pnorm",mean,sd) 結果: One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 這個部份的sd應該不用再除sqrt(30) 因為樣本的變異數已經是母體變異數除以30了。 > mean(popnorm) [1] -0.002244083 > var(popnorm) [1] 1.007059 > mean(meannorm) [1] 0.002210398 > var(meannorm) [1] 0.03560481 其中抽出樣本的變異數約為母體變異數/30 > var(popnorm)/30 [1] 0.03356863 所以使用CLT的時候,標準差不用再除以sqrt(30)囉... 主要的問題是,不曉得有沒有用KS檢定卻不需指定參數的方法? 類似SAS和SPSS的那種... 雖然用JB檢定是滿方便的,可是初等統計教科書大多沒寫這方法... 另外,抽樣的時候設定replace=TRUE應該沒問題吧? 我想說這樣比較能確保每個樣本都iid於母體分配。 --- 程式又大幅度改寫: #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) meanpopnorm<-mean(popnorm) varpopnorm<-var(popnorm) print(paste("The mean of the population is",round(meanpopnorm,5))) print(paste("The variance of the population is", round(varpopnorm,5))) plot(density(popnorm)) cltnorm<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } print("----------------------------------------------------------") print(paste("The mean of samples is",round(mean(meannorm),5))) print(paste(", and the variance of samples is",round(var(meannorm),5))) print(paste("According to CLT, the theoretical mean is", round(meanpopnorm,5))) print(paste(", and the theoretical variance of samples is",round(varpopnorm/n,5))) print("----------------------------------------------------------") plot(density(meannorm)) jarque.bera.test(meannorm) } 接下來重覆五次試驗(n=30): > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00355" [1] ", and the variance of samples is 0.03627" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 4.4989, df = 2, p-value = 0.1055 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.00778" [1] ", and the variance of samples is 0.03286" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 0.4475, df = 2, p-value = 0.7995 > > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00405" [1] ", and the variance of samples is 0.03348" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2785, df = 2, p-value = 0.1941 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.0048" [1] ", and the variance of samples is 0.03026" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2825, df = 2, p-value = 0.1937 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 5e-05" [1] ", and the variance of samples is 0.03214" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 10.1117, df = 2, p-value = 0.006372 最後一次試驗錯誤的拒絕虛無假設... 我該怎麼解釋呢= =" --- 剛這樣寫: testing<-function(n, m){ set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(m)) for (i in 1:m){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) error[i]<-ifelse(p<0.05,1,0) print(error[i]) } return(list(matrix=error)) } 使用testing(30,50)測試, 可是傳回的矩陣很怪... 變成有1000個元素的一維陣列, 而且第五筆資料應該要存入1,卻反而是0。 ※ 編輯: anovachen 來自: 111.255.24.52 (12/15 23:48)

12/16 08:49, , 6F
可再抽出沒錯,不過你有大母體了也沒什麼影響。
12/16 08:49, 6F
set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(30)) testing<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) return(ifelse(p<0.05,1,0)) } for (i in 1:30){ error[i]<-testing(30) } 這樣寫就OK了。 不曉得上面寫法問題出在哪?? >"< ---- 以下是某不願具名者提供的程式碼: n <- 30 ret <- NULL for(i in 1:100){ x <- rnorm(n) x.hat <- mean(x) * sqrt(n) # CLT stat. goes to N(0, 1) in d. ret[i] <- x.hat } ks.test(ret, "pnorm") # Test 100 x.hat's against N(0, 1) 他指出我原本的寫法是類似bootstrap,在有限的樣本(N=100000)自助重抽樣, 以模擬從真正的母體裡抽樣的過程。 而他的寫法則比較貼近我想表達的。 依據他的寫法,我們可以如此改寫: n <- 100 ret <- NULL for(i in 1:100){ x <- runif(n,min=0,max=1) x.hat <- mean(x) ret[i] <- x.hat } hist(ret) ks.test(ret, "pnorm", mean=1/2, sd = sqrt(1/12/n)) 如此可驗證從U(0,1)中抽樣,n=100,則X-bar的分布 是否會漸近N(1/2, 1/1200)。 ※ 編輯: anovachen 來自: 111.255.13.49 (12/19 00:47)

12/22 03:36, , 8F
希望對你有幫助
12/22 03:36, 8F

12/29 16:25, , 9F
謝謝!
12/29 16:25, 9F
文章代碼(AID): #1Ih6TEF_ (R_Language)
文章代碼(AID): #1Ih6TEF_ (R_Language)