[Question] dataframe includes date with caret

Board: R_Language  Author: (Babysian)  Time: 2015/11/03 04:18 (edited)  Score: 5 (5 push, 0 boo, 21 neutral)
26 comments, 2 participants, latest thread 1/1
[Question type]: Programming help (I want to do something in R but don't know how to write it in R)
[Software familiarity]: Beginner
[Problem description]:
I have a dataframe that contains date variables:

'data.frame': 1000 obs. of 49 variables:
 $ estate_Post          : int 10069 10065 10044 10044 10044 10045 10044 10045 10044 10045 ...
 $ estate_TransType     : int 3 1 4 2 4 4 4 4 4 4 ...
 $ estate_LandArea      : num 15.54 47.3 20.89 1.99 23.98 ...
 $ estate_ZoneUse       : int 2 2 3 3 3 3 3 3 3 3 ...
 $ estate_TransDate     : Date, format: "1989-03-01" "1998-01-01" "2015-01-01" "2015-01-01" ...
 $ estate_Land          : int 1 1 1 0 1 1 1 1 1 1 ...
 $ estate_House         : int 1 0 1 0 1 1 1 1 1 1 ...
 $ estate_ParkingLot    : int 0 0 2 2 2 1 3 3 4 3 ...
 $ estate_TransFloor    : int 5 -99 17 -4 11 6 6 5 15 5 ...
 $ estate_TotalFloor    : int 5 -99 31 31 31 31 31 31 31 31 ...
 $ estate_HouseType     : int 1 12 2 12 2 2 2 2 2 2 ...
 $ estate_HouseUse      : int 1 -99 1 3 1 1 1 1 1 1 ...
 $ estate_HouseMaterials: int 5 -99 13 13 13 13 13 13 13 13 ...
 $ estate_HouseDate     : Date, format: "1967-05-19" NA "2013-11-29" "2013-11-29" ...
 $ estate_HouseArea     : num 35.1 0 442.7 62.1 507.1 ...
 $ estate_HouseRoom_1   : int 1 0 5 0 5 4 4 4 3 4 ...
 $ estate_HouseRoom_2   : int 1 0 2 0 2 2 2 2 2 2 ...
 $ estate_HouseRoom_3   : int 1 0 6 0 6 3 3 3 3 3 ...
 $ estate_HouseRoom_4   : int 1 1 1 1 1 1 1 1 1 1 ...
 $ estate_Guards        : int 2 2 2 2 2 2 2 2 2 2 ...
 $ estate_Price         : int 3535 54299 164882 -99 195808 181428 174799 175356 190717 165250 ...
 $ estate_ParkingType   : int -99 -99 3 4 3 4 4 4 4 4 ...
 $ estate_ParkingArea   : num 0 0 13.2 32.2 27.5 ...
 $ estate_ParkingPrice  : int 0 0 0 5600000 0 0 0 0 8400000 0 ...
 $ estate_Lng           : num 122 122 122 122 122 ...
 $ estate_Lat           : num 25 25 25 25 25 ...
 $ Aport_Distance       : num 7.3 6.7 5.3 5.3 5.3 5.3 5.3 5.3 5.3 5.3 ...
 $ ParkB_Distance       : num 0.29 0.785 0.214 0.217 0.215 ...
 $ Univ_Distance        : num 1.7 1 1 1 1 1 1 1 1 1 ...
 $ ParkR_Distance       : num 1.4 2 1.7 1.7 1.7 1.6 1.7 1.7 1.7 1.6 ...
 $ MRT_StationDistance  : num 0.914 0.327 0.403 0.401 0.402 ...
 $ MRT_LineDistance     : num 999 999 999 999 999 999 999 999 999 999 ...
 $ Fway_EntranceDistance: int 999 999 999 999 999 999 999 999 999 999 ...
 $ Fway_LineDistance    : int 999 999 999 999 999 999 999 999 999 999 ...
 $ TRA_StationDistance  : num 1 1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ THSR_StationDistance : num 3.1 2.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ River_Distance       : num 999 1.84 1.49 1.48 1.49 ...
 $ Schools_Distance     : num 0.2 0.2 0.7 0.7 0.7 0.8 0.7 0.7 0.7 0.8 ...
 $ Lib_Distance         : num 0.8 0.9 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 ...
 $ Sport_Distance       : num 2.4 1.8 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
 $ ParkS_Distance       : num 0.6 1 0.6 0.6 0.6 0.7 0.6 0.6 0.6 0.7 ...
 $ Hyper_Distance       : num 1.3 0.6 1.2 1.2 1.2 1.1 1.2 1.2 1.2 1.1 ...
 $ Shop_Distance        : num 1.7 1 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
 $ Post_Distance        : num 0.5 0.2 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
 $ Hosp_Distance        : num 0.7 0.4 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
 $ Gas_Distance         : num 0.5 0.4 1.4 1.4 1.4 1.4 1.4 1.5 1.4 1.4 ...
 $ Incin_Distance       : num 10.9 10.2 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 ...
 $ Mort_Distance        : num 6.3 5.7 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3 ...
 $ estate_TotalPrice    : num 124117 2568347 73000000 5600000 99300000 ...
After converting the date variables with as.Date, I get the following error message when selecting variables:

Error in { : task 1 failed - "rfe is expecting 48 importance values but only has 46"
In addition: Warning messages:
1: In predict.lm(object, x) :
  prediction from a rank-deficient fit may be misleading

How should I change my code to fix this?

[Code example]:
library(mlbench)
library(caret)
library(maps)
library(rgdal)
library(raster)
library(sp)
library(spdep)
library(GWmodel)
library(e1071)
library(plyr)
library(kernlab)
library(zoo)

mydata <- read.csv("E:/SupportVectorRegression/Realestatedata_1000_delete_date.csv", header=TRUE)
mydata$estate_TransDate <- as.Date(paste(mydata$estate_TransDate, 1, sep="-"), format="%Y-%m-%d")
mydata$estate_HouseDate <- as.Date(mydata$estate_HouseDate, format="%Y-%m-%d")

rfectrl <- rfeControl(functions=lmFuncs, method="cv", number=10, verbose=TRUE, returnResamp="final")
results <- rfe(mydata[,1:4], mydata[,49], sizes=c(1:49), rfeControl=rfectrl, method="svmRadial") #metric = "Rsquared"

print(results)
predictors(results)
plot(results, type=c("g", "o"))

[Environment]:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

[Keywords]: caret, dataframe, date

--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 60.250.235.236
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1446495492.A.7CB.html
※ Edited: babysian7 (60.250.235.236), 11/03/2015 04:23:00
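For readers following along, here is a minimal sketch of what the two as.Date calls in the post produce and how a Date column can be turned into a plain numeric predictor. The toy data frame and the new column names (trans_days, house_days, house_missing) are made up for illustration; only the two date column names come from the post. Whether this conversion resolves the rfe error is not established in the thread.

```r
# Toy stand-ins for the two date columns in the post (values are illustrative).
toy <- data.frame(
  estate_TransDate = c("1989-03", "1998-01", "2015-01"),
  estate_HouseDate = c("1967-05-19", NA, "2013-11-29"),
  stringsAsFactors = FALSE
)

# Year-month strings need a day appended before as.Date() can parse them.
trans <- as.Date(paste(toy$estate_TransDate, 1, sep = "-"), format = "%Y-%m-%d")
house <- as.Date(toy$estate_HouseDate, format = "%Y-%m-%d")

# A Date is stored as the number of days since 1970-01-01,
# so as.numeric() yields an ordinary numeric column.
toy$trans_days <- as.numeric(trans)
toy$house_days <- as.numeric(house)

# Missing dates stay NA and need handling before any model is fitted.
toy$house_missing <- is.na(house)

str(toy)
```

Making the conversion explicit keeps it visible which numeric scale (days since the epoch) the model actually sees.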

11/03 08:40, 1F
Compute the correlations and see whether two of the variables are correlated with the other variables

11/03 08:40, 2F
with very high coefficients.
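The correlation check suggested in 1F-2F could look roughly like the sketch below. It uses base R only, on made-up data; the cutoff of 0.99 is an arbitrary assumption, and the variable names x1, x2, x3 are hypothetical. caret also ships findCorrelation() for a similar purpose.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = x1 * 2 + 3,   # exact linear function of x1, so the correlation is 1
  x3 = rnorm(n)      # unrelated noise
)

# Pairwise correlations; rows with NA are dropped per pair.
cors <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and list those above the cutoff.
cutoff <- 0.99
high <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r    = cors[high]
)
print(high_pairs)
```

Note that pairwise correlation only catches dependence between two columns; a column that is a combination of several others can slip through this check.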

11/03 08:43, 3F
This really looks like data from the real-estate Actual Price Registration (實價登錄) system.

11/03 09:07, 4F
It looks like the date input is what is failing. Is the date one of your variables?

11/03 13:42, 5F
Hello, two of the variables in it are of Date type, and I want to use them as inputs,

11/03 13:42, 6F
but I don't know where it goes wrong.

11/03 14:08, 7F

11/03 14:08, 8F
That matches what I was thinking XDD

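11/03 14:09, 9F
I generated Date columns myself and it ran without problems; it treats them as integers when running.

11/03 14:09, 10F
Most likely part of your data is linearly dependent.

11/03 14:09, 11F
I also tried it with NA values and had no problems.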
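The dependence mentioned in 10F can be illustrated with base R. The sketch below uses made-up data (columns a, b, c are hypothetical) and shows that lm() reports an NA coefficient for a redundant column while summary() drops that row. This is one plausible explanation, not confirmed in the thread, for why the number of importance values could come out smaller than the number of predictors (48 expected, 46 found) alongside the rank-deficient warning. caret's findLinearCombos() offers a similar check on a numeric matrix.

```r
set.seed(42)
n <- 50
toy <- data.frame(a = rnorm(n), b = rnorm(n))
toy$c <- toy$a + toy$b                        # exact linear combination of a and b
toy$y <- 1 + 2 * toy$a - toy$b + rnorm(n)

fit <- lm(y ~ ., data = toy)

# lm() keeps an NA coefficient for each aliased (redundant) column ...
all_coefs <- coef(fit)
aliased <- names(all_coefs)[is.na(all_coefs)]

# ... while summary() drops those rows, so the two counts differ.
n_predictors <- ncol(toy) - 1
n_reported <- nrow(summary(fit)$coefficients) - 1   # minus the intercept

# The QR rank of the design matrix shows the same deficiency.
X <- model.matrix(fit)
rank_X <- qr(X)$rank

cat("aliased:", aliased, "\n")
cat("predictors:", n_predictors, "reported:", n_reported, "\n")
cat("design columns:", ncol(X), "rank:", rank_X, "\n")
```

Dropping the columns listed in `aliased` (or whichever member of each dependent set is least useful) gives a full-rank design.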

11/06 16:58, 12F
Hello, thank you for your answer. While making the changes I ran into a new problem:

11/06 16:58, 13F
after replacing all the NA values, the error message is "missing value where

11/06 16:58, 14F
TRUE/FALSE needed. In addition: There were 20 warnings

11/06 16:58, 15F
(use warnings() to see them)"

11/06 17:00, 16F
I don't quite understand it, because my data are all continuous numeric values with no TRUE/FALSE

11/06 17:00, 17F
values...
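On the error in 13F-15F: this message does not require logical columns in the data. R raises it whenever the condition of an if() evaluates to NA, which can happen when a statistic computed from the data is NA or NaN. The sketch below reproduces the error on made-up values and adds a hypothetical helper, check_columns, that flags columns with missing values or no variation. The str() output in the post shows some columns whose first rows are all 999, so constant columns are a guess worth checking, not a diagnosis.

```r
# An if() condition that evaluates to NA raises this error.
x <- c(1.5, NA, 3)
msg <- tryCatch(
  { if (mean(x) > 2) "large" else "small" },   # mean(x) is NA here
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical helper: report numeric columns that could yield NA statistics.
check_columns <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    column    = names(num),
    n_missing = vapply(num, function(v) sum(is.na(v)), integer(1)),
    constant  = vapply(num, function(v) {
      length(unique(v[!is.na(v)])) <= 1        # zero variance after dropping NA
    }, logical(1)),
    row.names = NULL
  )
}

# Toy data: one constant column and one column with a missing value.
toy <- data.frame(
  MRT_LineDistance = rep(999, 5),
  estate_LandArea  = c(15.54, 47.3, 20.89, NA, 23.98)
)
report <- check_columns(toy)
print(report)
```

Running warnings() right after the failure, as the message itself suggests, would show which step produced the NA values.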

11/07 11:25, 18F
Without seeing the code I can't diagnose this from a distance. If you can attach the data as well, I

11/07 11:25, 19F
can reproduce the error and try to find a solution.

11/11 13:35, 20F
Hello, I have prepared the data as follows:

11/11 13:35, 22F
NN8GKdVqkgOM6OQ-a?dl=0

11/11 13:35, 23F
Thank you.

11/12 21:45, 24F
I give up ~"~ I don't know what to do qq

11/12 21:45, 25F
Try emailing the package author QQ

11/13 13:00, 26F
Thank you all the same for taking the time to help :)
Article ID (AID): #1MDyK4VB (R_Language)