[Question] Building a crawler (web scraper) in Shiny

Board: R_Language  Author: (你好)  Posted: 2017/09/05 23:24  Push/boo: 0 (0 pushes, 0 boos, 7 neutral)
7 comments, 3 participants, latest thread 1/1
Problem: Hello everyone. I am trying to run a crawler inside Shiny, but I keep getting 'Warning: Error in curl::curl_fetch_memory: Bad URL, colon is first character'. Any help would be greatly appreciated!

[Problem type]: Connecting Shiny with a crawler

[Problem description]: I want to run a crawler in Shiny and then do a word cloud analysis on the scraped text.

[Code sample]:

# For Submit, use https://forum.gamer.com.tw/C.php?bsn=23805&snA=564246&tnum=13 as the example
suppressPackageStartupMessages({
  # library(tcltk)
  library(httr)
  library(data.table)
  library(stringr)
  library(rvest)
  require(jiebaR)
  require(data.table)
  # library(tidyverse)
  library(text2vec)
  library(stringr)
  # library(iterators)
  library(pbapply)
  # library(doParallel)
  library(class)
  library(plyr)
  library(DT)
  library(wordcloud)
  require(RColorBrewer)
  library(reshape2)
  library(tmcn)
  library(parallel)
  library(shiny)
  library(curl)
})

ui <- shinyUI(
  fluidPage(
    # Application title
    titlePanel("Word Cloud"),
    tags$style(type = "text/css",
      ".shiny-output-error { visibility: hidden; }",
      ".shiny-output-error:before { visibility: hidden; }"
    ),
    sidebarLayout(
      # Sidebar with a slider and selection inputs
      sidebarPanel(
        #######
        textInput("scholarid", 'google scholar profile link', value = ""),
        actionButton("submit", "Submit"),
        hr(),
        sliderInput("freq", "Minimum Frequency:", min = 1, max = 50, value = 15),
        sliderInput("max", "Maximum Number of Words:", min = 1, max = 300, value = 100)
      ),
      # Show Word Cloud
      mainPanel(
        plotOutput("plot")
      )
    )
  )
)

server <- shinyServer(function(input, output, session) {
  # Define a reactive expression for the document term matrix
  terms <- reactive({
    # Change when the "update" button is pressed...
    input$update
    # ...but not for anything else
    isolate({
      withProgress({
        setProgress(message = "Processing corpus...")
        getTermMatrix(input$submit)
      })
    })
  })

  # Make the wordcloud drawing predictable during a session
  wordcloud_rep <- repeatable(wordcloud)

  output$plot <- renderPlot({
    v <- terms()
    wordcloud_rep(names(v), v, scale = c(8, 1),
                  min.freq = input$freq, max.words = input$max,
                  colors = brewer.pal(8, "Dark2"))
  })
})

# Using "memoise" to automatically cache the results
getTermMatrix <- function(f) {
  cutter <- worker()
  core <- detectCores() - 1
  cl <- makeCluster(core)
  clusterEvalQ(cl, library(magrittr))
  clusterEvalQ(cl, library(httr))
  clusterEvalQ(cl, library(rvest))
  f <- as.character(f)
  Contents <- f %>%
    GET(encoding = 'UTF-8') %>%
    content %>%
    html_nodes(css = '.c-article__content div') %>%
    html_text()
  DF_Data_list <- list()
  for (i in 1:length(Contents)) {
    DF_Data_list[i] <- as.character(Contents[i])
  }
  text <- sapply(DF_Data_list, function(x) { segment(x, cutter) })
  Data_list_split.token <- itoken(text)
  Data_list_split.vocab <- create_vocabulary(Data_list_split.token, ngram = c(1L, 1L))
  Data_list_split.vocab <- Data_list_split.vocab %>%
    data.frame %>%
    .[order(.$term_count, decreasing = T), -3]
  v <- Data_list_split.vocab[, 2] %>% as.vector()
  names(v) <- Data_list_split.vocab$term
  v
}

shinyApp(ui, server)

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 1.161.243.9
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1504625084.A.934.html
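A note on the error message: curl complains about whatever string finally reaches the request, so one way to narrow this down is to check that string at the top of getTermMatrix before fetching anything. In the posted server code, getTermMatrix receives input$submit, and the value of an actionButton is a click counter, not the text typed into the scholarid box. The sketch below is only an illustration under that reading. check_url is a hypothetical helper that is not part of the original post, and the http(s) pattern is an assumption about what counts as a usable URL here.

```r
# Hypothetical guard (not in the original post): fail with a readable
# message unless `f` looks like a single full http(s) URL, instead of
# letting curl report "Bad URL" later on.
check_url <- function(f) {
  f <- as.character(f)
  ok <- length(f) == 1L && !is.na(f) && grepl("^https?://", f)
  if (!ok) {
    shown <- paste(sprintf('"%s"', f), collapse = ", ")
    stop("Not a full http(s) URL: ", shown, call. = FALSE)
  }
  f
}

# An action button starts at 0, so before any click the scraper in the
# post would be handed 0 rather than a URL. Capture the message to see it.
result <- tryCatch(check_url(0L), error = function(e) conditionMessage(e))
print(result)

# The example address from the post passes the check unchanged.
example_url <- "https://forum.gamer.com.tw/C.php?bsn=23805&snA=564246&tnum=13"
checked <- check_url(example_url)
```

Calling check_url(f) as the first line of getTermMatrix would turn the curl warning into a message that shows the offending value directly.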

09/06 00:52, , 1F
The error says the URL is wrong. Have you checked your code against the error message?

09/06 11:00, , 2F
Replying to A: yes, I did! The same code scrapes the page fine when I run it in plain R.

09/06 11:01, , 3F
But as soon as I put it into Shiny, it breaks.

09/06 16:39, , 4F
Error messages don't lie: the URL is wrong.

09/06 16:41, , 5F
Are you running this on Shiny Server, or as a regular user?

09/06 16:57, , 6F
Hello W, I am running Shiny as a regular user.

09/06 17:13, , 7F
The error shows up before I have even entered a URL and clicked Submit.
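The last comment, that the error appears before any URL is submitted, fits the posted server code. The reactive reads input$update, which no control in the UI defines, so it runs once at startup and never again. It also passes input$submit (the button's click counter) to getTermMatrix instead of input$scholarid. Below is a minimal sketch of one possible rewiring, not a confirmed fix. make_server and scrape are hypothetical names introduced here, with scrape standing in for getTermMatrix. The sketch relies on shiny::eventReactive, which by default does not fire while an action button is still at its initial value.

```r
# Hypothetical factory (not in the original post): builds a server
# function around any scraper `scrape(url)` that returns a named
# frequency vector, e.g. getTermMatrix from the post.
make_server <- function(scrape) {
  function(input, output, session) {
    # Recompute only when Submit is clicked; nothing runs at startup.
    terms <- shiny::eventReactive(input$submit, {
      # Read the URL from the text box, not from the button.
      url <- trimws(input$scholarid)
      shiny::validate(
        shiny::need(grepl("^https?://", url), "Please enter a full http(s) URL.")
      )
      shiny::withProgress(scrape(url), message = "Processing corpus...")
    })

    output$plot <- shiny::renderPlot({
      v <- terms()
      wordcloud::wordcloud(names(v), v, scale = c(8, 1),
                           min.freq = input$freq, max.words = input$max,
                           colors = RColorBrewer::brewer.pal(8, "Dark2"))
    })
  }
}

# Building the server function does not start an app or touch the network.
server_sketch <- make_server(function(url) c(example = 1))
```

With the ui from the post, this would be used as shinyApp(ui, make_server(getTermMatrix)). Note that the CSS rule in the post hides .shiny-output-error elements, so the validation message would probably stay invisible unless that rule is removed.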
Article ID (AID): #1Phi6yaq (R_Language)