Re: [Question] About scraping web page data~~~

Board: Ruby  Author: Shenk (閑客)  Time: 2011/06/11 00:38

Thread: 2/2

I once wrote a program that makes a web page accessible from the command line. You can use it as a reference:

require 'open-uri'
require 'password'
require 'hpricot'
require 'net/http'

url = ARGV[0]
begin
  # Try the cookie saved by a previous run (a missing file also lands in rescue)
  cookie = open('cookie').read
  file = open(url, 'Cookie' => cookie)
  doc = Hpricot.parse(file)
  # A login link on the page means the saved cookie is no longer valid
  raise Exception.new if (doc/'tr[@valign="top"] > td[@align="right"] >a[@href="/login/"]').length != 0
rescue Exception => e
  uri = URI.parse('http://some.web.site.com/')
  req = Net::HTTP::Post.new('/login/')
  warn "Password?"
  password = Password.get
  # Fetch the login page first to get the CSRF token and its cookie
  csrf = Net::HTTP.new(uri.host, uri.port).start {|http| http.get('/login/')}
  csrfmiddlewaretoken = (Hpricot.parse(csrf.body)/'input[@name="csrfmiddlewaretoken"]').attr('value')
  req.add_field 'Referer', 'http://some.web.site.com/login/'
  req.add_field 'Cookie', csrf['set-cookie']
  req.form_data = {
    :email => 'email@gmail.com',
    :password => password,
    :csrfmiddlewaretoken => csrfmiddlewaretoken,
    :next => '/'
  }
  result = Net::HTTP.new(uri.host, uri.port).start {|http| http.request(req)}
  puts result
  # Save the session cookie for later runs, then fetch the page again
  open('cookie', 'w') { |f| f.puts result['set-cookie'] }
  cookie = result['set-cookie']
  file = open(url, 'Cookie' => cookie)
  doc = Hpricot.parse(file)
end

※ Quoting Terence223 (水鏡):
: I'm working on something lately and need to scrape information from a specific web page.
: I already know how to fetch the page source directly with Ruby and parse it with regular expressions
: to extract the data strings I need.
: But this page
: http://www.tdcc.com.tw/smWeb/QryStock.jsp
: requires input before the data I want shows up, e.g. entering 1101 as the stock code,
: and the URL doesn't change at all.
: How can I use Ruby to get the page source after the form has been filled in?
: Any pointers would be appreciated, thanks ^^"

--
※ Origin: PTT (批踢踢實業坊, ptt.cc)
◆ From: 140.114.232.77
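
For the original question (a page whose URL stays the same after you fill in a form), the relevant part of the program above is the form POST. Here is a minimal sketch of just that step, using only Net::HTTP from the standard library. The helper names `build_form_post` and `submit_form` and the field name `stockNo` are made up for illustration. The real field names and the form's target URL have to be read from the `<form>` and `<input>` tags in the page source.

```ruby
require 'net/http'
require 'uri'

# Build a form POST request without sending it.
# `fields` maps form field names to values (the names are assumptions;
# read the real ones from the page's <input name="..."> tags).
def build_form_post(url, fields, cookie: nil, referer: nil)
  uri = URI.parse(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req.set_form_data(fields)              # URL-encodes the fields into the body
  req['Cookie'] = cookie if cookie       # optional session cookie
  req['Referer'] = referer if referer    # some sites check this header
  req
end

# Send the request and return the response body (performs network I/O).
def submit_form(url, fields, **opts)
  uri = URI.parse(url)
  req = build_form_post(url, fields, **opts)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  res.body
end

# Hypothetical usage:
#   html = submit_form('http://www.tdcc.com.tw/smWeb/QryStock.jsp', 'stockNo' => '1101')
```

If the site also needs a session cookie or a hidden token, fetch the form page with GET first and pass those values along, as the login part of the program above does.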