How to handle 404 not found errors in Nokogiri

I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when they don't exist. Is there a way to rescue this exception?

http://yoursite/page/38475 #=> page number 38475 doesn't exist

I tried the following which didn't work.

url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
  begin
    rescue Exception => e
      puts "Try again later"
  end
end

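It doesn't work because you are not rescuing the part of the code that raises the error. It is the open(url) call that raises OpenURI::HTTPError when the server responds with a 404 status, and that call is evaluated as an argument before Nokogiri::HTML runs, so it happens outside your begin/rescue (which also has an empty body). The following code should work: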

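require 'open-uri'
require 'nokogiri'

url = 'http://yoursite/page/38475'
begin
  # On Ruby 3.0+ open-uri no longer patches Kernel#open, so use URI.open
  doc = Nokogiri::HTML(URI.open(url))
  # handle doc here
rescue OpenURI::HTTPError => e
  # e.io.status is e.g. ["404", "Not Found"]
  if e.io.status[0] == '404'
    # handle 404 error
  else
    raise
  end
end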
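If many guessed URLs are fetched, the same rescue can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name open_unless_missing and its opener: keyword (added only so the fetch can be replaced, e.g. in tests) are hypothetical. It returns nil for a 404 and re-raises any other HTTP error.

```ruby
require 'open-uri'

# Hypothetical helper: opens `url` and returns the IO (or the block's result),
# or nil when the server answers 404 Not Found. Other HTTP errors are re-raised.
# The `opener:` keyword is an assumption so the fetch can be swapped out.
def open_unless_missing(url, opener: ->(u) { URI.open(u) })
  io = begin
    opener.call(url)
  rescue OpenURI::HTTPError => e
    raise unless e.io.status[0] == '404' # re-raise 403, 500, ...
    return nil                           # page does not exist
  end
  block_given? ? yield(io) : io
end
```

With Nokogiri loaded, a caller could write `doc = open_unless_missing(url) { |io| Nokogiri::HTML(io) }` and simply skip the page when doc is nil. Only the network call sits inside the rescue, so a 404 raised by something inside the block is not silently swallowed.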

As an aside on rescuing Exception, see the question "Why is it bad style to `rescue Exception => e` in Ruby?"
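The short version: a bare `rescue` (same as `rescue StandardError`) already catches OpenURI::HTTPError, while `rescue Exception` additionally catches things like Interrupt (Ctrl-C) and SystemExit (from `exit`), which makes a long-running scraper hard to stop. The sketch below illustrates this; the method name caught_by_bare_rescue? is hypothetical and only for demonstration.

```ruby
require 'open-uri'

# Hypothetical demo: returns true if a bare `rescue` (which only catches
# StandardError and its subclasses) would catch the given exception.
def caught_by_bare_rescue?(exception)
  raise exception
rescue StandardError
  true
rescue Exception
  # Reached only for non-StandardError exceptions such as Interrupt or SystemExit
  false
end

puts caught_by_bare_rescue?(OpenURI::HTTPError.new('404 Not Found', nil)) # prints true
puts caught_by_bare_rescue?(Interrupt.new)                                # prints false
puts caught_by_bare_rescue?(SystemExit.new)                               # prints false
```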
