A timeout error crashed the execution #9

crawler-IM · 2013-12-03T16:26:47Z

Hi,
Thanks for resolving the problem reported in issue 8.

We launched Pagelyzer with this command:
$ ruby1.9.1 pagelyzer changedetection --urls=../fin.txt --output-file=../fout.txt --headless --output-folder=./first-1500/ --type=hybrid --url-archive

where "fin.txt" is a text file containing 1500 pairs of links, one pair per line: "url1 url2"

After processing 216 pairs, Pagelyzer crashed and displayed the following on screen:
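
To make the input format concrete, here is a minimal Ruby sketch of reading such a file. The helper name `parse_pairs` is hypothetical and this is not how Pagelyzer itself reads `--urls`; it only assumes that each non-empty line holds exactly two whitespace-separated URLs.

```ruby
# Hypothetical helper (not part of Pagelyzer): turn "url1 url2" lines into pairs.
def parse_pairs(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    urls = line.split
    # Fail early on malformed lines instead of partway through a long run.
    raise ArgumentError, "expected two URLs, got: #{line.inspect}" unless urls.size == 2
    urls
  end
end

# Placeholder URLs for illustration; a real run would use File.read("../fin.txt").
sample = "http://a.example/1 http://b.example/1\n\nhttp://a.example/2 http://b.example/2\n"
pairs = parse_pairs(sample)
puts pairs.size
```

Checking the file this way before a run also makes it easy to split the 1500 pairs into smaller batches.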

Capturing http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/ with local firefox
ERROR: Page load timeout execution expired
there were a problem in the capture of pages
ERROR: can't process these urls:
http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/
http://webarchive.nationalarchives.gov.uk/20110904155927/http://blogs.fco.gov.uk/roller/warren/
Closing browsers
Browser firefox rest open
Timeout: 30secs
Capturing http://im1c6.internetmemory.org/tna/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
Timeout: 30secs
Capturing http://webarchive.nationalarchives.gov.uk/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
/1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_analyzer.rb:282:in `start': undefined method `process_path' for [{id:0 pid: cand:["frame"]} chl:[]]:Array (NoMethodError)
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:321:in `block in <main>'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `each'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `<main>'

The output file "fout.txt" contains:
<test>
<url href="http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/" browser="firefox"/>
<url href="http://webarchive.nationalarchives.gov.uk/20110904155927/http://blogs.fco.gov.uk/roller/warren/" browser="firefox"/>
<score>ERROR Time out loading http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/</score>
<time>0</time>
</test>

Could you please tell us whether the problem comes from Selenium or from Pagelyzer, and how to avoid it in the future?
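
As a possible stopgap on the calling side, here is a minimal Ruby sketch that keeps a batch going when one pair raises. The helper `process_pairs` and the placeholder URLs are hypothetical and not part of Pagelyzer; the sketch only assumes that each pair can be processed in its own block, and it does not fix the underlying `NoMethodError`.

```ruby
# Hypothetical wrapper (not Pagelyzer code): process each pair in isolation
# and record failures instead of letting one exception abort the whole run.
def process_pairs(pairs)
  failures = []
  pairs.each_with_index do |(url1, url2), index|
    begin
      yield url1, url2
    rescue StandardError => e
      # NoMethodError and Timeout::Error are both StandardError subclasses.
      failures << { index: index, urls: [url1, url2], error: "#{e.class}: #{e.message}" }
    end
  end
  failures
end

# Illustration only: the block stands in for the real comparison and
# simulates a crash on the second pair.
pairs = [
  ["http://a.example/1", "http://b.example/1"],
  ["http://a.example/2", "http://b.example/2"],
  ["http://a.example/3", "http://b.example/3"]
]
processed = []
failures = process_pairs(pairs) do |url1, url2|
  raise NoMethodError, "undefined method `process_path'" if url1.end_with?("/2")
  processed << url1
end
failures.each { |f| puts "pair #{f[:index]} failed: #{f[:error]}" }
```

With this shape, the failed pair is reported and the remaining pairs are still processed.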

Thanks

asanoja · 2013-12-03T16:49:33Z

Hi, let me look at the error and get back to you soon.

best

crawler-IM · 2013-12-09T11:02:01Z

We launched a second test to help identify the origin of the problem. This time it crashed after 213 pairs of URLs, and no error was written to the output file (just the results of the successful comparisons). On screen, however, the following lines were displayed:

Capturing http://im1c6.internetmemory.org/tna/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
Timeout: 30secs
Capturing http://webarchive.nationalarchives.gov.uk/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
/1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_analyzer.rb:282:in `start': undefined method `process_path' for [{id:0 pid: cand:["frame"]} chl:[]]:Array (NoMethodError)
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:321:in `block in <main>'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `each'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `<main>'

It seems to be the same as the previous one.

I also noticed that the line "Timeout: 30secs" is printed for each URL load. Could you please confirm what that line means?
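
For context, the "execution expired" text in the first log matches the default message of Ruby's `Timeout::Error`, so one plausible reading (an assumption, not confirmed by the Pagelyzer source) is that "Timeout: 30secs" is a per-page load limit. The sketch below shows that standard-library behaviour with a hypothetical helper, `load_with_limit`, and a short limit in place of 30 seconds.

```ruby
require 'timeout'

# Hypothetical helper (not Pagelyzer code): run the block under a time limit
# and turn an expiry into an error string similar to the one in the log.
def load_with_limit(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error => e
  "ERROR: Page load timeout #{e.message}"
end

slow = load_with_limit(0.1) { sleep 1 }   # stands in for a page that never finishes loading
fast = load_with_limit(1) { :loaded }     # finishes well within the limit
puts slow
puts fast
```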

Thanks.

asanoja · 2013-12-09T16:22:08Z

Hi all,

It would be very helpful to have the revision hash of your git working copy, so that I can test the same file versions you have. The date of the clone would also work.

best regards.

crawler-IM · 2013-12-12T15:16:50Z

Hi André,

We're using the zipped file mentioned in the README.md:
www-poleia.lip6.fr/%7Esanojaa/pagelyzer-ruby-0.9.1-standalone.zip
downloaded 19-04-2013

Thanks
