A timeout error crashed the execution #9

Open
crawler-IM opened this issue Dec 3, 2013 · 4 comments

@crawler-IM

Hi,
Thanks for resolving the problem noted in issue 8.

We launched Pagelyzer with this command:
$ ruby1.9.1 pagelyzer changedetection --urls=../fin.txt --output-file=../fout.txt --headless --output-folder=./first-1500/ --type=hybrid --url-archive

where "fin.txt" is a text file containing 1500 pairs of links, one pair per line, in the form "url1 url2".
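Because a malformed line in a 1500-line input file is hard to spot by eye, the file can be checked before a long run. The following is a minimal sketch, assuming only the "url1 url2" layout described above. `parse_pairs` is a hypothetical helper and the example URLs are made up; neither comes from Pagelyzer.

```ruby
# Hypothetical helper, not part of Pagelyzer.
# Splits each non-empty line into exactly two whitespace-separated URLs
# and records the line numbers that do not match that layout.
def parse_pairs(text)
  pairs = []
  bad_lines = []
  text.each_line.with_index(1) do |line, lineno|
    fields = line.split
    next if fields.empty?          # skip blank lines
    if fields.length == 2
      pairs << fields
    else
      bad_lines << lineno          # remember malformed lines
    end
  end
  [pairs, bad_lines]
end

# Made-up sample input: one valid pair, one blank line, one malformed line.
sample = "http://a.example/1 http://b.example/1\n\nhttp://a.example/2\n"
pairs, bad = parse_pairs(sample)
puts "#{pairs.length} valid pair(s), malformed lines: #{bad.inspect}"
```

In practice the text would come from `File.read` on the input file instead of the inline sample.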

  • After processing 216 pairs, Pagelyzer crashed and displayed the following on screen:

Capturing http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/ with local firefox
ERROR: Page load timeout execution expired
there were a problem in the capture of pages
ERROR: can't process these urls:
http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/
http://webarchive.nationalarchives.gov.uk/20110904155927/http://blogs.fco.gov.uk/roller/warren/
Closing browsers
Browser firefox rest open
Timeout: 30secs
Capturing http://im1c6.internetmemory.org/tna/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
Timeout: 30secs
Capturing http://webarchive.nationalarchives.gov.uk/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
/1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_analyzer.rb:282:in `start': undefined method `process_path' for [{id:0 pid: cand:["frame"]} chl:[]]:Array (NoMethodError)
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:321:in `block in <main>'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `each'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `<main>'

  • And in the output file "fout.txt":
<test>
<url href="http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/" browser="firefox"/>
<url href="http://webarchive.nationalarchives.gov.uk/20110904155927/http://blogs.fco.gov.uk/roller/warren/" browser="firefox"/>
<score>ERROR Time out loading http://im1c6.internetmemory.org/tna/20110904155927/http://blogs.fco.gov.uk/roller/warren/</score>
<time>0</time>
</test>

Can you please tell us whether the problem comes from Selenium or from Pagelyzer, and how to avoid it in the future?
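On the second question, one general way to keep a single failing pair from aborting a long run is to rescue errors per pair and record them. The following is only a minimal sketch of that idea, not Pagelyzer's code: `compare_pair` and `run_batch` are hypothetical names, the example URLs are made up, and the simulated failure merely reuses the exception class from the trace above.

```ruby
# Hypothetical stand-in for the per-pair comparison. It fails on one kind
# of input with the same exception class that appears in the trace.
def compare_pair(url1, url2)
  raise NoMethodError, "undefined method `process_path'" if url1.include?("frame")
  0.0
end

# Processes every pair and records failures instead of aborting the batch.
def run_batch(pairs)
  results = []
  pairs.each do |url1, url2|
    begin
      results << [url1, url2, compare_pair(url1, url2)]
    rescue StandardError => e
      # NoMethodError is a StandardError, so it is caught here.
      results << [url1, url2, "ERROR #{e.class}: #{e.message}"]
    end
  end
  results
end

# Made-up pairs: the second one triggers the simulated failure.
demo_pairs = [["http://a.example/ok", "http://b.example/ok"],
              ["http://a.example/frame.htm", "http://b.example/frame.htm"],
              ["http://a.example/ok2", "http://b.example/ok2"]]
run_batch(demo_pairs).each { |row| puts row.join(" ") }
```

With this structure the third pair is still processed after the second one fails, and the failure is reported next to the URLs that caused it.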

Thanks

@asanoja
Contributor

asanoja commented Dec 3, 2013

Hi, let me look at the error and get back to you soon.

best

@crawler-IM
Author

A second test was launched to help identify the origin of the problem. This time it crashed after 213 pairs of URLs, and no errors were written to the output file (just the results of the successful comparisons). On screen, however, the following lines were displayed:

Capturing http://im1c6.internetmemory.org/tna/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
Timeout: 30secs
Capturing http://webarchive.nationalarchives.gov.uk/20100512151544/http://www.huntinginquiry.gov.uk/mainsections/huntingframe.htm with local firefox
Getting screenshot
Waiting page to finish loading...
done.
/1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/lib/pagelyzer_analyzer.rb:282:in `start': undefined method `process_path' for [{id:0 pid: cand:["frame"]} chl:[]]:Array (NoMethodError)
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:321:in `block in <main>'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `each'
from /1/crawl/hatem-tmp/pagelyzer/EUP-176/pagelyzer-ruby-0.9.1-standalone/bin/pagelyzer_changedetection:244:in `<main>'

It seems to be the same as the previous one.

I also noticed that the line "Timeout: 30secs" was written for each URL load. Could you please confirm what that line means?
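As background for the question: the text "execution expired" in the first log matches the default message of Ruby's standard `Timeout::Error`. Whether Pagelyzer applies its 30-second limit through the `Timeout` module or through Selenium's own page-load timeout is an assumption here, not something the logs confirm. A minimal sketch of how such a limit behaves, with hypothetical helper names and much shorter durations than 30 seconds:

```ruby
require 'timeout'

# Hypothetical stand-in for a page capture; sleeps to simulate load time.
def slow_capture(seconds)
  sleep seconds
  :captured
end

# Runs the capture under a time limit and returns an error string
# instead of raising when the limit is exceeded.
def capture_with_limit(limit, load_time)
  Timeout.timeout(limit) { slow_capture(load_time) }
rescue Timeout::Error => e
  "ERROR: #{e.message}"
end

puts capture_with_limit(0.5, 0.0)   # finishes within the limit
puts capture_with_limit(0.05, 1.0)  # exceeds the limit
```

Under this reading, "Timeout: 30secs" would simply announce the limit applied to each page load, and only loads that exceed it produce an error.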

Thanks.

@asanoja
Contributor

asanoja commented Dec 9, 2013

Hi all,

It would be very helpful for me to have the revision hash of your git working copy, so that I can test the same version of the files that you have. The date of the clone would also work.

best regards.

@crawler-IM
Author

Hi André,

We're using the zipped file that is referenced in the README.md:
www-poleia.lip6.fr/%7Esanojaa/pagelyzer-ruby-0.9.1-standalone.zip
It was downloaded on 19-04-2013.

Thanks
