It should be up to the crawler developer whether they want to parallelize the process or not.
For example, for some social networks it is not a good idea to scrape with many parallel threads.
For now we need to come up with a proof of concept that allows parallelizing certain pieces as separate Sidekiq tasks.
For the gallery crawler it makes sense to parallelize the scraping of each separate page, to make it faster, e.g.:
parallelize do |context|
# some action
end
Passing code to this block should spawn a separate job that receives all the necessary context, for example the page URL, maybe cookies, and so on.
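A minimal sketch of how this could look, assuming a hypothetical CrawlerStepWorker and GalleryCrawler (neither exists in the codebase yet). One caveat: Sidekiq serializes job arguments as JSON, so the block itself cannot be shipped to the worker; this sketch names the parallel piece as a method and passes only plain-data context (URL, cookies) to each job.

```ruby
require 'sidekiq'

# Hypothetical worker that re-runs one named step of a crawler class with the
# given context. Arguments go through JSON, so context must be plain data.
class CrawlerStepWorker
  include Sidekiq::Worker

  def perform(crawler_class, step_name, context)
    Object.const_get(crawler_class).new.public_send(step_name, context)
  end
end

class GalleryCrawler
  # Hypothetical DSL method: instead of running the block inline, enqueue one
  # job per context (one per page), so pages are scraped in parallel.
  def parallelize(step_name, contexts)
    contexts.each do |context|
      CrawlerStepWorker.perform_async(self.class.name, step_name.to_s, context)
    end
  end

  def crawl
    page_urls = %w[https://example.com/gallery?page=1 https://example.com/gallery?page=2]
    contexts  = page_urls.map { |url| { 'url' => url, 'cookies' => {} } }
    parallelize(:scrape_page, contexts)
  end

  # One unit of work; receives only the serialized context.
  def scrape_page(context)
    # fetch context['url'], parse the gallery page, persist the results...
  end
end
```

Whether the final DSL keeps the block form or switches to named steps like this is an open design question; the block form would need some registry so the worker process can look the block up by name.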
The question is, how are we going to collect all the scraped data?
We need to come up with some synchronization mechanism, or each worker should report its results separately.
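A minimal sketch of the "each worker reports its results" option, assuming results are small enough to park in Redis under a per-crawl key (the key name and the polling loop are illustrative, not an agreed design). A real implementation might instead use Sidekiq Pro batches or write results straight to the database.

```ruby
require 'sidekiq'
require 'json'

# Hypothetical worker: scrapes one page and appends its result to a shared
# Redis list keyed by the crawl id. RPUSH is atomic, so no extra locking.
class PageResultWorker
  include Sidekiq::Worker

  def perform(crawl_id, context)
    result = scrape(context) # placeholder scraping step
    Sidekiq.redis do |redis|
      redis.rpush("crawl:#{crawl_id}:results", JSON.dump(result))
    end
  end

  def scrape(context)
    { 'url' => context['url'], 'images' => [] } # placeholder result
  end
end

# The coordinating process polls until every page has reported, then collects
# everything in one pass.
def collect_results(crawl_id, expected_pages)
  Sidekiq.redis do |redis|
    key = "crawl:#{crawl_id}:results"
    sleep 0.5 until redis.llen(key) >= expected_pages
    redis.lrange(key, 0, -1).map { |raw| JSON.parse(raw) }
  end
end
```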