v2.0.0
Changed
- BREAKING: Removed methods BaseStep::addToResult(), BaseStep::addLaterToResult(), BaseStep::addsToOrCreatesResult(), BaseStep::createsResult(), and BaseStep::keepInputData(). These methods were deprecated in v1.8.0 and should be replaced with Step::keep(), Step::keepAs(), Step::keepFromInput(), and Step::keepInputAs().
- BREAKING: Added the following keep methods to the StepInterface: StepInterface::keep(), StepInterface::keepAs(), StepInterface::keepFromInput(), StepInterface::keepInputAs(), as well as StepInterface::keepsAnything(), StepInterface::keepsAnythingFromInputData() and StepInterface::keepsAnythingFromOutputData(). If you have a class that implements this interface without extending Step (or BaseStep), you will need to implement these methods yourself. However, it is strongly recommended to extend Step instead.
- BREAKING: With the removal of the addToResult() method, the library no longer uses toArrayForAddToResult() methods on output objects. Instead, please use toArrayForResult(). Consequently, RespondedRequest::toArrayForAddToResult() has been renamed to RespondedRequest::toArrayForResult().
- BREAKING: Removed the result and addLaterToResult properties from Io objects (Input and Output). These properties were part of the addToResult feature and are now removed. Instead, use the keep property where kept data is added.
- BREAKING: The signature of the Crawler::addStep() method has changed. You can no longer provide a result key as the first parameter. Previously, this key was passed to the Step::addToResult() method internally. Now, please handle this call yourself.
- BREAKING: The return type of the Crawler::loader() method no longer allows array. This means it's no longer possible to provide multiple loaders from the crawler. Instead, use the new functionality to directly provide a custom loader to a step described below. As part of this change, the UnknownLoaderKeyException was also removed as it is now obsolete. If you have any references to this class, please make sure to remove them.
- BREAKING: Refactored the abstract LoadingStep class to a trait and removed the LoadingStepInterface. Loading steps should now extend the Step class and use the trait. As multiple loaders are no longer supported, the addLoader method was renamed to setLoader. Similarly, the methods useLoader() and usesLoader() for selecting loaders by key are removed. Now, you can directly provide a different loader to a single step using the trait's new withLoader() method (e.g., Http::get()->withLoader($loader)). The trait now also uses phpdoc template tags for a generic loader type. You can define the loader type by putting /** @use LoadingStep<MyLoader> */ above use LoadingStep; in your step class. Then your IDE and static analysis (if supported) will know what type of loader the trait methods return and accept.
- BREAKING: Removed the PaginatorInterface to allow for better extensibility. The old Crwlr\Crawler\Steps\Loading\Http\Paginators\AbstractPaginator class has also been removed. Please use the newer, improved version Crwlr\Crawler\Steps\Loading\Http\AbstractPaginator. This newer version has also changed: the first argument UriInterface $url is removed from the processLoaded() method, as the URL is also part of the request (Psr\Http\Message\RequestInterface), which is now the first argument. Additionally, the default implementation of the getNextRequest() method is removed. Child implementations must define this method themselves. If your custom paginator still has a getNextUrl() method, note that it is no longer needed by the library and will not be called. The getNextRequest() method now fulfills its original purpose.
- BREAKING: Removed methods from HttpLoader:
  - $loader->setHeadlessBrowserOptions() => use $loader->browser()->setOptions() instead
  - $loader->addHeadlessBrowserOptions() => use $loader->browser()->addOptions() instead
  - $loader->setChromeExecutable() => use $loader->browser()->setExecutable() instead
  - $loader->browserHelper() => use $loader->browser() instead
- BREAKING: Removed method RespondedRequest::cacheKeyFromRequest(). Use RequestKey::from() instead.
- BREAKING: The HttpLoader::retryCachedErrorResponses() method now returns an instance of the new Crwlr\Crawler\Loader\Http\Cache\RetryManager class. This class provides the methods only() and except() to restrict retries to specific HTTP response status codes. Previously, this method returned the HttpLoader itself ($this), so if you're using it in a chain and calling other loader methods after it, you will need to refactor your code.
- BREAKING: Removed the Microseconds class from this package. It has been moved to the crwlr/utils package, which you can use instead.
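As a rough illustration of the keep() migration and the changed Crawler::addStep() signature described above, the following sketch builds a small crawler in the v2 style. It assumes crwlr/crawler v2 is installed via Composer. The function name buildArticleCrawler(), the user agent name, the URL, and the CSS selectors are hypothetical, and calling keep() without arguments is assumed here to keep all extracted properties.

```php
<?php

// Assumes crwlr/crawler v2 was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Steps\Html;
use Crwlr\Crawler\Steps\Loading\Http;

// Hypothetical helper, for illustration only.
function buildArticleCrawler(string $url): HttpCrawler
{
    $crawler = HttpCrawler::make()->withBotUserAgent('MyCrawler');

    $crawler
        ->input($url)
        // No result key as first argument anymore: addStep() only takes the step.
        ->addStep(Http::get())
        // In v1.x this step would have chained ->addToResult() (removed in v2).
        // Assumption: keep() without arguments keeps all extracted properties.
        ->addStep(
            Html::root()
                ->extract(['title' => 'h1', 'author' => '.author'])
                ->keep()
        );

    return $crawler;
}

$crawler = buildArticleCrawler('https://www.example.com/articles/1');
```

Iterating over $crawler->run() should then yield result objects containing the kept data. The selectors above are placeholders and need to match the actual page markup.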
Added
- New methods FileCache::prolong() and FileCache::prolongAll() to allow prolonging the time to live for cached responses.
Fixed
- The maxOutputs() method is now also available and working on Group steps.
- Improved warning messages for step validations that are happening before running a crawler.
- A PreRunValidationException, thrown when the crawler finds a problem with the setup before actually running, is now not only logged as an error via the logger but also rethrown to the user. This way, a user who does not look at the log messages won't get the impression that the crawler ran successfully.
Detailed upgrade guide on https://www.crwlr.software/packages/crawler/v2.0/upgrade-guide