Sinama is a simple web scraping library.
- PHP 7.0
composer require rafaelglikis/sinama
Create a Sinama Client (which extends Goutte\Client):
use Sinama\Client;
$client = new Client();
Make requests with the request() method:
// Go to the website
$crawler = $client->request('GET', '');
The method returns a Crawler object (which extends Symfony\Component\DomCrawler\Crawler).
To use your own Guzzle settings, create a Guzzle 6 instance and pass it to the Sinama Client. For example, to add a 60-second request timeout:
use Sinama\Client;
use GuzzleHttp\Client as GuzzleClient;
$client = new Client(new GuzzleClient([
    'timeout' => 60, // request timeout in seconds
]));
$crawler = $client->request('GET', '');
For more options, see the Guzzle documentation.
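As a sketch of how several Guzzle options might be combined, the following extends the timeout example. `connect_timeout`, `allow_redirects`, and `headers` are standard Guzzle request options. The values and the User-Agent string are illustrative assumptions, and the autoload path assumes a standard Composer install.

```php
<?php
require 'vendor/autoload.php'; // assumes a standard Composer install

use Sinama\Client;
use GuzzleHttp\Client as GuzzleClient;

// Illustrative values only; tune them for the site being scraped.
$guzzle = new GuzzleClient([
    'timeout'         => 60,   // total request timeout, in seconds
    'connect_timeout' => 10,   // time allowed to establish the connection
    'allow_redirects' => true, // follow HTTP redirects
    'headers'         => [
        'User-Agent' => 'my-scraper/1.0', // hypothetical identifier
    ],
]);

// Pass the configured Guzzle instance to the Sinama Client, as in the example above.
$client = new Client($guzzle);
```

Building the Guzzle instance in its own variable keeps the HTTP settings in one place, so they can be reused or adjusted without touching the scraping code.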
Click on links:
$link = $crawler->selectLink('PHP')->link();
$crawler = $client->click($link);
echo $crawler->getUri()."\n";
Extract data the Symfony way:
$crawler->filter('h3 > a')->each(function ($node) {
    print trim($node->text())."\n";
});
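Because the Sinama Crawler extends Symfony's DomCrawler, the same `filter()`/`each()` pattern can be tried offline on an HTML string. The sketch below uses the Symfony Crawler directly. The HTML snippet and variable names are made up for illustration, and it assumes `symfony/dom-crawler` and `symfony/css-selector` are installed via Composer.

```php
<?php
require 'vendor/autoload.php'; // assumes a standard Composer install

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for a fetched page.
$html = <<<'HTML'
<html><body>
  <h3><a href="/first">  First result </a></h3>
  <h3><a href="/second">Second result</a></h3>
  <p><a href="/ignored">Not inside an h3</a></p>
</body></html>
HTML;

$crawler = new Crawler($html);

// each() returns an array holding the callback's return value for every matched node.
$results = $crawler->filter('h3 > a')->each(function (Crawler $node) {
    return [
        'text' => trim($node->text()),
        'href' => $node->attr('href'),
    ];
});

foreach ($results as $result) {
    echo $result['text'].' -> '.$result['href']."\n";
}
```

Returning values from the callback, rather than printing inside it, makes the extracted data easier to reuse or test.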
Or use Sinama's special methods:
$crawler = $client->request('GET', '');
echo '<html>';
echo '<head>';
echo '<title>'.$crawler->findTitle().'</title>';
echo '</head>';
echo '<body>';
echo '<h1>'.$crawler->findTitle().'</h1>';
echo '<p>Main Image: '.$crawler->findMainImage().'</p>';
echo $crawler->findMainContent();
echo '<pre>';
echo 'Links: ';
echo 'Emails: ';
echo 'Images: ';
echo '</pre>';
echo '</body>';
echo '</html>';
Submit forms:
$crawler = $client->request('GET', '');
$form = $crawler->selectButton('Google Search')->form();
$crawler = $client->submit($form, ['q' => 'rafaelglikis/sinama']);
$crawler->filter('h3 > a')->each(function ($node) {
    print trim($node->text())."\n";
});
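To see what a form submission would send without making a request, the Symfony Form object can be inspected directly. This is a minimal sketch, assuming the Symfony DomCrawler packages are installed via Composer. The HTML, the example.com base URI, and the "Search" button label are hypothetical.

```php
<?php
require 'vendor/autoload.php'; // assumes a standard Composer install

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical search form standing in for a fetched page.
$html = <<<'HTML'
<html><body>
  <form action="/search" method="get">
    <input type="text" name="q" value="">
    <input type="submit" value="Search">
  </form>
</body></html>
HTML;

// A base URI is needed so the form's relative action can be resolved.
$crawler = new Crawler($html, 'http://example.com/');

// Locate the form through its submit button, then fill in the field.
$form = $crawler->selectButton('Search')->form();
$form->setValues(['q' => 'rafaelglikis/sinama']);

echo $form->getMethod()."\n"; // HTTP method declared by the form
echo $form->getUri()."\n";    // target URI; GET forms carry their values in the query string
print_r($form->getValues());  // field values that would be submitted
```

Inspecting the form this way can help confirm field names before handing it to `$client->submit()`.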
Now that we have covered the basics, let's scrape a site with the Sinama Spider:
use Sinama\Crawler;
use Sinama\Spider as BaseSpider;
class Spider extends BaseSpider
{
    public function parse(Crawler $crawler)
    {
        $crawler->filter(' > a')->each(function (Crawler $node) {
        });
        $crawler->filter(' > a')->each(function ($node) {
        });
    }

    public function scrape($url)
    {
        echo "*************************************************** ".$url."\n";
        $crawler = $this->client->request('GET', $url);
        echo "Title: " . $crawler->findTitle() . "\n";
        echo "Main Image: " . $crawler->findMainImage()."\n";
        echo "Main Content: \n" . $crawler->findMainContent()."\n";
        echo "Emails: \n";
        echo "Links: \n";
    }

    public function getStartUrls(): array
    {
        return [
        ];
    }
}
$spider = new Spider([
    'start_urls' => [ '' ],
    'max_depth' => 2,
    'verbose' => true
]);
- Crawler::findTags()