Skip to content

Xtact/site-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

shot-scraper-template

Quickly create a new GitHub repository that takes automated screenshots of a web page using shot-scraper.

Read more about how this works in Instantly create a GitHub repository to take screenshots of a web page.

simonw/simonwillison-net-shot is an example repository created using this template.

How to get started

Visit https://github.com/simonw/shot-scraper-template/generate

Screenshot of the interface for creating a new repository with this template, showing the URL pasted into the description field

Pick a name for your new repository, and paste the URL of the page you would like to take screenshots of into the description field (including the http:// or https://).

Then click Create repository from template.

Your new repository will be created, and a script will run which will do the following:

  • Add a shots.yml file to your repository containing the URL of the page you requested
  • Take a screenshot of that URL and add that to your repository as a file called shot.png

You can then edit that shots.yml file to customize your screenshot, or add more URLs - see below.

If the script does not run when the repository is first created you may need to Enable Actions first:

  • Click the "Actions" tab
  • Clice "Enable Actions"
  • Run the "Take screenshots" workflow as described below

Re-taking the screenshot

To re-take the screenshot:

  • Click "Actions"
  • Select the "Take screenshots" workflow
  • Click the "Run workflow" menu item
  • Click the green "Run workflow" button

image

The repository will keep a history of every previous version of each screenshot, which is useful for keeping track of visual changes to a page.

Configuring the screenshots

The initial shots.yml file in your repository should look like this:

- url: https://simonwillison.net/
  output: shot.png
  height: 800

To change the name of the file that the screenshot is saved to, change output: shot.png to a different file name.

To take a full height image of the page, remove the height: 800 line.

To add additional screenshots, add them to the YAML file like this:

- url: https://simonwillison.net/
  output: shot.png
  height: 800
- url: https://www.example.com/
  output: example.png
  height: 800

Other useful options include:

  • wait: 3000 to add a 3 second delay before taking the shot (in case some things need more time to load)
  • javascript: ... to execute custom JavaScript before taking the shot - to activate menus or hide elements or similar
  • quality: 80 to save a smaller, lower quality JPEG image

This example takes a shot of the LA Times homepage after hiding ads and the terms of service prompt:

- url: https://www.latimes.com/
  output: latimes.jpg
  width: 1600
  height: 1600
  quality: 80
  wait: 2000
  javascript: |
    document.querySelectorAll(
      '[data-ad-rendered],#ensNotifyBanner'
    ).forEach(el => el.style.display = 'none')

Further options are described in the shot-scraper README file.

Installing fonts for more languages

The default Ubuntu used by GitHub Actions does not include fonts for many languages, including Chinese and Japanese.

You can modify the shots.yml file to install extra fonts by adding this section, between the "Cache Playwright browsers" and "Install dependencies" steps:

    - name: Cache Playwright browsers
      uses: actions/cache@v2
      with:
        path: ~/.cache/ms-playwright/
        key: ${{ runner.os }}-browsers
    - name: Install extra fonts
      run: |
        sudo apt-get install fonts-arphic-ukai fonts-arphic-uming fonts-ipafont-mincho fonts-ipafont-gothic fonts-unfonts-core
    - name: Install dependencies
      run: |
        pip install -r requirements.txt

About

Shot scraper

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published