Simple web scraping app made using Spring Boot + Thymeleaf + Jsoup + Java 8 Lambdas & Streams

txitxarra/spring-boot-web-scraper

Spring Boot Web Scraper

About

This is a demo project. The idea was to build a basic web scraping app.

It was made using Spring Boot, Spring Security, Thymeleaf, Spring Data JPA, Spring Data REST and Docker. The database is an in-memory H2 instance.

Login and registration functionality is included.

Users can add web links to their profile and tag them. The app also suggests further links by scraping the page behind each saved link.
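The README does not describe how suggestions are computed, so the following is only a minimal sketch of the general idea: collect absolute links from a page's HTML, drop duplicates and the link the user already saved, and keep the first few. The project itself uses Jsoup for parsing. This sketch swaps in a plain regular expression so that it runs without dependencies, and the class name `LinkSuggestionSketch` and method `suggestLinks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch; not the project's actual suggestion logic.
class LinkSuggestionSketch {

    // Matches href="..." or href='...' whose value is an absolute http(s) URL.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"'\\s]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns up to `limit` distinct absolute links found in `html`,
    // excluding the link the user has already saved.
    static List<String> suggestLinks(String html, String alreadySaved, int limit) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found.stream()
                .filter(url -> !url.equals(alreadySaved))
                .distinct()
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String html = "<a href=\"https://example.com/a\">A</a>"
                + "<a href='https://example.com/b'>B</a>"
                + "<a href=\"/relative\">ignored</a>";
        suggestLinks(html, "https://example.com/a", 5)
                .forEach(System.out::println);
    }
}
```

A real HTML parser such as Jsoup copes with malformed markup and relative URLs far better than a regular expression; the regex here only keeps the example self-contained.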

Configuration

Configuration Files

The folder src/main/resources/ contains the config files for the web-scraper Spring Boot application.

  • src/main/resources/application.properties - main configuration file. Here you can change the admin username/password and the port number.

How to run

There are several ways to run the application. You can run it from the command line with the included Maven Wrapper, with Maven, or with Docker.

Once the app starts, open a web browser and visit http://localhost:8090/home

Admin username: admin

Admin password: admin

User username: user

User password: password

Maven Wrapper

Using the Maven Plugin

Go to the root folder of the application and type:

$ chmod +x scripts/mvnw
$ scripts/mvnw spring-boot:run

Using Executable Jar

Alternatively, build the JAR file with:

$ scripts/mvnw clean package

Then you can run the JAR file:

$ java -jar target/web-scraper-0.0.1-SNAPSHOT.jar

Maven

Open a terminal and run the following commands to ensure that you have valid versions of Java and Maven installed:

$ java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Maven home: /usr/local/Cellar/maven/3.3.9/libexec
Java version: 1.8.0_102, vendor: Oracle Corporation

Using the Maven Plugin

The Spring Boot Maven plugin includes a run goal that can be used to quickly compile and run your application. Applications run in an exploded form, as they do in your IDE. The following example shows a typical Maven command to run a Spring Boot application:

$ mvn spring-boot:run

Using Executable Jar

To create an executable JAR, run:

$ mvn clean package

To run that application, use the java -jar command, as follows:

$ java -jar target/web-scraper-0.0.1-SNAPSHOT.jar

To exit the application, press Ctrl-C.

Docker

It is possible to run web-scraper using Docker:

Build the Docker image:

$ mvn clean package
$ docker build -t web-scraper:dev -f docker/Dockerfile .

Run the Docker container:

$ docker run --rm -i -p 8090:8090 \
      --name web-scraper \
      web-scraper:dev

Helper script

It is possible to run all of the above with the helper script:

$ chmod +x scripts/run_docker.sh
$ scripts/run_docker.sh

Docker

The docker folder contains:

  • docker/web-scraper/Dockerfile - Docker build file for the web-scraper image. It contains the instructions to build the artifacts, copy them into the Docker image, and run the app on the proper port with the proper configuration file.

Util Scripts

  • scripts/run_docker.sh - util script for running the web-scraper Docker container using docker/Dockerfile

Tests

Tests can be run by executing the following command from the root of the project:

$ mvn test

Helper Tools

HAL REST Browser

Open a web browser and visit http://localhost:8090/

You will need to be authenticated to be able to see this page.

H2 Database web interface

Open a web browser and visit http://localhost:8090/h2-console

In the JDBC URL field, enter:

jdbc:h2:mem:web_scraper_db

In the /src/main/resources/application.properties file you can change both the URL path of the web interface and the datasource URL.
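As a rough illustration of these settings, the fragment below uses standard Spring Boot property names. The README does not show the actual contents of application.properties, so the keys are an assumption about what the project sets; the values are taken from the URLs used in this document.

```properties
# Assumed keys (standard Spring Boot property names); the project's file may differ.
server.port=8090
spring.datasource.url=jdbc:h2:mem:web_scraper_db
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
```

If the datasource URL is changed here, the JDBC URL entered in the H2 console has to change to match.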
