GitHub - zaid-tariq/WebSearchEngine

Web Search Engine

####Readme change log: 04.02.2020 Added ad placement section
17.12.2019 Added going live section

Build instructions

Checkout the project repository. Build using Maven.
mvn clean package -Dmaven.test.skip=true
Use the following command to run the specific part of the project: mvn spring-boot:run -Dspring-boot.run.profiles=#profiles

Replace #profiles by any of the arguments shown in the table below. If you want to run multiple profiles at a time just list them comma separated. We recommend to run the crawler and indexer profile at the same time.

Command	Description
crawler	Runs the crawler
indexer	Runs the indexer. Note that you need to pass the arguments in the same command. See section 'CLI to schedule task' for further information.
cli	Runs the command line. Note that you need to pass the arguments in the same command. See section 'Task3, CLI' for further information about parameters etc.

Usage instructions

Crawler

For crawler, add seed urls in the seed_urls.txt file in src/main/resources (one per line). The crawler is started with the whole Spring Boot application and starts the crawler based on the properties file main/resources/application.properties.

You can set there the following properties:

Property	Description
crawler.max_depth	max depth to crawl
crawler.max_docs	maximum number of docs
crawler.leave_domain_boolean	flag that indicates to search only in the provided domains
crawler.numberOfThreadsToSpawn	number of threads to spawn

Connection

Edit relevant properties in the application.properties file which is located in the same resources folder as above. The following properties must be only adjusted, because we are using postgres anyway:

Property	Description
spring.datasource.url	jdbc complete connection string
spring.datasource.username	username for db authentication
spring.datasource.password	password for db authentication. Should match the password of the set user via spring.datasource.username

Example Configuration:

spring.datasource.url=jdbc:postgresql://localhost:5432/project
spring.datasource.username=postgres
spring.datasource.password=postgres

Task3, CLI

For the CLI requested in task 3, use the interface in src/main/java/com/example/main/CLI.

Arguments are the following: @query, @k, @typeOfSearch (conjunctive or disjunctive)

CLI to schedule task for tf_idf score update in DB

It starts a daemon thread to refresh the tf_idf scores in the DB after some interval Path: src/main/java/com/example/main/CLIIndexing

Arguments:

Property Name	Description
indexing.interval.milliseconds	Time between two indexing processes

##Going live! The TUgle web interface is accessible at:

http://isproj-vm01.informatik.uni-kl.de:8080/is-project/index.html

The json interface is accessible at:

http://isproj-vm01.informatik.uni-kl.de:8080/is-project/json?query=#1&k=#2&score=#3

Replace the argument placeholder #1,#2, and #3 by:

Argument	Description	Example
#1	Keywords separated by +	database+course
#2	Integer that represents the maximum number of results	10
#3	Integer that represents the scoring function:	3
	1 - TF*IDF
	2 - Okapi BM25
	3 - Combined Score

The TUgle IMAGES web interface is accesible at: http://isproj-vm01.informatik.uni-kl.de:8080/is-project/images.html

Metasearch Engine

Go to the following page:
http://isproj-vm01.informatik.uni-kl.de:8080/is-project/metasearch/ Add "cutoff" paramter in query string to adjust cutoff limit for chosen search engines. Default is 10. e.g: http://isproj-vm01.informatik.uni-kl.de:8080/is-project/metasearch/results?query=beautiful&cutoff=1

Ad Placement on TUgle

Go to the following page:
http://is-proj-vm01.informatik.uni-kl.de:8080/is-project/ads.html

There you can place ads by filling in the form and submitting it.
You can place free ads under the following link:
http://is-proj-vm01.informatik.uni-kl.de:8080/is-project/free-ads

Name		Name	Last commit message	Last commit date
Latest commit History 193 Commits
WebSearchEngine		WebSearchEngine
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web Search Engine

Build instructions

Usage instructions

Crawler

Connection

Task3, CLI

CLI to schedule task for tf_idf score update in DB

Metasearch Engine

Ad Placement on TUgle

About

Releases

Packages

Languages

zaid-tariq/WebSearchEngine

Folders and files

Latest commit

History

Repository files navigation

Web Search Engine

Build instructions

Usage instructions

Crawler

Connection

Task3, CLI

CLI to schedule task for tf_idf score update in DB

Metasearch Engine

Ad Placement on TUgle

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages