Skip to content

ali-moradzade/information-retrieval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Information Retrieval System

Phase1 - Positional index

The most important steps done in this phase:

  • Preprocessing:
    • Deleting some extra stuff in content
    • Deleting punctuation
    • Doing normalization
    • Doing tokenization
    • Doing stemming
    • Removing stop words
  • Building the positional postings list
  • Searching inside our index:
    • Processing queries
    • Searching one word with positional index
    • Find the documents, that contain all the words in the list
    • Implementing multi word search:
    • Complete Search (also considering Not !)
  • Plotting zipf and heaps

Phase2 - Efficient query responding by heap and champion list

In this phase, we improve IR system accuracy and speed with famous IR techniques.
The most important steps implemented in this phase:

  • tf-idf Vector representation for docs
  • Similarity calculation (cosine-sim)
  • Index elimination
  • Champion list (tf base)
  • K-top extraction by max-heap
  • Pharase query responding (Ranked Retrieval system)

About

information retreival project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published