Analyzes a server's log files to find the most popular posts and the most popular authors.
This is a project from Udacity's Full Stack Nanodegree program. Its intent is to show how SQL can be used to analyze a website log.
A Vagrant environment with all dependencies is provided.
To run it, you will need to install Vagrant and VirtualBox.
From within the project root:
- Start the Vagrant environment:
$ cd vagrant
$ vagrant up
It can take a little while for this to complete, since it needs to download and install everything.
- SSH into the Vagrant environment:
$ vagrant ssh
- Change to the project directory:
$ cd /vagrant
- Load data into the news database. A test log is included:
$ psql -d news -f newsdata.sql
- Create database views:
$ python3 create_db_views.py
- Run the analyzer:
$ python3 log_analyzer.py
If everything works, you should get three tables like this one:
----------------Popular Articles-----------------
Candidate is jerk, alleges rival | 338647 views
-------------------------------------------------
Bears love berries, alleges bear | 253801 views
-------------------------------------------------
Bad things gone, say good people | 170098 views
Data is retrieved from the database using 3 functions:
# Returns 3 most popular articles
get_three_popular_articles()
# Returns a list of all authors, in order of popularity (number of views)
get_popular_authors()
# Returns the percentage of errors (404 errors / total requests) grouped by day
# min is the minimum error percentage to filter
get_error_data(min=0)
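As a rough illustration of the error-percentage calculation described above, here is a minimal sketch. It is not the project's actual code: the project runs against PostgreSQL and uses database views, while this sketch uses Python's built-in sqlite3 module so it can run on its own. The function name get_error_data_sketch, the log table layout (status and time columns), and the sample rows are all assumptions made for the example.

```python
import sqlite3


def get_error_data_sketch(conn, min=0):
    """Return (day, error_percent) rows whose error_percent is >= min.

    `min` mirrors the parameter name used in this README, even though it
    shadows the Python built-in inside this function.
    """
    query = """
        SELECT day, error_percent
        FROM (
            SELECT date(time) AS day,
                   100.0 * SUM(CASE WHEN status LIKE '404%' THEN 1 ELSE 0 END)
                         / COUNT(*) AS error_percent
            FROM log
            GROUP BY date(time)
        )
        WHERE error_percent >= ?
        ORDER BY day
    """
    return conn.execute(query, (min,)).fetchall()


# Hypothetical in-memory log table with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (status TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO log (status, time) VALUES (?, ?)",
    [
        ("200 OK", "2016-07-01 10:00:00"),
        ("200 OK", "2016-07-01 11:00:00"),
        ("200 OK", "2016-07-01 12:00:00"),
        ("404 NOT FOUND", "2016-07-01 13:00:00"),
        ("200 OK", "2016-07-02 09:00:00"),
        ("404 NOT FOUND", "2016-07-02 09:30:00"),
    ],
)

# Only days where at least 30% of requests were 404s.
rows = get_error_data_sketch(conn, min=30)
print(rows)
```

The inner query computes one percentage per day, and the outer query applies the min filter, so calling the function with the default min=0 returns every day in the log.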
There is also a helper module called table_maker to handle formatting the data.
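To give a feel for what such a formatting helper might look like, here is a small sketch that produces a titled, dash-separated table similar to the sample output shown earlier. The function name make_table, its parameters, and the fixed width of 49 characters are assumptions for illustration; the real table_maker module may be organized differently and its exact padding may not match.

```python
def make_table(title, rows, unit="views", width=49):
    """Format (label, count) pairs as a titled, dash-separated text table."""
    # Title line: the title centered in a run of dashes.
    lines = [title.center(width, "-")]
    separator = "-" * width
    body = [f"{label} | {count} {unit}" for label, count in rows]
    # Put a separator between rows, but not after the last one.
    for i, row in enumerate(body):
        if i > 0:
            lines.append(separator)
        lines.append(row)
    return "\n".join(lines)


# Rows taken from the sample output in this README.
table = make_table(
    "Popular Articles",
    [
        ("Candidate is jerk, alleges rival", 338647),
        ("Bears love berries, alleges bear", 253801),
    ],
)
print(table)
```

Keeping the formatting in its own function means the query functions can return plain lists of tuples and stay free of presentation details.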
Original virtual machine from Udacity