This project explores the fundamentals of web server development, focusing on socket programming, the HTTP/1.1 protocol, and the request-response cycle. We built a web server from scratch so that developers can learn how web communication works end to end and implement it themselves.
- Socket Programming: Master the art of establishing connections, transmitting data, and managing communication channels between clients and servers.
- HTTP/1.1 Protocol: Explore HTTP methods, headers, status codes, and message formats to interpret and construct HTTP messages accurately.
- Request-Response Lifecycle: Understand the heartbeat of web communication, from client requests to server processing and response delivery.
- Multiplexing with kqueue and epoll: Utilize efficient event notification mechanisms for handling multiple connections concurrently, enhancing the scalability and performance of the web server.
To get started with our web server project, follow these steps:
- Clone the Repository:
  git clone https://github.com/hel-mefe/Nginx-similar-webserver-42.git
- Build the Project: Navigate to the project directory and build the project using the Makefile.
  make
- Run the Web Server: Execute the compiled binary to start the web server.
  ./webserv [configuration_file]
- Access the Web Server: Open a web browser and navigate to http://localhost:port (replace port with the port number configured in the server).
Our web server supports a range of functionalities, including:
- Serving static content (HTML, CSS, JavaScript, etc.).
- Handling dynamic requests (CGI scripts, server-side scripting, etc.).
- Logging server activity and client requests for analysis and troubleshooting.
- Configurable settings for port number, server root directory, etc.
- Support for the GET, POST, PATCH, PUT, TRACE, OPTIONS, and DELETE methods.
- Keep-alive connections.
The configuration file is processed recursively, depth-first. Settings in a location block take priority over those in the enclosing server block, and settings in a server block take priority over those in the http block, exactly as nginx behaves.
- root [string]: defines the root directory
- allowed_methods [METHOD1, METHOD2 ...]: the HTTP methods allowed across the whole web server; use * to allow all methods
- client_max_body_size [number]: the maximum body size per request
- client_max_request_timeout [number]: the maximum time the web server waits for a request
- client_max_uri_size [number]: the maximum URI length, in characters, that the web server accepts
- cgi_max_request_timeout [number]: the number of seconds to wait for the CGI process before it gets killed; the default value is 30 seconds
- keep_alive_max_timeout [number]: the number of seconds to wait on a keep-alive connection when the client has written nothing; the default value is 65 seconds
- multiplexer [kqueue | epoll | poll | select]: the multiplexer used to handle simultaneous connections; kqueue is the default on FreeBSD and macOS, and epoll is the default on Linux
- support_cookies [on/off]: specifies if the webserver supports cookies or not
- proxy_cache [on/off]: specifies if the server should serve requests from the cache or not, the default is off
- proxy_cache_register [on/off]: specifies if the server should register the requests that require some processing for future use; the default is off
- proxy_cache_max_time [time] ex. (10s, 10m, 10h, 10d): specifies the period of time a request should get cached, 3 days is the default
- proxy_cache_max_size [size] ex. (10by, 10kb, 10mb, 10gb): specifies max size of the caches that the server should never surpass, 12mb is the default
- server_name [string]: specifies the server name
- listen [number]: specifies the port the virtual server should listen on
- try_index_files [index1, index2, ... indexN]: takes index files that should be served as indexes in case a directory has been requested rather than a normal file
- try_404_files [file1, file2, ... fileN]: takes 404 files that should be served in case the requested path was not found
- allowed_methods [METHOD1, METHOD2, ... METHOD_N]: the methods that should be supported by the server; use [*] to allow all methods
- root [string]: takes the root directory and starts serving files starting from it
- client_max_connections [number]: takes the number of the maximum simultaneous connections that a server can handle
- error_page [error_number] [error_file]: takes an HTTP error code and a file; whenever the server has to respond with that code, it serves the provided file as the response body
- cgi_bin [.extension] [/bin_path]: maps a file extension to the CGI binary that should be run whenever a file with that extension is requested
- max_client_request_timeout: the maximum time the server should wait for the client to write a request; the default is 2 seconds if not provided
- max_client_body_size: the maximum request body size the server should accept from the client
- cgi_max_request_timeout [number]: specifies the number of seconds to wait for the CGI process before it gets killed; the default value is 30 seconds
- keep_alive_max_timeout [number]: specifies the number of seconds to wait on a keep-alive connection when the client has written nothing; the default value is 65 seconds
- location [location]: defines a location block inside the server block; more details about it below
- try_index_files [index1, index2, ... indexN]: takes index files that should be served as indexes in case this directory has been requested
- try_404_files [file1, file2, ... fileN]: takes 404 files that should be served in case the requested path was not found
- redirect [path]: takes a path and redirects the client to it whenever this location has been requested
- support_cookies [on/off]: specifies if the requested directory supports cookies or not
- directory_listing [on/off]: if set to on, a directory listing is generated whenever this location is requested and there is no file to serve; the default value depends on the server block
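The suffixed values accepted by proxy_cache_max_time (10s, 10m, 10h, 10d) and proxy_cache_max_size (10by, 10kb, 10mb, 10gb) could be converted to plain numbers along the following lines. This is only a sketch in C++11: the function names are hypothetical and not taken from the project, overflow is not checked, and it assumes binary multiples (1kb = 1024 bytes), which the description above does not specify.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Splits "10mb" into the number 10 and the suffix "mb", then multiplies
// the number by the factor registered for that suffix. Returns false
// when the leading number is missing or the suffix is unknown.
static bool parse_with_units(const std::string &text,
                             const std::map<std::string, unsigned long long> &units,
                             unsigned long long &out)
{
    std::size_t i = 0;
    unsigned long long value = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        value = value * 10 + static_cast<unsigned long long>(text[i] - '0');
        ++i;
    }
    if (i == 0)
        return false;                          // no leading digits
    std::map<std::string, unsigned long long>::const_iterator it =
        units.find(text.substr(i));
    if (it == units.end())
        return false;                          // unknown or missing suffix
    out = value * it->second;
    return true;
}

// Hypothetical helper: "10m" -> 600 seconds.
bool parse_duration_seconds(const std::string &text, unsigned long long &out)
{
    static const std::map<std::string, unsigned long long> units = {
        {"s", 1ULL}, {"m", 60ULL}, {"h", 3600ULL}, {"d", 86400ULL}};
    return parse_with_units(text, units, out);
}

// Hypothetical helper: "10kb" -> 10240 bytes.
// Assumption: binary multiples (1kb = 1024 bytes).
bool parse_size_bytes(const std::string &text, unsigned long long &out)
{
    static const std::map<std::string, unsigned long long> units = {
        {"by", 1ULL}, {"kb", 1024ULL},
        {"mb", 1024ULL * 1024ULL}, {"gb", 1024ULL * 1024ULL * 1024ULL}};
    return parse_with_units(text, units, out);
}
```

Keeping the suffix tables separate from the parsing loop means a new unit only needs a new table entry.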
http
{
    keep_alive_max_timeout 2s
    client_max_uri_size 200
    client_max_body_size 4by
    support_cookies on
    server
    {
        listen 8080
        server_name server_1
        cgi_bin .pl /usr/bin/perl
        root httpdocs/www
        error_page 404 404.png # NOT FOUND
        error_page 400 400.png # BAD REQUEST
        error_page 405 405.png # METHOD NOT ALLOWED
        error_page 414 414.png # URI MAX
        error_page 501 501.png # NOT IMPLEMENTED
        error_page 500 500.png # Internal Server Error
        directory_listing on
        cgi_bin .php bin/php-cgi
    }
}
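The precedence rule (location over server, server over http) can be pictured as a lookup that walks from the innermost block outwards. The sketch below is one possible way to model it, not the project's actual data structures: `Block`, `parent`, and `resolve` are hypothetical names, and directive values are kept as raw strings for simplicity.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory form of one configuration block. `parent`
// points at the enclosing block (location -> server -> http) and is
// null for the outermost http block.
struct Block {
    std::map<std::string, std::string> directives;
    const Block *parent = nullptr;
};

// Walks from the given block outwards and returns the first value
// found, so a location setting wins over the server one, which in turn
// wins over the http one. Returns `fallback` when no block sets it.
std::string resolve(const Block *block, const std::string &name,
                    const std::string &fallback)
{
    for (; block != nullptr; block = block->parent) {
        std::map<std::string, std::string>::const_iterator it =
            block->directives.find(name);
        if (it != block->directives.end())
            return it->second;
    }
    return fallback;
}
```

With the example configuration above, resolving support_cookies from the server block would find nothing in the server itself and fall through to the http block's value, on.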
Understand the HTTP Protocol: Before diving into web server development, it's crucial to have a solid understanding of the HTTP protocol. Familiarize yourself with HTTP methods (GET, POST, etc.), status codes (200 OK, 404 Not Found, etc.), headers, and the overall request-response cycle. The official RFCs are the best place to read about them.
Set Up TCP Socket: If you're not using a networking library, start by creating a TCP socket to listen for incoming connections. Use the socket() function to create the socket, bind() to associate it with a port, and listen() to start listening for incoming connections.
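The socket(), bind(), and listen() sequence might look like the sketch below on a POSIX system. The function name `create_listener` is hypothetical, and binding to all IPv4 interfaces and setting SO_REUSEADDR are assumptions made for the example rather than something this project is known to do.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

// Creates a TCP socket bound to the given port on all IPv4 interfaces
// and puts it in listening mode. Passing port 0 lets the OS pick a free
// port. Returns the socket descriptor, or -1 on failure.
int create_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    // Assumption: allow quick restarts while old connections linger.
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);           // network byte order

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as create_listener(8080, 128) would correspond to a `listen 8080` directive; the backlog value is arbitrary here.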
Choose a Multiplexing Syscall: select and poll are the most widely available options, but both scale linearly: after every call you have to iterate over all the sockets in your web server to figure out which ones are ready for I/O, which costs O(N) time. More scalable alternatives are kqueue on FreeBSD and macOS and epoll on Linux, which report only the descriptors that are ready.
Accept Incoming Connections: Once the server socket is set up, use the accept() function to accept incoming connections from clients. This function will return a new socket for each client connection, which you can then use to send and receive data.
Handle HTTP Requests: Receive and parse HTTP requests from clients. Extract information such as the request method, requested URL, headers, and body. Depending on the request method and URL, decide how to handle the request (e.g., serve static files, execute dynamic content, etc.).
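As an illustration of the parsing step, the sketch below splits a fully received request into its request line, headers, and body. `Request` and `parse_request` are hypothetical names. It is deliberately simplified: it assumes the whole message is already in memory and treats everything after the blank line as the body, without honoring Content-Length or chunked transfer encoding.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for the parsed pieces of a request.
struct Request {
    std::string method, target, version;
    std::map<std::string, std::string> headers;   // names lower-cased
    std::string body;
};

// Removes leading and trailing whitespace, including a trailing '\r'.
static std::string trim(const std::string &s)
{
    std::size_t begin = 0, end = s.size();
    while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) ++begin;
    while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) --end;
    return s.substr(begin, end - begin);
}

// Returns false when the header section is not terminated by a blank
// line, or when the request line or a header line is malformed.
bool parse_request(const std::string &raw, Request &out)
{
    std::size_t head_end = raw.find("\r\n\r\n");
    if (head_end == std::string::npos)
        return false;
    out = Request();
    std::istringstream head(raw.substr(0, head_end));
    std::string line;

    // Request line: METHOD SP request-target SP HTTP-version
    if (!std::getline(head, line))
        return false;
    std::istringstream first(trim(line));
    std::string extra;
    if (!(first >> out.method >> out.target >> out.version) || (first >> extra))
        return false;

    // Header fields: "Name: value"; names are case-insensitive.
    while (std::getline(head, line)) {
        line = trim(line);
        std::size_t colon = line.find(':');
        if (colon == std::string::npos || colon == 0)
            return false;
        std::string name = line.substr(0, colon);
        for (std::size_t i = 0; i < name.size(); ++i)
            name[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(name[i])));
        out.headers[name] = trim(line.substr(colon + 1));
    }
    out.body = raw.substr(head_end + 4);
    return true;
}
```

Lower-casing header names on the way in lets the rest of the server look up "content-length" or "host" without worrying about how the client spelled them.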
Generate HTTP Responses: Based on the request received, generate an appropriate HTTP response. This may involve reading files from the server's file system, executing server-side scripts, or generating dynamic content. Construct the HTTP response with the correct status code, headers, and content.
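Constructing the response message can be sketched as plain string assembly: a status line, header fields, a blank line, then the content. `build_response` is a hypothetical helper, and emitting only Content-Type and Content-Length is a simplification made for the example.

```cpp
#include <sstream>
#include <string>

// Builds an HTTP/1.1 response. Content-Length is derived from the body
// so the client knows where the message ends, which matters on
// keep-alive connections.
std::string build_response(int status, const std::string &reason,
                           const std::string &content_type,
                           const std::string &body)
{
    std::ostringstream out;
    out << "HTTP/1.1 " << status << ' ' << reason << "\r\n"
        << "Content-Type: " << content_type << "\r\n"
        << "Content-Length: " << body.size() << "\r\n"
        << "\r\n"                 // blank line ends the header section
        << body;
    return out.str();
}
```

For instance, build_response(404, "Not Found", "text/html", page) would wrap the contents of a configured error page in a 404 response.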
Send Response to Client: Once the HTTP response is generated, send it back to the client over the established connection. Use the send() or write() function to send the response data over the socket.