This project is a Python CLI tool that provides rapid range queries (logarithmic time complexity) over a dataset of famous computer scientists scraped from Wikipedia.
The project was carried out as part of CEID’s course “Decentralized Systems for Big Data Management and Decision Making”, taught by Prof. Spyros Sioutas. While the primary goal was to optimize time complexity, concerns such as concurrency support and the simultaneous handling of requests were not a central aspect of our implementation. As such, there are multiple areas where enhancements can be made, including but not limited to improving the web crawler’s efficiency and adding concurrency support for sudden joins and leaves of the network’s nodes.
In the pursuit of self-improvement, any new suggestions, solutions and potential bug fixes are welcome.
Before using ChordSeek, ensure you have Docker installed and that your user is added to the Docker group. You can follow the official detailed guides for these matters in the links below:
As for the installation of the CLI tool, install the required libraries through requirements.txt, preferably in a virtual environment (e.g. using venv), as shown below:
```bash
# Navigate to the project's root folder
cd ChordSeek

# Create the venv
python3 -m venv dhtChordVenv

# Activate the virtual environment (e.g. on an Ubuntu distribution)
source dhtChordVenv/bin/activate

# Install the required libraries
pip3 install -r requirements.txt
```
[!CAUTION]
dhtChordVenv should be the name of the virtual environment because of an unresolved dependency on the Empty message definition of Google’s protocol buffers. A quick fix is to pass the protoc compiler the path where that definition naturally lives inside the gRPC tools installation folder. For a virtual environment created with venv, at the time of development this was located at `[the name of your venv]/lib/python3.8/site-packages/grpc_tools/_proto/`.
This is hard-coded in line 153 of netsetup: `f"--proto_path=dhtChordVenv/lib/python3.8/site-packages/grpc_tools/_proto/"`
Any fix better than the current one is a welcome addition.
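One possible direction is sketched below: resolve the `_proto` directory from the installed grpc_tools package at runtime instead of hard-coding the venv name. This is only a sketch under assumptions, not the project’s current code; in particular, the `chord.proto` file name is illustrative.

```python
import os
import grpc_tools
from grpc_tools import protoc

# Locate the well-known proto definitions (including Empty) bundled with
# grpc_tools, regardless of the venv's name or the Python version in use.
well_known_protos = os.path.join(os.path.dirname(grpc_tools.__file__), "_proto")

# Regenerate the Python stubs; "chord.proto" is a hypothetical file name.
protoc.main([
    "grpc_tools.protoc",
    f"--proto_path={well_known_protos}",
    "--proto_path=.",
    "--python_out=.",
    "--grpc_python_out=.",
    "chord.proto",
])
```

This would keep the `--proto_path` flag correct without requiring the virtual environment to be named dhtChordVenv.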
The project’s implementation is based on the research presented in this paper.
For the purpose of designing and implementing our peer-to-peer Chord network, we used a state-of-the-art solution: Docker containers. Each peer, a Chord node, is modeled as a container inside a standalone Docker network.
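For illustration only, creating such a setup with the Docker SDK for Python might look like the following sketch. The image name, network name and node count are assumptions, not the project’s actual values.

```python
import docker

client = docker.from_env()

# Standalone bridge network that hosts the whole Chord ring.
chord_net = client.networks.create("chord_net", driver="bridge")

# Start each peer as a container attached to that network.
for i in range(4):
    client.containers.run(
        "chordseek-node:latest",   # hypothetical image name
        name=f"chord_node_{i}",
        network=chord_net.name,
        detach=True,
    )
```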
The initialization of the Chord network is carried out by a single node (not part of the Chord ring), which is responsible for each of the following:
Concerning the communication between nodes, we followed the remote procedure call (RPC) route, selecting Google’s gRPC framework mainly for its simplicity and high performance. Each node, as part of a peer-to-peer network, acts both as a server and as a client towards its fellow peers, accepting and responding to their requests.
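As a rough illustration of this dual server/client role, the sketch below follows the usual shape of gRPC code in Python. The service, message and module names (`Chord`, `FindSuccessor`, `chord_pb2`, etc.) are assumptions in the style of gRPC-generated stubs, not the project’s actual generated code.

```python
from concurrent import futures
import grpc

# Stand-ins for the stubs generated from the project's .proto file.
import chord_pb2
import chord_pb2_grpc


class ChordServicer(chord_pb2_grpc.ChordServicer):
    def FindSuccessor(self, request, context):
        # Server side: answer a peer's lookup request (placeholder logic).
        return chord_pb2.NodeInfo(ip="10.0.0.7")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    chord_pb2_grpc.add_ChordServicer_to_server(ChordServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


def ask_peer(peer_ip, key_id):
    # Client side: the same node queries a fellow peer over gRPC.
    with grpc.insecure_channel(f"{peer_ip}:50051") as channel:
        stub = chord_pb2_grpc.ChordStub(channel)
        return stub.FindSuccessor(chord_pb2.LookupRequest(key_id=key_id))
```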
As far as storage is concerned, each node maintains an SQLite3 database for data persistence.
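A minimal sketch of what this per-node persistence could look like is shown below; the table schema is an assumption based on the dataset’s attributes (university, number of awards), not the project’s actual schema.

```python
import sqlite3

# Each node keeps its own local database file.
conn = sqlite3.connect("node_data.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS scientists (
           key_id    INTEGER,   -- hashed key within the identifier space
           name      TEXT,
           education TEXT,
           awards    INTEGER
       )"""
)

# Illustrative placeholder row, not real data.
conn.execute(
    "INSERT INTO scientists VALUES (?, ?, ?, ?)",
    (1042, "Example Scientist", "Example University", 3),
)
conn.commit()
```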
Figure 1
As mentioned earlier, Chord is a distributed hash table (DHT) protocol specifically designed for efficiently locating data in decentralized peer-to-peer networks. It addresses the challenges of organizing and retrieving information in large-scale distributed systems.
Ring Structure: Chord adopts a ring structure to organize participating nodes, assigning each node a unique identifier based on its hash, drawn from a fixed identifier space (we experimented with a space of 2048 unique values, i.e. 2^11). This ring structure facilitates efficient key lookup and distribution across the network.
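For example, mapping a node’s IP address to an identifier in a 2048-value space might look like the sketch below. SHA-1 is the hash function used in the original Chord paper; whether the project uses the same function is an assumption here.

```python
import hashlib

M = 11                 # identifier bits; 2**11 = 2048 unique identifiers
RING_SIZE = 2 ** M

def node_id(ip: str) -> int:
    # Hash the node's IP address and reduce it into the fixed identifier space.
    digest = hashlib.sha1(ip.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

print(node_id("10.0.0.31"))   # some value in [0, 2047]
```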
As already mentioned, the project’s implementation of the Chord protocol is based on the original paper by Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan. Its functionality relies heavily on the seven key functions outlined in the paper:
In our implementation, in addition to the join function, we have also developed the leave function. This function, along with fix_others() and fix_finger_table(departing_node, index), ensures the proper update of finger tables and consistency in the network when a node leaves.
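A heavily simplified, local (non-RPC) sketch of the idea behind these functions follows; the names are assumed, and the repair loop here brute-forces every remaining node, whereas the actual fix_others() only contacts the nodes whose finger tables can be affected.

```python
M = 11  # identifier bits, matching the 2048-value identifier space

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = self
        self.finger = [self] * M
        self.data = {}

    def leave(self):
        # Hand the stored keys over to the successor, then splice the ring
        # so predecessor and successor point past the departing node.
        self.successor.data.update(self.data)
        self.predecessor.successor = self.successor
        self.successor.predecessor = self.predecessor
        self.fix_others()

    def fix_others(self):
        # Brute-force repair: ask every remaining node to drop references
        # to the departing node.
        start = self.successor
        node = start
        while True:
            for i in range(M):
                node.fix_finger_table(self, i)
            node = node.successor
            if node is start:
                break

    def fix_finger_table(self, departing_node, i):
        # Replace a stale finger entry with the departing node's successor.
        if self.finger[i] is departing_node:
            self.finger[i] = departing_node.successor
```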
In a nutshell:
For the evaluation of the overall performance of the Chord lookup key operation, we achieved the following results:
<img src="./images_media/lookup_results.png" alt="Lookup Results Graph" width = 80%>
Figure 2
The results show that the average number of hops per lookup is O(log N), where N is the number of nodes in the network. This observation validates the claim made in the original paper, supporting the efficiency of the Chord protocol in minimizing the traversal distance during key retrieval in large-scale networks.
[!WARNING]
Please note that the results were based on a small and uneven dataset.
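Even so, the O(log N) behaviour follows from the paper’s lookup procedure: a query is repeatedly forwarded to the closest finger preceding the key, and each hop roughly halves the remaining identifier distance. A minimal local sketch follows (assumed names; the project’s version runs over gRPC between containers):

```python
M = 11  # identifier bits (2048-value space)

def between(x, a, b):
    # True if x lies in the open circular interval (a, b).
    if a < b:
        return a < x < b
    return x > a or x < b

def between_incl(x, a, b):
    # True if x lies in the half-open circular interval (a, b].
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.finger = [self] * M   # finger[i] ideally spans 2**i identifiers

    def find_successor(self, key_id):
        # If the key falls between us and our successor, the successor owns it.
        if between_incl(key_id, self.id, self.successor.id):
            return self.successor
        # Otherwise forward to the closest preceding finger, which roughly
        # halves the identifier distance and yields O(log N) hops overall.
        return self.closest_preceding_finger(key_id).find_successor(key_id)

    def closest_preceding_finger(self, key_id):
        for node in reversed(self.finger):
            if node is not self and between(node.id, self.id, key_id):
                return node
        return self.successor
```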
For the evaluation of the overall performance of the Chord join operation, we achieved the following results:
<img src="./images_media/join_results.png" alt="Join Results Graph" width = 80%>
Figure 3
From the results obtained, we observe that for networks with a number of nodes equal to or greater than 32, the total number of messages required to re-establish the Chord routing invariants and finger tables is on the order of O((log N)^2), where N is the number of nodes in the network (for instance, with N = 32, (log2 32)^2 = 25 messages as an order of magnitude). This observation validates the corresponding claim made in the original paper for the node join operation.
In order to evaluate the overall performance of the Chord leave operation, we achieved the following results:
<img src="./images_media/leave_results.png" alt="Leave Results Graph" width = 80%>
Figure 4
In a similar vein to the join operation, our findings for the leave operation, derived from networks with a node count equal to or greater than 32, indicate that the total number of messages required to re-establish Chord routing invariants and finger tables follows the order of O((logN)^2), where N represents the number of nodes in the network. This parallel observation aligns with the corresponding claim in the original paper, demonstrating consistency in the efficiency of Chord protocol operations, both in node joining and leaving scenarios.
Experience the ChordSeek CLI tool in action through a step-by-step walkthrough. In this section, we’ll guide you through the complete process of creating a Chord network, executing key lookup operations and showcasing node joins and leaves.
Start by setting up the Chord network with a total of 32 containers (adjustable based on your computational power). This initial step establishes the foundation for distributed key management.
We observe that the creation of the node containers was successful. The returned table informs us of the hash value each container received.
Suppose we want to find the successor of the node with the IP address “10.0.0.31”.
From the returned table after the network creation, we can see that this node is indeed the node with the IP address “10.0.0.18”.
Suppose we want to find the predecessor of the node with the IP address “10.0.0.22”.
From the returned table after the network creation, we can see that this node is indeed the node with the IP address “10.0.0.27”.
Suppose we want to find the Finger Table (FT) of the node with the IP address “10.0.0.33”.
We will now showcase the two following range queries:
Searching for computer scientists of MIT with a total number of awards greater than or equal to 3.
Searching for computer scientists of University of Cambridge with a total number of awards greater than or equal to 3.
Showcasing a node joining the network:
The new node (with the IP address “10.0.0.34”) is successfully integrated into the network. Its finger table is initialized correctly. Additionally, it is set as the successor of the node with IP address “10.0.0.12” and as the predecessor of the node with IP address “10.0.0.20”.
Showcasing the node with IP address “10.0.0.10” leaving the network: from the execution of the ‘printnodes’ command, we observe that the node with IP address “10.0.0.10” was the successor of the node with IP address “10.0.0.22” and the predecessor of the node with IP address “10.0.0.7”. After its departure from the network, the values for the aforementioned nodes have been correctly updated, as expected.
| Name | Email |
| --- | --- |
| Metaxakis Dimitris | d.metaxakis@ac.upatras.gr |
| Sofotasios Argiris | up1079616@ac.upatras.gr |
| Sfikas Theodore | up1072550@ac.upatras.gr |
Distributed under the MIT License. See LICENSE.md for more details.