Assignment 2

Part 1:

In this part of the assignment, you will compare three overlay networks: Freenet, Gnutella (with extensions) and Chord. Make a table with four columns where the first column should list all the characteristics or criteria that you use to compare the three networks. The three other columns should describe the similarities or differences of the three networks in terms of the criteria in the first column. Make sure you list a sufficient number of criteria. For Freenet, please read the following paper:

Protecting Free Expression Online with Freenet. Ian Clarke, Theodore W. Hong, Scott G. Miller, Oskar Sandberg, and Brandon Wiley, IEEE Internet Computing, 2002.

For Gnutella and Chord, please use the week 10 papers.

Answer 1:

Criteria | Freenet | Gia | Chord
1. Routing algorithm | Steepest-ascent hill climbing biased towards nodes with GUID close to GUID of search key. | Random walk biased towards high capacity nodes. | Distributed hash table that uses hash IDs to index into routing table.
Routing updates | When query results are returned (eager changes), when nodes join or when queries time out (periodic changes). | Topology adaptation algorithm (periodic changes), when nodes join or when dead nodes are purged (periodic changes). | When nodes join or leave the system (eager changes), or nodes time out (periodic changes).
Churn | Low cost due to periodic routing updates. | Low cost due to periodic routing updates. | High cost due to eager routing updates: O(log^2(n)).
Search type | Exact key value based on GUID. | Keyword match based on indexes. | Exact key value based on hash ID.
Search failure | When TTL is exceeded or due to churn. | When lookup path is long and nodes in path leave the system (churn). | Due to churn. Most affected by churn.
Insert/search time | Worst case: O(n), average case: O(n^0.28) based on experiments. Median path length is 8 for 10000 nodes. | Worst case: O(n), average case: depends on data replication factor. Median path length is 16 for 10000 nodes and replication factor of 10. | Worst case: O(log(n)) without churn, average case: O(log(n)). With churn, not clear. Median path length is 6 for 10000 nodes.
Routing table size | Depends on disk capacity. | Depends on network capacity. | Fixed: m entries, where 2^m is the size of the ID space.
Network locality | Not considered during routing. | Not considered. | Uses location tables to route to near nodes.

2. Replication and caching | No explicit replication. Data cached on the query return path. | No explicit replication. Data cached by nodes that receive results of a query. Index cached by neighbors. | Explicit replication at r nearest successor nodes.

3. Data consistency | Data can't be updated, so there is no consistency issue. An update would create a new file (since data contents are identified by their hash). This causes the problem that the new hash has to be known to search for the new file. | Data is identified by keywords and is not consistent. Files with the same keywords can have different content and vice versa. | Updates and deletes allowed. Eventual consistency among replicas.

4. Load balancing | Both network and storage. | Both network and storage. | Neither.
Network | Nodes send queries to nodes that have answered in the past, which leads to pruning low network capacity nodes. Caching reduces data hot-spots. No explicit flow control algorithm. | Network capacity directly taken into account in topology adaptation and flow-control algorithm. | Mainly based on timing out on dead nodes.
Storage | Nodes can drop any data or routing information based on disk capacity. | Nodes can drop any data based on disk capacity. However, it is unclear whether they can drop index data (one-hop replication) or how this would affect the performance of Gia. | All nodes must store roughly the same number of keys. Storage balancing would be complicated.

5. Security | Secure, except that nodes can acquire any GUID by only contacting "friends". | None. | None.
Privacy and Anonymity | System hides who stores data and who searched for data. | None. | None.
Integrity | Data is identified by hash and signed by owner. | Malicious user can modify data. | Malicious user can modify data.
Debugging | How do you do it? Hard to know global state of system since a node only knows about its neighbors. Also, due to caching during a search, debugging a search would change the state of the system (Heisenberg). | Conceptually easy? | Conceptually easy?
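
The Chord entries in the table (a routing table of m entries for an ID space of size 2^m, and lookups that take O(log(n)) hops without churn) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the ring is built with global knowledge, no node joins or leaves, m is set to 6, and the names `successor`, `build_fingers` and `lookup` are hypothetical helpers, not code from the Chord paper.

```python
import bisect

M = 6            # bits in the ID space (value assumed for illustration)
RING = 2 ** M    # size of the ID space


def successor(nodes, ident):
    """First node ID at or clockwise after ident (nodes is a sorted list)."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def build_fingers(nodes):
    """Finger table per node: entry i points to successor(n + 2**i)."""
    return {n: [successor(nodes, n + 2 ** i) for i in range(M)] for n in nodes}


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps around zero


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return x == b or in_open(x, a, b)


def lookup(fingers, start, key):
    """Return (node responsible for key, number of hops taken)."""
    node, hops = start, 0
    while True:
        succ = fingers[node][0]
        if in_half_open(key, node, succ):
            return succ, hops + 1
        # Forward to the closest finger that precedes the key.
        nxt = node
        for f in reversed(fingers[node]):
            if in_open(f, node, key):
                nxt = f
                break
        if nxt == node:          # no better finger: fall back to successor
            return succ, hops + 1
        node, hops = nxt, hops + 1
```

With node IDs 1, 8, 14, 21, 32, 38, 42, 48, 51 and 56, `lookup(build_fingers(nodes), 8, 54)` returns `(56, 3)`: the query visits nodes 8, 42 and 51 before reaching node 56. Each node stores exactly M finger entries regardless of how many nodes are in the ring.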



Part 2:

The Gnutella paper proposes 4 extensions to improve the original Gnutella protocol. Would these extensions apply to Freenet? If not, please explain why. If they do apply, then explain in a paragraph or two how these extensions to Freenet would change your comparison of Freenet with the other networks that you showed in Part 1 of this assignment.

Answer 2:

Gia proposes 4 extensions to the Gnutella protocol: topology adaptation, one-hop replication, flow control, and a biased random-walk search protocol. Of these extensions, only the flow control extension would be beneficial to Freenet.

One of the main differences between Freenet and Gnutella is that Freenet uses GUIDs for searching, while Gnutella has no notion of a key and uses keywords for searching. This makes Gnutella much more powerful in terms of search capabilities, but it introduces the problem that a biased random walk is hard to implement (on what basis do you bias your search?). Gia enhances Gnutella by using node capacity to implement a biased random walk. Freenet uses the GUID to bias its search (the search is directed towards nodes with GUID close to the search key). As a result, Gia's topology adaptation is not going to be very useful in Freenet. In fact, it could hinder Freenet's biasing mechanism. Further, it could lead to a malicious node claiming large capacity and controlling a large amount of data in Freenet, although this problem already exists in Freenet. Another way to see that Freenet already does topology adaptation is to notice that when a node returns results, it automatically becomes a neighbor for other nodes. So high capacity nodes will automatically have more neighbors.
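
The two biasing rules contrasted in this paragraph can be sketched side by side. This is a simplified illustration, not either system's actual code: GUIDs are modelled as small integers, closeness is assumed to be absolute numeric distance, capacities are plain numbers that neighbors advertise, and the function names `freenet_next_hop` and `gia_next_hop` are hypothetical.

```python
def freenet_next_hop(routing_table, key):
    """Pick the neighbor whose known GUID is closest to the search key.

    routing_table maps a known GUID (int) to the neighbor that holds it.
    """
    closest = min(routing_table, key=lambda guid: abs(guid - key))
    return routing_table[closest]


def gia_next_hop(capacities):
    """Pick the neighbor that advertises the highest capacity.

    capacities maps a neighbor name to its advertised capacity.
    """
    return max(capacities, key=capacities.get)


# Same three neighbors, two different choices for the same query.
routing_table = {10: "A", 50: "B", 90: "C"}
capacities = {"A": 1, "B": 10, "C": 1000}

print(freenet_next_hop(routing_table, 12))  # chosen by key closeness
print(gia_next_hop(capacities))             # chosen by capacity alone
```

Because the capacity rule ignores the key entirely, grafting it onto Freenet would pull queries away from the nodes most likely to hold the data, and a node that simply advertises a very large capacity (like "C" here) would attract most of the traffic.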

The one-hop replication extension could not be implemented because it requires that a node know its neighbors' contents, which Freenet does not provide. Interestingly, this extension would not be needed in Freenet even if it were possible to implement it, because Freenet already caches data along the query return path. I believe this caching is probably more effective for reducing hot-spots than the one-hop replication extension.
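
Caching along the query return path can be sketched as follows. The sketch assumes the route is already given as a list of nodes and ignores TTLs, routing table updates and cache eviction; the `Node` class and the `fetch` function are hypothetical names used only for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> data held or cached by this node


def fetch(path, key):
    """Walk the path until a node holds the key, then cache on the way back.

    Returns (data, index of the node that answered), or (None, len(path))
    if no node on the path holds the key.
    """
    for depth, node in enumerate(path):
        if key in node.store:
            data = node.store[key]
            # Every node between the requester and the answering node
            # keeps a copy as the reply travels back.
            for upstream in path[:depth]:
                upstream.store[key] = data
            return data, depth
    return None, len(path)


a, b, c = Node("a"), Node("b"), Node("c")
c.store["k"] = "v"
print(fetch([a, b, c], "k"))   # answered by c, two hops away
print(fetch([a, b, c], "k"))   # now answered by a's own cache
```

After the first request, popular data sits on every node along the paths that asked for it, which is why a hot key spreads out without any node needing to know what its neighbors store.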

The flow control extension would improve the network load balancing in Freenet (see network load balancing in Part 1). Currently, the only way a Freenet node performs network load balancing is by routing queries to nodes that have answered its queries in the past. However, some of these nodes could be overloaded. A Freenet node would have to time out on these nodes before trying the next one. With flow control, the node would not have to time out on heavily loaded nodes. One challenge in implementing flow control is that the notion of a neighbor is asymmetric in Freenet: a node A may know about B, but B may not know about A. To implement flow control, A would have to "pull" tokens from B, either when B replies to its queries or periodically. A token "push" mechanism would not be possible.
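
The token "pull" idea can be sketched as below. This is a hypothetical design under stated assumptions, not Gia's flow-control algorithm or anything in Freenet: B hands out a fixed number of tokens per period, A asks for tokens either periodically (`refill`) or piggybacked on a reply, and the class and method names are invented for illustration.

```python
class Provider:
    """Node B: grants tokens only when asked, up to its per-period rate."""

    def __init__(self, rate):
        self.rate = rate     # queries B accepts per period (assumed unit)
        self.spare = rate    # tokens not yet handed out this period

    def tick(self):
        """Start a new period and make the full rate available again."""
        self.spare = self.rate

    def pull_tokens(self, wanted):
        grant = min(wanted, self.spare)
        self.spare -= grant
        return grant

    def handle_query(self, key):
        return f"reply:{key}"


class Requester:
    """Node A: knows about B, while B keeps no state about A."""

    def __init__(self):
        self.tokens = {}     # provider -> tokens currently held

    def refill(self, provider, wanted=2):
        """Periodic pull of tokens from a provider."""
        granted = provider.pull_tokens(wanted)
        self.tokens[provider] = self.tokens.get(provider, 0) + granted

    def send(self, provider, key):
        if self.tokens.get(provider, 0) == 0:
            return None      # skip this node now instead of timing out
        self.tokens[provider] -= 1
        reply = provider.handle_query(key)
        self.refill(provider, wanted=1)   # pull piggybacked on the reply
        return reply
```

Since B never has to track who knows about it, the asymmetry of Freenet's neighbor relation is not a problem: A learns that B is loaded as soon as a pull returns no tokens, and can route the query elsewhere.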

As explained above, Freenet already implements a biased random-walk search protocol, so again this Gia extension would make no difference to Freenet. However, the Freenet search protocol could use flow control to improve network load balancing.