Version 1.0 |
Anurag Singla |
Document History 1.0 Nov/26/2002 Added discussion of leaves with multiple ultrapeers. "Draft" Dec/18/2001 First public version. Abstract: We propose a scheme to have a hierarchical gnutella network by categorizing the nodes on the network as client-nodes and super-nodes. A client-node keeps only one connection open, and that is to a super-node. A super-node acts as a proxy to the gnutella network for the client nodes connected to it. This has a effect of making the gnutella network scale, by reducing the number of nodes on the network involved in message handling and routing, as well as reducing the actual traffic among them.
The Gnutella network is a peer-to-peer network, all the hosts on the network are treated equal regardless of their bandwidth capabilities or computation power or other properties (like uptime, connection capability etc). Nodes on the network communicate using either of Gnutella Protocol 0.4 [PROT04], or an enhanced version that uses new Connection Handshaking mechanism [GC01], both of which have many interesting properties. New hosts can join any time, and the current hosts on the network may leave the network any time. The ability to have a dependable network, without dependence on any particular host (or set of hosts), is a remarkable feature of gnutella that has led to its immense popularity. Along with this there are many other remarkable features of gnutella network [WHYGNUT01] that distinguish it from other systems.
But along with all the good features that gnutella possesses, there have always been doubts cast upon the scalability of gnutella network. That’s primarily because of the overabundance of messages flowing thru the gnutella network including broadcast pings and queries, as well as the inability to assure content availability close enough to the searching host. Queries are sent to large number of hosts in an attempt to secure a hit from few of them. This has led to bandwidth problems in the past as well as in the present.
Gnutella developers have taken a number of very promising initiatives to reduce the number of messages flowing in the gnutella network, by making the methods more directed as well as carry more information. Notable among these are Ping Pong Scheme Proposal [CV01] which reduces the number of messages by eliminating the ones not contributing much to the system, Meta Information Search Proposal (S01) which attempts to make search more specific, and Query Routing Proposal [C00] which reduces the query traffic by doing selective broadcast.
Although the above initiatives are a promising step towards making the network efficient, there definitely exist other supplementary schemes, one of which is the scheme proposed in this paper that has a major potential of making network scalable and efficient. We propose a scheme which if implemented will create a two level hierarchy of nodes in the gnutella network, where faster nodes (ones having better networking capabilities and CPU power) take charge of much of the load on slower nodes. In the proposed scheme this restructuring of the network takes place automatically and dynamically, and works in unison with the other initiatives mentioned above. The restructured network will have a effect of significant reduction in the number of messages flowing thru the gnutella network.
At the same point, we would like to mention that the concept of the ultrapeers is not new, and has been in existence in variety of applications (including proxy access to web which has static ultrapeer selection, as well as distributed transaction processing which has dynamic ultrapeer selection algorithm). Also it has been applied to recent peer-to-peer applications using FastTrack [FTRACK], or other such similar technologies. The application of the ultrapeer concept in Peer-to-Peer application has been done using proprietary algorithms and protocols. Our contribution in this paper is to come up with a scheme to implement the dynamic ultrapeer selection mechanism in the Gnutella network. No such scheme has been published as yet. Our approach may or may not be similar to the proprietary ones in existence, with which we have no way to compare.
We present our scheme as follows. First we describe how ultrapeers work in an ideal network with a static topology. In doing so, we discuss options for ultrapeers to index [xyz1] leaf nodes. Next we tackle the more difficult problem of “ultrapeer election”, i.e., of dynamically constructing a tiered network. We describe a handshaking mechanism based on the recent Gnutella 0.6 protocol in detail and the concept of “ultrapeer guidance”. Finally, we conclude with some performance observations and directions for future work.
Nodes in this network can be divided into leaf nodes and ultrapeers. Leaf nodes maintain only a single connection to a ultrapeer. Ultrapeers maintain many (10-100) leaf connections, as well as a small (<10) number of connections to other ultrapeers. Ultrapeers shield leaf nodes from virtually all ping and query traffic. This can be done in one of two ways:
- Reflector indexing: in the style of the Clip2 Reflector [xyz2], ultrapeers periodically send an indexing query (four spaces, i.e., “ ”) to leaf nodes. Leaf nodes respond with a query reply naming all shared files. (The leaf can send multiple query reply messages if sharing more than 255 messages.) The ultrapeer uses these replies to build an index of its leaves’ contents. The ultrapeer then responds to queries on behalf of leaf nodes, shielding them from all query traffic.
- QRP routing: leaf nodes periodically send ultrapeers query routing tables using LimeWire’s proposed Query Routing Protocol [xyz3] (QRP). Ultrapeers then forward queries only to those leaf nodes whose QRP table has a corresponding entry. Leaves respond in the normal manner. Ultrapeers do not propogate QRP tables amongst themselves. [1]
Hybrid schemes are of course possible. However, we recommend QRP routing for several reasons [xyz4]:
- Better QHD support: because leaf nodes are given the chance to respond to all queries, they can provide up-to-the-minute information in the QHD, such as estimated download speed and busy status.
- Update efficiency: because QRP tables eliminate redundant keywords and add compression, they are likely to be significantly smaller than replies to indexing queries. Furthermore, a leaf node only needs to send incremental QRP updates when its shared files have changed. In contrast, a reflector-style ultrapeer would need to periodically re-index all of the leaves’ files—this takes more bandwidth.
- CPU efficiency: it is very cheap (constant time) for a ultrapeer to decide whether to forward a query to a leaf node with the QRP proposal. The reflector-style scheme can be implemented efficiently, but it is considerably more difficult.
- Privacy: ultrapeers do not actually know what files are shared by leaf nodes—only those files’ hashes.
- Ease of implementation: LimeWire has already implemented QRP, and the code is available to the public under the GNU Public License (GPL). Maintaing a single QRP table per connection is easier to implement than building a “virtual file manager” for all connections.
A more difficult question is how ultrapeers communicate amongst themselves. Initially we recommend simple broadcast as with the current Gnutella protocol. Eventually ultrapeers may use more complicated protocols. One idea is for a ultrapeer to not forward a query if it receives a large number of replies from its leaf nodes. QRP could also be used between ultrapeers.
The proposed scheme is backward compatible with older clients. From the perspective of newer clients, old clients are simply ultrapeers that do not maintain any shielded leaf connections. The scheme benefits the network even in the presence of old clients, though of course it would be better if all clients supported ultrapeers.
Now we return to the question of how the ultrapeer-leaf hiearchy is created in a distributed manner. Hosts regularly determine whether they are eligible to be ultrapeers (“ultrapeer capable”) by looking at uptime, operating system, bandwidth, etc. We recommend the following requirements for all ultrapeers:
- Not firewalled. This can usually be approximated by looking at whether the host has received incoming connections.
- Suitable operating system. Some operating systems handle large numbers of sockets better than others. Linux, Windows 2000/NT/XP, and Mac OS/X will typically make better ultrapeers than Windows 95/98/ME or Mac Classic.
- Sufficient bandwidth. We recommend at least 15KB/s downstream and 10KB/s upstream bandwidth. This can be approximated by looking at the maximum upload and download throughput.
- Sufficient uptime. Ultrapeers should have long expected uptimes. A reasonable heuristic is that the expected future uptime is proportional to the current uptime. That is, nodes should not become ultrapeers until they have been running for several minutes.
It is important to distinguish between hosts that are ultrapeer capable and hosts that are actually ultrapeers:
- Ultrapeer: hosts that are ultrapeer capable and not a shielded leaf node to an ultrapeer. Note that an ultrapeer does not necessarily have leaf connections.
- Leaf: hosts that have only a single, routed connection to an ultrapeer. Note that leaves may actually be ultrapeer capable.
When hosts connect, they express their capabilities and intents using Gnutella 0.6 [xyz5] connection headers. In some cases, this will require some negotiation to prevent too many nodes from becoming ultrapeers. There are five possible cases, shown below.
The simplest case is a leaf node connecting to a ultrapeer. The key header here is "X-Ultrapeer", which tells whether a host plans on acting as a ultrapeer (if true) or a shielded node (if false). When this interaction is over, the leaf is a shielded node of the ultrapeer. The leaf should drop any Gnutella 0.4 connection is have and send a QRP routing table.
Client Server GNUTELLA CONNECT/0.6
User-Agent: LimeWire 1.9
X-Ultrapeer: False
X-Query-Routing: 0.1
X-My-Address: 10.254.0.16:6349GNUTELLA/0.6 200 OK
User-Agent: LimeWire 1.9
X-Ultrapeer: True
X-Ultrapeer-Needed: false
X-Try-Ultrapeers: 23.35.1.146:6346,18.207.63.25:6347
X-Try: 24.37.144:6346,193.205.63.22:6346
X-Query-Routing: 0.1
X-My-Address: 10.254.0.16:6346GNUTELLA/0.6 200 OK The meaning of the other headers are as follows:
- User-Agent: the identity of the vendor
- X-Ultrapeer-Needed: used to balance the number of ultrapeers. This is discussed in detail in section 3.5 below.
- X-Query-Routing: the version of query routing to be used between leaf and ultrapeer, IF a shielded leaf/ultrapeer connection is established.
- X-My-Address: the host's address and port reported in query replies and pongs.
- X-Try: addresses of non-ultrapeer hosts to try if the connection is lost.
- X-Try-Ultrapeers: addresses of ultrapeer to try if the connection is lost.
These headers are sent in almost all interactions. However, for clarity, non-essential headers will be ommitted in the remaining examples. It is important to note that headers can be sent in any order. Also, case is ignored in "True" and "False".
The next case we consider is a ultrapeer-incapable node A connecting to a leaf node B who happens to have a connection to a ultrapeer S. In this case, B will refuse to accept the connection, redirecting A instead to S. Note that B does not send "200 OK" in its response.
Client (A) Server (B) GNUTELLA CONNECT/0.6
X-Ultrapeer: FalseGNUTELLA/0.6 503 I am a shielded leaf node
X-Ultrapeer: False
X-Try-Ultrapeers: 18.2.3.14:6346, 18.1.17.2:6346
[terminates connection]At this point, A should should try to establish a connection to S, as in case (1) above.
Sometimes nodes will be ultrapeer-incapable but unable to find a ultrapeer. In this case, they behave exactly like old, unrouted Gnutella 0.4 connections.
Client Server GNUTELLA CONNECT/0.6
X-Ultrapeer: FalseGNUTELLA/0.6 200 OK
X-Ultrapeer: FalseGNUTELLA/0.6 200 OK The original ultrapeer proposal involved two TCP connections in this case. This version is much simpler. Note: the server response in this case will probably not include the "X-Ultrapeer: false" header, if the Server simply ignores or does not implement or comply with this Leaf/Ultrapeer protocol.
When two ultrapeers meet, both set X-Ultrapeer true. If both have leaf nodes, they will remain ultrapeers after the interaction. Note that no QRP route table is sent between ultrapeers after the connection is established.
Client Server GNUTELLA CONNECT/0.6
X-Ultrapeer: TrueGNUTELLA/0.6 200 OK
X-Ultrapeer: TrueGNUTELLA/0.6 200 OK Note that the X-Ultrapeer-Needed header is ignored in this case. This is discussed in the next section.
Sometimes there will be too many ultrapeer-capable nodes on the network. Consider the case of an ultrapeer A connecting to an ultrapeer B. If B doesn’t have enough leaves, it may direct A to become a leaf node. If A has no leaf connections, it stops fetching new connections, drops any Gnutella 0.4 connections, and sends a QRP table to B. (If A has leaf connections, it obeys the guidance, as in the above case.) Then B will shield A from all traffic.
Client (A) Server (B) GNUTELLA CONNECT/0.6
X-Ultrapeer: TrueGNUTELLA/0.6 200 OK
X-Ultrapeer: True
X-Ultrapeer-Needed: falseGNUTELLA/0.6 200 OK
X-Ultrapeer: False
In the original version of this proposal, leaves only connected to a single ultrapeer. An initial implementation quickly showed a problem with this scheme; leaves connected to overloaded or malicious ultrapeers received no responses. A simple solution is for leaves to connect to k>1 ultrapeers, reducing the variance of the number of search results.
If leaves have too many ultrapeer connections, they consume valuable slots and increase network traffic. For this reason, leaves MUST NOT connect to more than 3 ultrapeers by default. Leaves SHOULD make it difficult for the user to adjust these defaults and MUST NOT ever allow the user to connect to more than 10 ultrapeers.
Leaves MUST NOT forward messages to ultrapeers or otherwise route between ultrapeer connections. Doing otherwise would overload ultrapeers. In other words, all messages sent from a leaf L to an ultrapeer originated at L and have a hops value of zero.
Footnotes
[1] Readers familiar with the QRP protocol will note that only some of the protocol is actually used. In particular, all route table entries will be limited to INFINITY or 1.
Comments