Ultrapeers:

Another Step Towards Gnutella Scalability

Version 1.0

Anurag Singla
Christopher Rohrs
Lime Wire LLC

Document History
1.0 Nov/26/2002 Added discussion of leaves with multiple ultrapeers.
"Draft" Dec/18/2001 First public version.

Abstract: We propose a scheme to have a hierarchical gnutella network by categorizing the nodes on the network as client-nodes and super-nodes. A client-node keeps only one connection open, and that is to a super-node. A super-node acts as a proxy to the gnutella network for the client nodes connected to it. This has a effect of making the gnutella network scale, by reducing the number of nodes on the network involved in message handling and routing, as well as reducing the actual traffic among them.

1. Introduction

The Gnutella network is a peer-to-peer network, all the hosts on the network are treated equal regardless of their bandwidth capabilities or computation power or other properties (like uptime, connection capability etc). Nodes on the network communicate using either of Gnutella Protocol 0.4 [PROT04], or an enhanced version that uses new Connection Handshaking mechanism [GC01], both of which have many interesting properties. New hosts can join any time, and the current hosts on the network may leave the network any time. The ability to have a dependable network, without dependence on any particular host (or set of hosts), is a remarkable feature of gnutella that has led to its immense popularity. Along with this there are many other remarkable features of gnutella network [WHYGNUT01] that distinguish it from other systems.

But along with all the good features that gnutella possesses, there have always been doubts cast upon the scalability of gnutella network. That’s primarily because of the overabundance of messages flowing thru the gnutella network including broadcast pings and queries, as well as the inability to assure content availability close enough to the searching host. Queries are sent to large number of hosts in an attempt to secure a hit from few of them. This has led to bandwidth problems in the past as well as in the present.

Gnutella developers have taken a number of very promising initiatives to reduce the number of messages flowing in the gnutella network, by making the methods more directed as well as carry more information. Notable among these are Ping Pong Scheme Proposal [CV01] which reduces the number of messages by eliminating the ones not contributing much to the system, Meta Information Search Proposal (S01) which attempts to make search more specific, and Query Routing Proposal [C00] which reduces the query traffic by doing selective broadcast.

Although the above initiatives are a promising step towards making the network efficient, there definitely exist other supplementary schemes, one of which is the scheme proposed in this paper that has a major potential of making network scalable and efficient. We propose a scheme which if implemented will create a two level hierarchy of nodes in the gnutella network, where faster nodes (ones having better networking capabilities and CPU power) take charge of much of the load on slower nodes. In the proposed scheme this restructuring of the network takes place automatically and dynamically, and works in unison with the other initiatives mentioned above. The restructured network will have a effect of significant reduction in the number of messages flowing thru the gnutella network.

At the same point, we would like to mention that the concept of the ultrapeers is not new, and has been in existence in variety of applications (including proxy access to web which has static ultrapeer selection, as well as distributed transaction processing which has dynamic ultrapeer selection algorithm). Also it has been applied to recent peer-to-peer applications using FastTrack [FTRACK], or other such similar technologies. The application of the ultrapeer concept in Peer-to-Peer application has been done using proprietary algorithms and protocols. Our contribution in this paper is to come up with a scheme to implement the dynamic ultrapeer selection mechanism in the Gnutella network. No such scheme has been published as yet. Our approach may or may not be similar to the proprietary ones in existence, with which we have no way to compare.

We present our scheme as follows. First we describe how ultrapeers work in an ideal network with a static topology. In doing so, we discuss options for ultrapeers to index [xyz1] leaf nodes. Next we tackle the more difficult problem of “ultrapeer election”, i.e., of dynamically constructing a tiered network. We describe a handshaking mechanism based on the recent Gnutella 0.6 protocol in detail and the concept of “ultrapeer guidance”. Finally, we conclude with some performance observations and directions for future work.

2. Routing with Ultrapeers

Nodes in this network can be divided into leaf nodes and ultrapeers. Leaf nodes maintain only a single connection to a ultrapeer. Ultrapeers maintain many (10-100) leaf connections, as well as a small (<10) number of connections to other ultrapeers. Ultrapeers shield leaf nodes from virtually all ping and query traffic. This can be done in one of two ways:

Hybrid schemes are of course possible. However, we recommend QRP routing for several reasons [xyz4]:

A more difficult question is how ultrapeers communicate amongst themselves. Initially we recommend simple broadcast as with the current Gnutella protocol. Eventually ultrapeers may use more complicated protocols. One idea is for a ultrapeer to not forward a query if it receives a large number of replies from its leaf nodes. QRP could also be used between ultrapeers.

The proposed scheme is backward compatible with older clients. From the perspective of newer clients, old clients are simply ultrapeers that do not maintain any shielded leaf connections. The scheme benefits the network even in the presence of old clients, though of course it would be better if all clients supported ultrapeers.

3. Ultrapeer Election

Now we return to the question of how the ultrapeer-leaf hiearchy is created in a distributed manner. Hosts regularly determine whether they are eligible to be ultrapeers (“ultrapeer capable”) by looking at uptime, operating system, bandwidth, etc. We recommend the following requirements for all ultrapeers:

It is important to distinguish between hosts that are ultrapeer capable and hosts that are actually ultrapeers:

When hosts connect, they express their capabilities and intents using Gnutella 0.6 [xyz5] connection headers. In some cases, this will require some negotiation to prevent too many nodes from becoming ultrapeers. There are five possible cases, shown below.

3.1. Leaf to Ultrapeer

The simplest case is a leaf node connecting to a ultrapeer. The key header here is "X-Ultrapeer", which tells whether a host plans on acting as a ultrapeer (if true) or a shielded node (if false). When this interaction is over, the leaf is a shielded node of the ultrapeer. The leaf should drop any Gnutella 0.4 connection is have and send a QRP routing table.

Client Server
GNUTELLA CONNECT/0.6
User-Agent: LimeWire 1.9
X-Ultrapeer: False
X-Query-Routing: 0.1
X-My-Address: 10.254.0.16:6349
 
  GNUTELLA/0.6 200 OK
User-Agent: LimeWire 1.9
X-Ultrapeer: True
X-Ultrapeer-Needed: false
X-Try-Ultrapeers: 23.35.1.146:6346,18.207.63.25:6347
X-Try: 24.37.144:6346,193.205.63.22:6346
X-Query-Routing: 0.1
X-My-Address: 10.254.0.16:6346
GNUTELLA/0.6 200 OK  

The meaning of the other headers are as follows:

These headers are sent in almost all interactions. However, for clarity, non-essential headers will be ommitted in the remaining examples. It is important to note that headers can be sent in any order. Also, case is ignored in "True" and "False".

3.2. Leaf to Shielded Leaf

The next case we consider is a ultrapeer-incapable node A connecting to a leaf node B who happens to have a connection to a ultrapeer S. In this case, B will refuse to accept the connection, redirecting A instead to S. Note that B does not send "200 OK" in its response.

Client (A) Server (B)
GNUTELLA CONNECT/0.6
X-Ultrapeer: False
 
  GNUTELLA/0.6 503 I am a shielded leaf node
X-Ultrapeer: False
X-Try-Ultrapeers: 18.2.3.14:6346, 18.1.17.2:6346
[terminates connection]

At this point, A should should try to establish a connection to S, as in case (1) above.

3.3. Leaf to Unshielded Leaf

Sometimes nodes will be ultrapeer-incapable but unable to find a ultrapeer. In this case, they behave exactly like old, unrouted Gnutella 0.4 connections.

Client Server
GNUTELLA CONNECT/0.6
X-Ultrapeer: False
 
  GNUTELLA/0.6 200 OK
X-Ultrapeer: False
GNUTELLA/0.6 200 OK  

The original ultrapeer proposal involved two TCP connections in this case. This version is much simpler. Note: the server response in this case will probably not include the "X-Ultrapeer: false" header, if the Server simply ignores or does not implement or comply with this Leaf/Ultrapeer protocol.

3.4. Ultrapeer to Ultrapeer

When two ultrapeers meet, both set X-Ultrapeer true. If both have leaf nodes, they will remain ultrapeers after the interaction. Note that no QRP route table is sent between ultrapeers after the connection is established.

Client Server
GNUTELLA CONNECT/0.6
X-Ultrapeer: True
 
  GNUTELLA/0.6 200 OK
X-Ultrapeer: True
GNUTELLA/0.6 200 OK  

Note that the X-Ultrapeer-Needed header is ignored in this case. This is discussed in the next section.

3.5. Ultrapeer to Ultrapeer, with Leaf Guidance

Sometimes there will be too many ultrapeer-capable nodes on the network. Consider the case of an ultrapeer A connecting to an ultrapeer B. If B doesn’t have enough leaves, it may direct A to become a leaf node. If A has no leaf connections, it stops fetching new connections, drops any Gnutella 0.4 connections, and sends a QRP table to B. (If A has leaf connections, it obeys the guidance, as in the above case.) Then B will shield A from all traffic.

Client (A) Server (B)
GNUTELLA CONNECT/0.6
X-Ultrapeer: True
 
  GNUTELLA/0.6 200 OK
X-Ultrapeer: True
X-Ultrapeer-Needed: false
GNUTELLA/0.6 200 OK
X-Ultrapeer: False
 

4. Leaves with Multiple Ultrapeers

In the original version of this proposal, leaves only connected to a single ultrapeer. An initial implementation quickly showed a problem with this scheme; leaves connected to overloaded or malicious ultrapeers received no responses. A simple solution is for leaves to connect to k>1 ultrapeers, reducing the variance of the number of search results.

If leaves have too many ultrapeer connections, they consume valuable slots and increase network traffic. For this reason, leaves MUST NOT connect to more than 3 ultrapeers by default. Leaves SHOULD make it difficult for the user to adjust these defaults and MUST NOT ever allow the user to connect to more than 10 ultrapeers.

Leaves MUST NOT forward messages to ultrapeers or otherwise route between ultrapeer connections. Doing otherwise would overload ultrapeers. In other words, all messages sent from a leaf L to an ultrapeer originated at L and have a hops value of zero.


Footnotes

[1] Readers familiar with the QRP protocol will note that only some of the protocol is actually used. In particular, all route table entries will be limited to INFINITY or 1.


Comments

[xyz1] I don't like using this word. It's not accurate.

[xyz2] Cite...except it doesn't exists any more.

[xyz3] Expand this section...perhaps discuss XML.

[xyz4] Bullet points over-simplifies here. For example up-to-dateness and efficiency are related. There are tradeoffs.

[xyz5] Cite this.