Monday, September 21, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric", ACM SIGCOMM, (August 2009).

This paper presented PortLand, a layer 2 network fabric for modern data centers. To achieve their goal, the authors laid down the following requirements:

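  1. Any VM can migrate to any physical machine without needing to change its IP address.
  2. Plug-and-play switches (an administrator should not need to configure any switch before deployment).
  3. Efficient communication between any pair of end hosts, along any of the available physical paths.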
  4. No forwarding loops.
  5. Rapid and efficient failure detection.
It is interesting to note that, strictly speaking, no current protocol at either layer 2 or layer 3 provides all of these functionalities. While Req 1 can be tackled at layer 3 (IP), layer 3 fails to provide plug-and-play functionality for switches. Req 4 can be tackled at both layer 2 and layer 3 (by the spanning tree protocol and by the TTL, respectively). However, Req 5 is met by neither layer, since the routing protocols (IS-IS/OSPF) are broadcast based!

As a result, the authors proposed PortLand, which assumes a fat-tree network topology and consists of:

  1. A lightweight protocol that enables switches to discover their position in the topology. It involves a separate, centralized component called the fabric manager: a user process on a dedicated machine, responsible for ARP resolution, fault tolerance and multicast. The fabric manager maintains soft state about the network topology, including the IP-to-PMAC mappings.
  2. A 48-bit pseudo MAC address (PMAC) of the form pod.position.port.vmid, which encodes a machine's position in the topology. A switch can query the fabric manager for a destination's PMAC and hence learn the destination's exact location.
  3. A Location Discovery Protocol (LDP), which enables switches to discover their exact position in the topology.
  4. Support for ARP, multicast and broadcast.
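To make the PMAC and fabric manager ideas in items 1 and 2 concrete, here is a minimal Python sketch. It assumes the field widths 16-bit pod, 8-bit position, 8-bit port and 16-bit vmid (my reading of the paper; they add up to the 48 bits mentioned above). The `PMAC` and `FabricManager` classes and their `register`/`resolve` methods are hypothetical names for illustration, not PortLand's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed PMAC field widths (pod.position.port.vmid): 16 + 8 + 8 + 16 = 48 bits.
POD_BITS, POS_BITS, PORT_BITS, VMID_BITS = 16, 8, 8, 16


@dataclass(frozen=True)
class PMAC:
    pod: int
    position: int
    port: int
    vmid: int

    def __post_init__(self) -> None:
        # Reject values that do not fit in their bit fields.
        for name, bits in (("pod", POD_BITS), ("position", POS_BITS),
                           ("port", PORT_BITS), ("vmid", VMID_BITS)):
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name}={value} does not fit in {bits} bits")

    def to_int(self) -> int:
        # Pack fields most-significant first: pod | position | port | vmid.
        value = self.pod
        value = (value << POS_BITS) | self.position
        value = (value << PORT_BITS) | self.port
        value = (value << VMID_BITS) | self.vmid
        return value

    @classmethod
    def from_int(cls, value: int) -> "PMAC":
        # Unpack in reverse order, least-significant field first.
        vmid = value & ((1 << VMID_BITS) - 1)
        value >>= VMID_BITS
        port = value & ((1 << PORT_BITS) - 1)
        value >>= PORT_BITS
        position = value & ((1 << POS_BITS) - 1)
        value >>= POS_BITS
        return cls(pod=value, position=position, port=port, vmid=vmid)

    def __str__(self) -> str:
        # Usual colon-separated MAC notation: 6 octets, big-endian.
        raw = self.to_int().to_bytes(6, "big")
        return ":".join(f"{b:02x}" for b in raw)


class FabricManager:
    """Hypothetical toy fabric manager holding IP -> PMAC soft state."""

    def __init__(self) -> None:
        self._ip_to_pmac: Dict[str, PMAC] = {}

    def register(self, ip: str, pmac: PMAC) -> None:
        # An edge switch reports a host (or a migrated VM's new location).
        self._ip_to_pmac[ip] = pmac

    def resolve(self, ip: str) -> Optional[PMAC]:
        # Proxy-ARP style lookup; None means the IP is unknown here.
        return self._ip_to_pmac.get(ip)


if __name__ == "__main__":
    fm = FabricManager()
    fm.register("10.0.1.2", PMAC(pod=1, position=2, port=3, vmid=1))
    print(fm.resolve("10.0.1.2"))  # 00:01:02:03:00:01
```

Calling `register` again with a new PMAC for the same IP is how this toy models VM migration: the IP address stays fixed while the location-encoding PMAC changes, which is exactly the property Req 1 asks for.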

 Critiques

Overall, the paper was very clear about its requirements and clearly showed that the design met all of those goals. My only concern is that the paper depends heavily on the assumption of a fat-tree network topology, and I am not really sure that designing an entire protocol around a specific topology is a good idea. Though the authors claimed that the design will generalize to any multi-rooted topology (over any link mesh?), no concrete data was given to support this claim. Second, I was curious what PortLand actually stands for. Did I miss it somewhere in the paper? :-)

2 comments:

  1. If you consider that the city Portland is just a little south of Seattle, the choice of the name might become more apparent :-)

  2. Wow, that was neat! And all this time I had been thinking that the authors didn't really compare themselves with SEATTLE :-)
