Saturday, December 19, 2009

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

S. Nedevschi, J. Chandrashekar, J. Liu, B. Nordman, S. Ratnasamy, N. Taft, "Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems," NSDI'09, (April 2009).
This paper argues that networked systems draw a great deal of power even when they are idle, which should not have to be the case. Even though various solutions have been proposed over the years to reduce this waste, such as wake-on-LAN (WoL) and network connectivity proxies (NCPs), there had been no in-depth evaluation of the potential energy savings. In this paper, the authors propose an extensible proxy architecture, grounded in the analysis of a 4-week trace of common network protocol traffic from 250 enterprise users (90% laptops and 10% desktops) at the Intel Berkeley Research lab. Their work is essentially a trace-driven evaluation of the broader benefits and design tradeoffs in designing proxy architectures.

The evaluation of the traces showed some really interesting results. On average, desktops were idle more than 50% of the time, wasting more than 60% of their energy; scaled to an estimated 170 million desktop PCs in the US, this amounts to roughly 6 billion dollars wasted per year. However, a strict wake-on-LAN scheme that wakes a node on receiving any packet doesn't work well, because incoming traffic remains high even when the node is idle. To tackle this, a hybrid scheme is needed that can proxy certain protocol packets and wake the host only for important packets that require user attention. The authors analyzed the protocols in the trace and identified "key offenders", using packet volume and the sleep time lost to each protocol as metrics. Once these key offenders are identified, their packets can be ignored or answered by the proxy, or can trigger a wake-up of the client.
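
To make the "key offender" idea concrete, here is a minimal sketch of how such a ranking might be computed from a packet trace. The trace format, the sleep threshold, and the scoring are my own simplifications, not the paper's actual methodology:

```python
def rank_offenders(trace, sleep_threshold=10.0):
    """Rank protocols by how much proxying them would extend sleep.

    trace: list of (timestamp, protocol) tuples for packets arriving at an
    otherwise idle host (a hypothetical format, not the paper's). The host
    is assumed able to sleep through any gap >= sleep_threshold seconds.
    """
    def sleep_time(timestamps):
        # Total time spent in gaps long enough to allow sleep.
        return sum(gap for gap in
                   (b - a for a, b in zip(timestamps, timestamps[1:]))
                   if gap >= sleep_threshold)

    base = sleep_time(sorted(t for t, _ in trace))
    scores = {}
    for proto in {p for _, p in trace}:
        remaining = sorted(t for t, p in trace if p != proto)
        # Extra sleep gained if a proxy absorbed this protocol's packets.
        scores[proto] = sleep_time(remaining) - base
    # Biggest offenders first: the protocols that most prevent sleep.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```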

Having identified the various key offenders, the authors designed a general proxy architecture consisting of a set of rules of the form (trigger, action). A trigger is based on the incoming packet and the proxy's state, while the action can be drop, wake, respond, or redirect. The system requires no modifications to the end systems and can be implemented as a separate network product, which makes it extremely useful and easily deployable.
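
The rule table is easy to picture in code. Below is a toy sketch of the (trigger, action) design; the packet fields, the specific rules, and the Proxy class are illustrative assumptions, not the authors' implementation:

```python
DROP, WAKE, RESPOND, REDIRECT = "drop", "wake", "respond", "redirect"

class Proxy:
    def __init__(self, host_ip):
        # Ordered (trigger, action) rules; a trigger inspects the
        # incoming packet together with the proxy's own state.
        self.state = {"host_ip": host_ip}
        self.rules = [
            # Answer ARP requests for the sleeping host ourselves.
            (lambda pkt, st: pkt["proto"] == "arp" and
                             pkt.get("target_ip") == st["host_ip"], RESPOND),
            # An incoming SSH connection means a user wants the machine.
            (lambda pkt, st: pkt["proto"] == "tcp" and
                             pkt.get("dport") == 22, WAKE),
            # Chatty broadcast traffic can safely be ignored.
            (lambda pkt, st: pkt.get("broadcast", False), DROP),
        ]

    def handle(self, pkt):
        for trigger, action in self.rules:
            if trigger(pkt, self.state):
                return action
        return DROP  # default: let the host keep sleeping

proxy = Proxy(host_ip="10.0.0.5")
print(proxy.handle({"proto": "tcp", "dport": 22}))  # -> "wake"
```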

Cutting the Electric Bill for Internet-Scale Systems

A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, "Cutting the Electric Bill for Internet-Scale Systems," ACM SIGCOMM Conference, (August 2009).
This paper talked about how data center operating costs can be brought down by millions of dollars by shifting computation to the locations where electricity is cheapest at a given moment. The authors argue that electricity prices at different places in the US are highly variable and only weakly correlated with one another; because the prices change hour to hour, computation must be migrated between locations continuously and dynamically. They also introduce the concept of energy elasticity, that is, the degree to which the energy consumed by a cluster depends on the load placed on it. The maximum cost reduction their approach can achieve hinges on this factor. Moreover, the analysis assumes that the change in bandwidth from rerouting requests does not affect the overall costs.
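
As a rough illustration of the idea (not the authors' actual optimization, which also weighs elasticity, bandwidth costs, and the 95th-percentile billing model), a price-aware request router might look like this sketch. The data layout and the latency cap are my assumptions:

```python
def pick_datacenter(datacenters, max_latency_ms=100):
    """Route to the datacenter with the cheapest electricity right now,
    subject to a latency cap. `datacenters` maps a name to a
    (current price in $/MWh, latency to the client in ms) pair."""
    ok = {name: (price, lat) for name, (price, lat) in datacenters.items()
          if lat <= max_latency_ms}
    if not ok:  # nothing meets the cap: fall back to the nearest site
        return min(datacenters, key=lambda n: datacenters[n][1])
    return min(ok, key=lambda n: ok[n][0])

# Hourly prices diverge across regions, so the answer changes over time.
dcs = {"virginia": (41.0, 20), "texas": (28.5, 45), "california": (55.0, 90)}
print(pick_datacenter(dcs))  # -> "texas"
```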

The paper mainly involved an extensive survey of energy costs and a feasibility study of such a scheme based on Akamai's systems. It modeled energy consumption and the increase in routing cost to show, as a proof of concept, that such a routing scheme could be profitable for an organization with large datacenters in different geographic locations. The main caveat is latency: offloading computation to the cheapest server can increase response time, which may matter more, e.g., to Google, than the few million dollars saved otherwise.

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

S. Floyd, V. Jacobson, S. McCanne, C-G Liu, L. Zhang, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing," ACM SIGCOMM Conference, (August 1995).
This paper talked about SRM (Scalable Reliable Multicast), a reliable multicast framework built around application-level framing and light-weight sessions, on top of the IP multicast group delivery model. The paper starts off by identifying some key problems in extending basic TCP-style reliability to multicast: ACK implosion, the burden of maintaining dynamic group membership information, and the fact that a single sender-to-receiver RTT no longer makes sense. Keeping these problems in mind, the authors argue that a sender-based reliability scheme like TCP's would not work in this scenario, and that a receiver-based scheme is needed instead. Further, reliability should be maintained on an 'application data unit' (ADU) basis. This puts the application in control, and since ADU names can be independent of the sending host, the anonymity of IP multicast can be exploited so that any of the multiple receivers holding the data can answer a retransmission request.

The system was presented in the context of wb, a shared whiteboard conferencing tool. The underlying idea is to prevent duplicate requests for the same ADU from the various receivers in the multicast group, which is achieved by having receivers promiscuously listen to the retransmit requests and responses of other members. Whenever a receiver detects a loss, it schedules a repair request for a random time in the interval [C1*d, (C1+C2)*d], where d is the receiver's estimate of the one-way delay to the source of the missing data. Similarly, when a host B receives a request from A that B is capable of answering, B sets a repair timer to a value in [D1*d, (D1+D2)*d], where d is B's estimate of the one-way delay to A. The constants C1, C2, D1, D2 are chosen so that as few duplicate requests and repairs as possible are sent for each lost packet.
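
The two randomized timers are simple enough to write down directly. This is a sketch of just the timer math from the paper; the default constants are arbitrary placeholders, and the suppression logic around the timers is summarized in the comments:

```python
import random

def request_timer(d_source, C1=2.0, C2=2.0):
    """Repair-request delay, uniform in [C1*d, (C1+C2)*d], where d is the
    receiver's estimated one-way delay to the source of the missing data."""
    return random.uniform(C1 * d_source, (C1 + C2) * d_source)

def repair_timer(d_requester, D1=1.0, D2=1.0):
    """Repair (retransmission) delay, uniform in [D1*d, (D1+D2)*d], where
    d is the answering host's estimated one-way delay to the requester."""
    return random.uniform(D1 * d_requester, (D1 + D2) * d_requester)

# Suppression: if a receiver overhears someone else's request (or repair)
# for the same ADU before its own timer fires, it cancels and backs off,
# so the group as a whole sends roughly one request and one repair per loss.
```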

The authors showed examples on various topologies such as chains and stars. Simulations were run on random trees as well as large bounded-degree trees to show the scalability of the protocol. They also proposed an adaptive algorithm that adjusts the timer constants based on the average number of duplicate requests observed. To conclude the paper, the authors discussed how the framework is generic in nature and not limited to the wb-style multicast model.

Scalable Application Layer Multicast

S. Banerjee, B. Bhattacharjee, C. Kommareddy, "Scalable Application Layer Multicast," ACM SIGCOMM Conference, (August 2002).

This paper presents an application-layer multicast protocol specifically designed for low-bandwidth, data-streaming applications with large receiver sets. Being based on hierarchical clustering of the application-layer multicast nodes, it is highly scalable and not very sensitive to churn. The authors claim that two measures of goodness for any application-layer multicast overlay are per-link stress (the number of identical packets the protocol sends over each underlying link in the network) and per-member stretch (the ratio of the path length from the source to the member along the overlay to the length of the direct unicast path).
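
Both metrics are straightforward to compute for a given overlay. Here is a small sketch under an assumed representation where each member's overlay path is given as a list of underlying links; the data layout is my own, not the paper's:

```python
from collections import Counter

def stress_and_stretch(overlay_paths, unicast_hops):
    """overlay_paths: member -> list of underlying links its data crosses
    on the overlay path from the source (each link as a tuple of endpoints).
    unicast_hops: member -> hop count of the direct unicast path."""
    link_stress = Counter()
    for path in overlay_paths.values():
        link_stress.update(path)  # one identical packet per traversal
    stretch = {m: len(path) / unicast_hops[m]
               for m, path in overlay_paths.items()}
    return link_stress, stretch
```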

The authors propose the NICE application-layer multicast scheme, which reduces the control overhead at any member to O(log N). Additionally, they show that the average member maintains state for only a constant number of other members and incurs constant control overhead for topology creation and maintenance. The overall idea of NICE is as follows, with a sketch of the resulting hierarchy after this paragraph. All receivers are divided into clusters of size between k and 3k-1 (where k is a constant), each consisting of hosts that are close to one another. Each cluster has a leader, preferably a node roughly equidistant from the other members of the cluster. The cluster leaders are in turn grouped into clusters at the next layer of the hierarchy, and so on until only one node remains at the top. Data is disseminated hierarchically: a node forwards data to the members of the clusters it belongs to, which again forward the data down the tree. It can be shown that on average the control overhead incurred by a node is O(k), while in the worst case it can be up to O(k log N).
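
The logarithmic depth of the hierarchy is the crux of the scalability argument: each layer keeps one leader per cluster, shrinking the population by at least a factor of k. A small back-of-the-envelope sketch (ignoring the real protocol's distance-based cluster splits and merges):

```python
import math

def nice_layers(n_members, k=3):
    """Upper bound on the number of layers in a NICE-style hierarchy with
    cluster sizes in [k, 3k-1]: each layer promotes one leader per cluster."""
    layers, population = 0, n_members
    while population > 1:
        population = math.ceil(population / k)
        layers += 1
    return layers

print(nice_layers(10_000))  # -> 9 layers for 10,000 members with k = 3
```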

The simulation results compared the average link stress of NICE and Narada-5, and the average path length (in hops) of NICE, Narada-5, IP multicast, and unicast. NICE performed better than Narada-5 over long periods of time, and the results made a promising case for this application-layer multicast protocol.