Saturday, December 19, 2009

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

S. Nedevschi, J. Chandrashekar, J. Liu, B. Nordman, S. Ratnasamy, N. Taft, "Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems," NSDI'09, (April 2009).
This paper argues that networked systems draw a great deal of power even when they are idle, which clearly should not be the case. Although various solutions have been proposed over the years to save this wasted energy, such as wake-on-LAN (WoL) and network connectivity proxies (NCPs), there has been no in-depth evaluation of the potential energy savings. In this paper, the authors propose an extensible proxy architecture, informed by an analysis of a four-week trace of common network protocol traffic from 250 enterprise users (90% laptops and 10% desktops) at the Intel Berkeley Research lab. Their work is essentially a trace-driven evaluation of the broader benefits and design trade-offs involved in designing proxy architectures.

The evaluation of the traces showed some really interesting results. On average, desktops were idle more than 50% of the time, wasting more than 60% of their energy; this scales to about 6 billion dollars wasted per year (assuming 170 million desktop PCs in the US). However, a strict wake-on-LAN scheme that wakes a node on receiving any packet does not work well, because incoming traffic remains high even when the node is idle. To tackle this, a hybrid scheme is needed that can proxy for certain protocol packets and wake the host only for important packets that require its attention. The authors analyzed the protocols in the trace and identified the "key offenders," using packet volume and achievable sleep time as the metrics for picking them out. Once these key offenders are identified, their packets can be ignored, answered by the proxy, or used to wake the client.

Having identified the key offenders, the authors designed a general proxy architecture consisting of a set of rules of the form (trigger, action). The trigger is based on the incoming packet and the proxy state, while the action can be drop, wake, respond, or redirect. The system requires no modifications to end hosts and could be implemented as a separate network product, which makes it extremely useful and easily deployable. A rough sketch of such a rule table is given below.
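
As a concrete illustration, here is a minimal Python sketch of the (trigger, action) rule table, assuming a simple dict-based packet representation; the example rules and field names are placeholders of my own, not the paper's implementation.

```python
# DROP/WAKE/RESPOND/REDIRECT are the four actions described in the paper;
# everything else in this sketch (packet fields, example rules) is made up.
DROP, WAKE, RESPOND, REDIRECT = "drop", "wake", "respond", "redirect"

class ProxyRule:
    def __init__(self, trigger, action, responder=None):
        self.trigger = trigger        # fn(packet, proxy_state) -> bool
        self.action = action
        self.responder = responder    # fn(packet, proxy_state) -> reply, for RESPOND

class IdleProxy:
    def __init__(self, rules, default_action=DROP):
        self.rules = rules
        self.state = {}               # e.g. cached ARP bindings for the sleeping host
        self.default_action = default_action

    def handle(self, packet):
        for rule in self.rules:
            if rule.trigger(packet, self.state):
                if rule.action == RESPOND:
                    return RESPOND, rule.responder(packet, self.state)
                return rule.action, None
        return self.default_action, None

# Example rules: answer ARP requests for the sleeping host from cached state,
# wake the host for incoming SSH connections, drop broadcast chatter.
rules = [
    ProxyRule(lambda p, s: p["proto"] == "ARP" and p["target_ip"] in s.get("ips", ()),
              RESPOND, lambda p, s: {"proto": "ARP-REPLY", "mac": s["mac"]}),
    ProxyRule(lambda p, s: p["proto"] == "TCP" and p.get("dport") == 22, WAKE),
    ProxyRule(lambda p, s: p.get("broadcast", False), DROP),
]
proxy = IdleProxy(rules)
proxy.state.update({"ips": {"10.0.0.5"}, "mac": "aa:bb:cc:dd:ee:ff"})
print(proxy.handle({"proto": "ARP", "target_ip": "10.0.0.5"}))  # respond from proxy
print(proxy.handle({"proto": "TCP", "dport": 22}))              # wake the host
```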

Cutting the Electric Bill for Internet-Scale Systems

A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, "Cutting the Electric Bill for Internet-Scale Systems," ACM SIGCOMM Conference, (August 2009).
This paper discusses how data center operating costs can be cut by millions of dollars by shifting computation to whichever locations have the cheapest electricity at a given time. The authors argue that electricity prices at different places in the US are highly variable and show little correlation with one another. The prices are also highly dynamic, which requires continuously moving computation between locations. The authors further introduce the concept of energy elasticity: the degree to which the energy consumed by a cluster depends on the load placed on it. The maximum cost reduction their approach can achieve hinges on this factor. The approach also assumes that the change in bandwidth from routing additional data does not affect overall costs.

The paper mainly involves an extensive survey of energy prices and a feasibility study of such a scheme based on the Akamai system. It models energy consumption and the increase in routing energy to show, as a proof of concept, that such a routing scheme could be profitable for an organization with large data centers in different geographic locations. The effectiveness of the proposal hinges mainly on latency: offloading computation to the cheapest location can increase latency, which may matter more, e.g. to Google, than the few million dollars saved otherwise. A toy version of this price-aware placement decision is sketched below.
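
To make the trade-off concrete, here is a small hedged sketch (not from the paper) that sends requests to the cheapest data center whose latency to the client stays within a budget; the prices, latencies, and the capacity field are invented for illustration.

```python
# Route each request to the cheapest location that still meets a latency budget;
# if none qualifies, degrade gracefully to the lowest-latency location.
def pick_datacenter(locations, latency_budget_ms):
    """locations: dicts with 'name', 'price_usd_mwh', 'latency_ms', 'spare_capacity'."""
    feasible = [dc for dc in locations
                if dc["latency_ms"] <= latency_budget_ms and dc["spare_capacity"] > 0]
    if feasible:
        return min(feasible, key=lambda dc: dc["price_usd_mwh"])
    return min(locations, key=lambda dc: dc["latency_ms"])

dcs = [
    {"name": "VA", "price_usd_mwh": 62.0, "latency_ms": 18, "spare_capacity": 100},
    {"name": "TX", "price_usd_mwh": 41.0, "latency_ms": 35, "spare_capacity": 250},
    {"name": "CA", "price_usd_mwh": 88.0, "latency_ms": 70, "spare_capacity": 300},
]
print(pick_datacenter(dcs, latency_budget_ms=50)["name"])  # -> TX: cheapest within budget
```

Tightening the latency budget shrinks the feasible set and hence the achievable savings, which mirrors the tension the authors quantify.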

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

S. Floyd, V. Jacobson, S. McCanne, C-G Liu, L. Zhang, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing," ACM SIGCOMM Conference, (August 1995).
This paper presents SRM (Scalable Reliable Multicast), a reliable multicast protocol built around application level framing and light-weight sessions, layered on the IP multicast group delivery model. The paper starts off by identifying key problems in extending TCP-style reliability to multicast, such as ACK implosion, the responsibility of maintaining dynamic group membership information, and the fact that RTT estimation is defined only for a single sender/receiver pair. With these problems in mind, the authors argue that a sender-based reliability scheme like TCP's would not work in this scenario and that a receiver-based scheme is needed. Further, reliability should be maintained on an 'application data unit' (ADU) basis. This puts the application in control, and since ADU names can be independent of the sending host, the protocol can use the anonymity of IP multicast to exploit the redundancy of multiple receivers.

The system is presented in the context of wb, a network whiteboard/conferencing tool. The underlying idea is to prevent duplicate requests for the same ADU from the various receivers in the multicast group, which is done by listening promiscuously to the retransmission requests and repairs sent by other receivers. Whenever a receiver detects a loss, it schedules a repair request for a random time in the future, drawn from [C1*d, (C1+C2)*d], where d is the receiver's estimate of the one-way delay to the source of the missing data. Similarly, when host B receives a request from A that B is capable of answering, B sets a repair timer to a value drawn from [D1*d, (D1+D2)*d]. The values of C1, C2, D1, D2 are set so that as few duplicate requests and repairs as possible are sent when a packet is lost. A sketch of these randomized timers appears below.
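
For concreteness, here is a minimal sketch of the randomized request/repair timers; the constants and delay estimates below are placeholders rather than values from the paper, and the suppression logic is reduced to a flag.

```python
import random

C1, C2 = 2.0, 2.0   # request-timer constants (placeholders)
D1, D2 = 1.0, 1.0   # repair-timer constants (placeholders)

def request_delay(d_to_source):
    """Delay before multicasting a repair request for a missing ADU."""
    return random.uniform(C1 * d_to_source, (C1 + C2) * d_to_source)

def repair_delay(d_to_requester):
    """Delay before answering a repair request we are able to satisfy."""
    return random.uniform(D1 * d_to_requester, (D1 + D2) * d_to_requester)

class PendingTimer:
    """Hearing someone else's request/repair for the same ADU before our timer
    fires suppresses (or backs off) our own transmission."""
    def __init__(self, fire_at):
        self.fire_at = fire_at
        self.suppressed = False

    def on_duplicate_heard(self):
        self.suppressed = True

print(round(request_delay(0.05), 4), "s until our request, unless suppressed")
```

Because the delay scales with the estimated distance to the source, receivers close to it tend to time out first, and their multicast request suppresses everyone else's.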

The authors walk through examples on various topologies such as chains and stars, and run simulations on random trees as well as large bounded-degree trees to show the scalability of the protocol. They also propose adaptive adjustment of the constants via a random-timer algorithm driven by the observed number of duplicate requests. To conclude, the authors discuss how the protocol is generic in nature and not limited to the wb-style multicast model.

Scalable Application Layer Multicast

S. Banerjee, B. Bhattacharjee, C. Kommareddy, "Scalable Application Layer Multicast," ACM SIGCOMM Conference, (August 2002).

This paper presents an application layer multicast protocol specifically designed for low-bandwidth, data-streaming applications with large receiver sets. Because it is based on hierarchical clustering of the application-layer multicast nodes, it is highly scalable and not very sensitive to churn. The authors use two measures of goodness for an application layer multicast overlay: per-link stress (the number of identical packets the protocol sends over each underlying link in the network) and per-member stretch (the ratio of the path length from the source to the member along the overlay to the length of the direct unicast path). A toy calculation of both metrics follows.
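
Here is a tiny sketch of those two metrics; the mapping of overlay edges onto physical links and all path lengths below are made-up numbers purely for illustration.

```python
from collections import Counter

# Overlay edge -> the physical links it is routed over (invented topology).
overlay_to_links = {
    ("A", "B"): ["A-r1", "r1-B"],
    ("A", "C"): ["A-r1", "r1-r2", "r2-C"],
}

# Link stress: how many identical copies of a packet cross each physical link.
stress = Counter(link for links in overlay_to_links.values() for link in links)
print(stress["A-r1"])    # 2: both overlay edges are routed over this link

# Member stretch: overlay path length / direct unicast path length (in hops).
overlay_hops_to_C, unicast_hops_to_C = 5, 3
print(overlay_hops_to_C / unicast_hops_to_C)   # stretch of roughly 1.67
```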

The authors propose NICE, an application-layer multicast scheme that reduces the control overhead at any member to O(log N). They also show that the average member maintains state for only a constant number of other members and incurs constant control overhead for topology creation and maintenance. The overall idea of NICE is as follows. All receivers are divided into clusters of size between k and 3k-1, where k is a constant and each cluster consists of hosts that are close to one another. Each cluster has a leader, preferably a member that is roughly equidistant from the other nodes in the cluster. The cluster leaders are in turn grouped into clusters at the next layer of the hierarchy, and this continues until there is only one node at the top. Data is disseminated hierarchically: it reaches the topmost node, which forwards it to its children, and they in turn forward it down the tree. On average the control overhead incurred by a node is O(k), while in the worst case it can be up to O(k log N). A toy construction of the hierarchy is sketched below.
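
A toy sketch of the hierarchy construction follows; the greedy slicing into clusters and the leader choice (middle member instead of the graph-theoretic center) are simplifications of my own, not the paper's algorithm.

```python
def build_hierarchy(members, k=3):
    """Layer 0 is all members; each higher layer holds the leaders of the layer below."""
    layers = [list(members)]
    while len(layers[-1]) > 1:
        current, leaders = layers[-1], []
        # Greedily slice the layer into clusters of at most 3k-1 members.
        for i in range(0, len(current), 3 * k - 1):
            cluster = current[i:i + 3 * k - 1]
            # NICE picks the cluster's center; taking the middle member is a stand-in.
            leaders.append(cluster[len(cluster) // 2])
        layers.append(leaders)
    return layers

layers = build_hierarchy([f"host{i}" for i in range(100)], k=3)
print([len(layer) for layer in layers])   # [100, 13, 2, 1]: O(log N) layers
```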

The simulation results compared the average link stress of NICE and Narada-5, and the average path length (in hops) of NICE, Narada-5, IP multicast, and unicast. NICE performed better than Narada-5 over long periods of time. The results clearly show a promising implementation of application layer multicast.

Tuesday, November 24, 2009

BotGraph: Large Scale Spamming Botnet Detection

Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, E. Gillum, "BotGraph: Large Scale Spamming Botnet Detection," NSDI'09, (April 2009).
This paper also deals with botnet detection, albeit from an entirely different angle than the previous paper. The authors focus on the web-account abuse attack, in which a spammer creates many accounts (bot-users) on popular mail services and then uses a large number of infected computers (bots) to send spam through these accounts. The authors design and implement a system called BotGraph to detect these attacks at large scale (processing Hotmail logs of 500 million users and identifying over 26 million botnet-created accounts). The methodology revolves around creating a giant user-user graph and then analyzing it to identify bot-user groups as connected components. The analysis is based on the following three strategies adopted by spammers:
  1. Assigning bot-users (email accounts) to random bots (infected machines).
  2. Keeping a queue of bot-users; whenever a bot comes online (in random order), the spammer assigns it the top k bot-users in the queue.
  3. Placing no limit on the number of requests a bot can make, and assigning it a single bot-user per request.
These three strategies result in different graphs and call for a recursive algorithm. The idea is that each node represents a bot-user, and there is an edge between two nodes if they share an IP address, with the edge weight set by the number of shared IPs. The algorithm first prunes all edges with weight less than 2 and identifies the connected components; it then recursively extracts, within each component, the subcomponents connected by edges of even higher weight (a sketch of this appears below). Apart from the algorithmic aspect of the paper, the computational aspect is equally challenging: building this graph from 500 million email accounts is extremely computation-intensive. The authors implemented BotGraph using Dryad/DryadLINQ, a programming environment for distributed parallel computing, and used a method called selective filtering that partitions the input by user ID and distributes only the related records across partitions.
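
Here is a compact, hedged sketch of the recursive component-splitting idea in plain Python (a union-find stand-in for the paper's distributed Dryad/DryadLINQ pipeline); the thresholds, the minimum group size, and the tiny example graph are invented for illustration.

```python
from collections import defaultdict

def connected_components(nodes, edges, min_weight):
    """Components of the graph restricted to edges with weight >= min_weight."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for (a, b), w in edges.items():
        if w >= min_weight:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return [g for g in groups.values() if len(g) > 1]

def extract_botgroups(nodes, edges, weight=2, max_weight=10, min_size=2):
    """Prune low-weight edges, take components, and re-split large components
    with a higher threshold; report groups that no longer split."""
    results = []
    for comp in connected_components(nodes, edges, weight):
        sub = []
        if len(comp) >= min_size and weight < max_weight:
            comp_set = set(comp)
            sub_edges = {e: w for e, w in edges.items()
                         if e[0] in comp_set and e[1] in comp_set}
            sub = extract_botgroups(comp, sub_edges, weight + 1, max_weight, min_size)
        results.extend(sub if sub else [(comp, weight)])
    return results

edges = {("u1", "u2"): 3, ("u2", "u3"): 5, ("u3", "u4"): 1}
print(extract_botgroups(["u1", "u2", "u3", "u4"], edges))
# -> [(['u2', 'u3'], 5)]: the tightest group that survives raising the threshold
```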

Comments

There is an interesting comparison to be made between the two solutions, NAB and BotGraph. Though both papers try to solve the same problem, their approaches are entirely different: NAB separates bot traffic from human traffic, whereas BotGraph identifies bot-user accounts. In the long term, NAB has the additional advantage of identifying any kind of bot traffic (DDoS or click-fraud), not just spam email, whereas BotGraph has the advantage of being really practical: email service providers can quickly and easily analyze the user-user graphs and ban bot accounts. It would be interesting to discuss which would be more beneficial for, say, someone like Google, which runs an email service and is also concerned about DDoS attacks and click fraud. At this point, the best answer appears to be 'both'.

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks

R. Gummadi, H. Balakrishnan, P. Maniatis, S. Ratnasamy, "Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks," NSDI'09, (April 2009).
This paper is about identifying traffic sent from the compromised machines that form botnets. It aims to separate human-generated traffic from botnet-generated traffic, exploiting the observation that human-generated traffic is generally accompanied by keyboard or mouse input. The authors design a system called NAB ("Not-A-Bot") to approximately identify human-generated activity. It uses a trusted software component called the 'attester', which runs on the client machine alongside an untrusted OS. The authors implemented their attester in the Xen hypervisor but claim that other implementations, such as building the attester into trusted hardware or running it in software without virtualization, are also possible. (The last option actually looks very promising, and I wonder why it was not done.)


Design

The authors build the attester on top of a TPM. A TPM provides many security services, among them the ability to measure and attest to the integrity of trusted software running on the computer at boot time. The attester relies on two key primitives provided by TPMs: direct anonymous attestation (DAA) and sealed storage. The basis of the design is that whenever an application sends a message, it has to obtain an attestation from the attester. The attestation contains the hash of the message, a nonce, and range values, signed by the attester along with its certificate. The attestation is granted only if the request is generated within some interval (delta-m, delta-k) of recent keyboard or mouse activity (with certain exceptions, such as scripted mail, which are tied to the requesting PID). On the other end, a verifier can implement various policies based on the attestation: it could drop non-attested packets altogether, but more practically the authors propose giving non-attested packets lower priority, or ensuring that attested packets always get through. Based on this, the paper describes effective spam mitigation, DDoS prevention, and click-fraud mitigation. A toy sketch of the attester/verifier interaction is given below.
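
To make the flow concrete, here is a toy sketch of the attester/verifier interaction; it uses a shared-secret HMAC and a single freshness window purely for illustration, whereas the real design relies on TPM-backed signatures (DAA) and separate keyboard/mouse deltas.

```python
import hashlib, hmac, time

ATTESTER_KEY = b"placeholder-key-sealed-in-tpm-in-the-real-design"
DELTA_SECONDS = 1.0      # how recent input activity must be (placeholder value)

class Attester:
    def __init__(self):
        self.last_input = 0.0              # updated by keyboard/mouse events

    def input_event(self):
        self.last_input = time.time()

    def attest(self, message, nonce):
        """Return an attestation only if there was recent human input."""
        if time.time() - self.last_input > DELTA_SECONDS:
            return None                    # no recent activity: refuse to attest
        digest = hashlib.sha256(message + nonce).digest()
        return hmac.new(ATTESTER_KEY, digest, hashlib.sha256).hexdigest()

def verify(message, nonce, attestation):
    """Verifier side, e.g. a mail server that de-prioritizes traffic failing this."""
    expected = hmac.new(ATTESTER_KEY,
                        hashlib.sha256(message + nonce).digest(),
                        hashlib.sha256).hexdigest()
    return attestation is not None and hmac.compare_digest(attestation, expected)

a = Attester()
a.input_event()                                  # the user just typed something
att = a.attest(b"mail body", b"nonce-42")
print(verify(b"mail body", b"nonce-42", att))    # -> True
```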

Comments

Overall, this was an excellent paper: it tackles a very practical problem and comes up with an at least partially practical solution. However, I have a few comments on the effectiveness of the protocol:

  1. It is entirely possible to build adaptive botnets that send data only when there is mouse or keyboard activity. In the worst case this would make bots send their data in bursts, which would at best throttle the sender.
  2. The current design and implementation require modifying applications and protocols (to accommodate the attestations), which raises concerns about deployability.
  3. Although the authors say they do not intend to 'punish' applications that do not use attesters, giving priority to attested traffic seems to work against that very goal.
  4. The evaluation showed that 0.08% of human-generated mail was still misclassified as spam. Shouldn't that ideally be zero now? Given that they already handle script-generated mail, what could be the cause of these misclassifications?
  5. It was a bit of a surprise to me when the authors claimed that 600 GB of disk overhead for storing nonces is not much for an ISP. Is that scalable?

Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework

R. Fonseca, G. Porter, R. H. Katz, S. Shenker, I. Stoica, "X-Trace: A Pervasive Network Tracing Framework," NSDI'07, (April 2007).
This paper presents X-Trace, a pervasive network tracing framework. The motivation is that applications spanning multiple administrative domains (ADs) and operating over different contexts make it extremely difficult to obtain a comprehensive view of the system. Various diagnostic tools do exist, but they are mostly either layer-specific or application-specific. X-Trace aims to be pervasive and generic in its approach (at the cost of requiring the participating ADs to agree to support it).

The idea is that the user invokes X-Trace when initiating an application task by inserting X-Trace metadata into the resulting request. The metadata consists of flags (specifying which components of the metadata are present), a unique TaskID, a TreeInfo field (a ParentID, OpID, EdgeType tuple), an optional Destination field (the network address to send reports to), and an Options field to accommodate future extensions. This metadata propagates through the network via pushDown() and pushNext() calls: pushDown() carries the metadata down to the next logical layer, while pushNext() carries it to the next component at the same layer. Each node is responsible for reporting its X-Trace metadata, which is sent back to the report server and used to reconstruct the task tree. This tree lets the operator see the entire trace of how the request traveled through the network, and incomplete or broken traces help the administrator localize possible problems. A bare-bones sketch of the metadata propagation is given below.
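
Here is a bare-bones sketch of how the metadata might propagate; the field names follow the description above, but the ID generation, the edge-type handling, and the print-based 'report' are simplifications of my own.

```python
import os

def new_id():
    return os.urandom(4).hex()

class XTraceMetadata:
    def __init__(self, task_id, parent_id, op_id, edge_type="NEXT"):
        self.task_id = task_id        # constant across every operation of one task
        self.parent_id = parent_id    # TreeInfo is the (ParentID, OpID, EdgeType) tuple
        self.op_id = op_id
        self.edge_type = edge_type

    def push_next(self):
        """Carry the metadata to the next component on the same layer."""
        return XTraceMetadata(self.task_id, self.op_id, new_id(), "NEXT")

    def push_down(self):
        """Carry the metadata down to the layer below (e.g. HTTP -> TCP -> IP)."""
        return XTraceMetadata(self.task_id, self.op_id, new_id(), "DOWN")

    def report(self, component):
        # A real deployment sends this to the report server named in the
        # Destination field; printing the tree edge stands in for that here.
        print(f"{component}: task={self.task_id} "
              f"{self.parent_id} -{self.edge_type}-> {self.op_id}")

root = XTraceMetadata(new_id(), parent_id="root", op_id=new_id())
http = root.push_next(); http.report("http-proxy")
tcp = http.push_down(); tcp.report("tcp")
```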

The authors further discuss additional uses of X-Trace in tunneling, ISP connectivity troubleshooting, and link-layer tracing. They evaluated the system in three deployment scenarios: DNS resolution, a three-tiered photo-hosting website, and a service accessed through the i3 overlay network.

Comments
Another well-written paper: it addresses a very real problem and proposes a very generic solution. However, a few questions arise about the effectiveness and ease of deployment of the framework:
  1. It requires modifications to clients, servers, and network devices to carry the X-Trace metadata, which limits quick deployment.
  2. Though the authors claim that partial deployment of X-Trace can still localize faults at some granularity, it remains an open question how useful these partial graphs will be.
  3. Loss of reports on their way back to the report server may be mistaken for failures.
  4. Though I do not understand this entirely, the authors admit that the tree structure does not capture all possible network actions; they give examples such as quorum protocols, or a controller that sends jobs to many worker nodes and waits for all of them to complete.
Overall, it would also be interesting to discuss how X-Trace could be extended, along the lines of NetMedic, to automatically detect network failures or misconfigurations as well.