Tuesday, November 24, 2009

BotGraph: Large Scale Spamming Botnet Detection

Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, E. Gillum, "BotGraph: Large Scale Spamming Botnet Detection," NSDI'09, (April 2009).
This paper also addresses botnet detection, though with an entirely different approach from the previous paper. The authors focus on the web-account abuse attack: a spammer creates many accounts (bot users) on popular mail servers and then uses a large number of infected computers (bots) to send spam through those accounts. The authors design and implement a system called BotGraph to detect these attacks at large scale (analyzing Hotmail logs of 500 million users and identifying over 26 million botnet-created accounts). The methodology revolves around constructing a giant user-user graph and then analyzing it to identify bot-user groups as connected components. The analysis considers the following 3 strategies adopted by spammers:
  1. Assigning bot-users (email accounts) to random bots (infected machines).
  2. Keeping a queue of bot users; as bots come online in random order, the spammer assigns the top k bot users in the queue to the requesting bot.
  3. Placing no limit on the number of requests a bot can make, and assigning it a single bot user on each request.
Clearly, these 3 strategies result in different graphs and require a dynamic, recursive algorithm. Each node represents a bot user, and an edge connects two users if they share an IP. Each edge is weighted by the number of IPs the two users share. The algorithm first prunes all edges with weight less than 2 and identifies the connected components of what remains. It then recursively identifies connected components within these subgraphs using edges with progressively higher weights. Apart from the algorithmic aspect of the paper, its computational aspect is equally challenging: identifying bot users among 500 million email accounts is computationally intensive. The authors implemented BotGraph using Dryad/DryadLINQ, a powerful programming environment for distributed parallel computing. Their graph construction method, called selective filtering, partitions the inputs by user ID and distributes only the related records across partitions.
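
To make the graph step concrete, here is a minimal single-machine sketch in Python of the idea as summarized above: weight each user pair by the number of shared IPs, prune edges below a threshold of 2, take connected components, and recurse with a higher threshold. This is not the authors' implementation. The function names, the (user, ip) input format, and the max_threshold stopping rule are all assumptions made for illustration, and none of the Dryad/DryadLINQ distribution is modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(logins):
    """Return {(u, v): weight}, where weight is the number of IPs users u and v share.

    `logins` is an iterable of (user, ip) pairs (a hypothetical input format).
    """
    users_by_ip = defaultdict(set)
    for user, ip in logins:
        users_by_ip[ip].add(user)
    weights = defaultdict(int)
    for users in users_by_ip.values():
        # Every pair of users seen on the same IP gains one unit of weight.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def extract_groups(nodes, weights, threshold=2, max_threshold=4):
    """Return (threshold, component) pairs, recursing with a higher threshold.

    max_threshold is an arbitrary stopping rule assumed for this sketch.
    """
    kept = [(u, v) for (u, v), w in weights.items()
            if w >= threshold and u in nodes and v in nodes]
    found = []
    for component in connected_components(nodes, kept):
        if len(component) < 2:
            continue  # an isolated user is not a group
        found.append((threshold, component))
        if threshold < max_threshold:
            found.extend(
                extract_groups(component, weights, threshold + 1, max_threshold))
    return found
```

Calling build_user_graph on a list of (user, ip) pairs and passing the resulting weights, together with the set of all users, to extract_groups yields the nested groups found at each threshold. Which of those groups would actually be flagged as bot users is a separate decision that this sketch leaves out.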

Comments

There is an interesting comparison to be made between the two solutions provided by NAB and BotGraph. Though both papers try to solve the same problem, they take entirely different approaches: NAB distinguishes bot traffic from user traffic, whereas BotGraph identifies bot user accounts. In the long term, NAB appears to have the additional advantage of identifying any bot traffic (DDoS or click fraud) rather than being limited to spam email, whereas BotGraph has the advantage of being really practical: email service providers can quickly and easily analyze the graphs and ban bot user accounts. It would be interesting to discuss which would be more beneficial for someone like Google, which runs an email service and is also concerned about DDoS attacks and click fraud. At this point, the best answer appears to be 'both'.
