Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework

R. Fonseca, G. Porter, R. H. Katz, S. Shenker, I. Stoica, "X-Trace: A Pervasive Network Tracing Framework," NSDI'07, (April 2007).
This paper presents X-Trace, a pervasive network tracing framework. The motivation is that applications span multiple administrative domains (ADs) and run over many different layers and contexts, which makes it extremely difficult to get a comprehensive view of the system. Various diagnostic tools exist, but most are either layer-specific or application-specific. X-Trace aims to be pervasive and generic in its approach (at the cost of requiring the ADs involved to agree to support it).

The idea is that the user invokes X-Trace when initiating an application task by inserting X-Trace metadata into the resulting request. The metadata consists of flags (specifying which optional components are present), a unique TaskID, a TreeInfo field (a ParentID, OpID, EdgeType tuple), Destination Info (an optional field giving the network address to send reports to) and an Options field to accommodate future extensions. This metadata propagates through the network using pushDown() and pushNext() calls: pushDown() pushes the metadata down to the next lower logical layer, while pushNext() passes it to the next component at the same layer. Each node is responsible for recording its X-Trace metadata, which is then sent to the report server and used to construct a trace tree. This tree lets the report server see the entire path the request took through the network, and incomplete or broken traces can help the administrator narrow down possible network problems.
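The propagation described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the `Metadata` class, the `reports` list standing in for the report server, and the helper names `start_task`, `push_next`, `push_down`, `build_tree` and `render` are all my own assumptions. It also assumes that each push generates a fresh OpID and records the previous OpID as the ParentID, and it leaves out the flags, Destination Info and Options fields.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    """Simplified X-Trace metadata: TaskID plus the TreeInfo tuple."""
    task_id: str
    parent_id: Optional[str]   # None for the root operation
    op_id: str
    edge_type: Optional[str]   # "NEXT", "DOWN", or None for the root


# Hypothetical stand-in for the report server: a list of (metadata, label).
reports = []


def _new_id() -> str:
    return secrets.token_hex(4)


def start_task(label: str) -> Metadata:
    """Create the root metadata for a new task and report it."""
    md = Metadata(_new_id(), None, _new_id(), None)
    reports.append((md, label))
    return md


def _push(md: Metadata, edge_type: str, label: str) -> Metadata:
    # Same TaskID, new OpID, and the old OpID becomes the ParentID.
    child = Metadata(md.task_id, md.op_id, _new_id(), edge_type)
    reports.append((child, label))
    return child


def push_next(md: Metadata, label: str) -> Metadata:
    """Propagate to the next component at the same layer."""
    return _push(md, "NEXT", label)


def push_down(md: Metadata, label: str) -> Metadata:
    """Propagate to the layer below."""
    return _push(md, "DOWN", label)


def build_tree(collected):
    """Group reported operations by ParentID."""
    children = {}
    for md, label in collected:
        children.setdefault(md.parent_id, []).append(
            (md.op_id, md.edge_type, label))
    return children


def render(children, parent=None, depth=0):
    """Return the trace tree as indented text lines."""
    lines = []
    for op_id, edge, label in children.get(parent, []):
        lines.append("  " * depth + f"{label} [{edge or 'ROOT'}]")
        lines.extend(render(children, op_id, depth + 1))
    return lines


# Made-up example: an HTTP request that goes through a proxy.
http_client = start_task("HTTP client")
tcp_conn = push_down(http_client, "TCP connection")
http_proxy = push_next(http_client, "HTTP proxy")
http_server = push_next(http_proxy, "HTTP server")

print("\n".join(render(build_tree(reports))))
```

If the report for the HTTP proxy never arrived, the HTTP server's ParentID would point at an operation the report server has not seen, which is the kind of broken trace the paragraph above refers to.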

The authors also discuss additional uses of X-Trace in tunneling, ISP connectivity troubleshooting and link-layer tracing. They evaluate the system in three deployed scenarios: DNS resolution, a three-tiered photo-hosting website and a service accessed through the i3 overlay network.

Comments
Another well-written paper. It addresses a very real problem and proposes a very generic solution. However, a few questions arise about the effectiveness and ease of deployment of the protocol:
  1. It requires modifications to clients, servers and network devices to support X-Trace metadata. This limits how quickly it can be deployed.
  2. Though the authors claim that a partial deployment of X-Trace can still help find faults at some granularity, it remains an open question how useful these partial graphs will be.
  3. Reports lost on the way back to the report server may be mistaken for failures.
  4. I do not understand this point very well, but the authors admit that a tree structure doesn't capture all possible network actions. They mention quorum protocols, and a controller that sends jobs to many worker nodes and waits for all of them to complete.
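One way to read the tree limitation in the last point is sketched below. This is my own interpretation, not something taken from the paper: if each operation records a single ParentID, then a step that waits on several workers (a join) would need more than one parent and so does not fit in a tree. The edge lists and the function names `parents_per_node` and `fits_in_tree` are made up for illustration.

```python
def parents_per_node(edges):
    """Map each node to the set of nodes that causally precede it."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return parents


def fits_in_tree(edges):
    """True if every node has at most one parent, as a tree requires."""
    return all(len(p) <= 1 for p in parents_per_node(edges).values())


# A simple request chain: each step has exactly one cause.
chain = [("client", "proxy"), ("proxy", "server")]

# A controller fans out to two workers and then waits for both.
fan_out_join = [
    ("controller", "worker1"),
    ("controller", "worker2"),
    ("worker1", "join"),
    ("worker2", "join"),
]

print(fits_in_tree(chain))         # True
print(fits_in_tree(fan_out_join))  # False
```

The fan-out alone is fine for a tree; it is the "waits for all to complete" step, with two incoming edges, that breaks the single-parent assumption.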
Overall, I think it would also be interesting to discuss how X-Trace could be extended along the lines of NetMedic to automatically detect network failures or misconfigurations.

1 comment:

  1. Your last point is actually something I have tried to interest students in. The overhead of embedding X-trace in existing code has been an impediment, but we have it in Hadoop and nobody has really picked it up for debugging and diagnosing purposes.
