Thursday, September 17, 2009

Detailed Diagnosis in Enterprise Networks

S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, P. Bahl, "Detailed Diagnosis in Enterprise Networks," ACM SIGCOMM Conference, (August 2009).
This paper aimed to identify the causes of faults in an enterprise network by formulating detailed diagnosis as an inference problem that captures the behavior and interactions of fine-grained network components. The authors developed a system, NetMedic, with the following features:
  • Detailed diagnosis was framed as an inference problem, which is more general than rule-based or classifier-based approaches.
  • It used a technique to estimate whether components A and B are impacting each other without knowing how they interact.
  • It introduced many state variables to capture the network state, which makes the analysis more general than relying on a single "health" variable.
  • It had the ability to detect complex interactions between components and construct a dependency graph among them.
  • Finally, it applied history-based reasoning, statistical abnormality detection, or other learning techniques to identify the cause of a fault.
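To make the history-based reasoning in the bullets above more concrete, here is a minimal sketch of how one might estimate whether A is impacting B without knowing how they interact. It is my own illustration, not NetMedic's actual algorithm: the function names, the Euclidean distance, the choice of `k`, and the `1 / (1 + gap)` scoring are all assumptions. The idea is to find past moments when A's state looked like it does now, and then check whether B's state at those moments also resembles B's current state.

```python
from math import sqrt


def distance(u, v):
    """Euclidean distance between two state vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def impact_score(history_a, history_b, now_a, now_b, k=3):
    """Hypothetical score for how well A's current state explains B's.

    history_a[i] and history_b[i] are state vectors (lists of numbers)
    observed at the same past time step i. The k past steps where A was
    closest to now_a are selected, and the score is high when B at those
    steps was close to now_b. The result lies in (0, 1].
    """
    if not history_a or len(history_a) != len(history_b):
        raise ValueError("histories must be non-empty and aligned in time")

    # Rank past time steps by how similar A's state was to its current state.
    ranked = sorted(range(len(history_a)),
                    key=lambda i: distance(history_a[i], now_a))
    nearest = ranked[:k]

    # Average gap between B's state at those steps and B's current state.
    avg_gap = sum(distance(history_b[i], now_b) for i in nearest) / len(nearest)

    # Assumed mapping from gap to score: zero gap gives 1, large gap tends to 0.
    return 1.0 / (1.0 + avg_gap)


# Toy history: B's variable is high whenever A's variable is high.
history_a = [[0.1], [0.2], [0.9], [1.0]]
history_b = [[10.0], [11.0], [50.0], [52.0]]

# A is high now. A high B is consistent with history; a low B is not.
consistent = impact_score(history_a, history_b, [0.95], [51.0], k=2)
inconsistent = impact_score(history_a, history_b, [0.95], [10.0], k=2)
print(consistent > inconsistent)
```

Note that a score like this only measures how often the two states have co-occurred in the past, so it says nothing about the direction of influence or about a third component driving both.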
Overall, the idea was definitely novel and was put forward very clearly. However, the authors mentioned that, despite collecting 450,000 cases, they actually used only about 148 of them for their analysis. This raises a question about how general the study is. What if these few cases did not include complex interactions between components, so that important issues were missed? For a framework designed around statistical abnormality detection and history-based reasoning, 148 cases seems like quite a small data set. Secondly, I was not convinced by the authors' approach to discovering redundant variables. Assuming that cliques always represent redundant variables seems too strong. There can very well be a set of closely coupled tasks that go 'bad' together in the majority of cases but behave independently in others. Thirdly, I wonder how sound their assumption is that correlation always implies causality, which was the main focus of their inference engine. I think it would be really interesting to discuss these aspects in class.

1 comment:

  1. This paper was actually an extension of an earlier system called Sherlock. I agree with your comment on causality, which is why I talked about active experiments on Thursday.
