Monday, August 31, 2009

End-to-End Arguments in System Design

J. H. Saltzer, D. P. Reed, and D. D. Clark, End-to-end arguments in system design, ACM Transactions on Computer Systems (TOCS), Nov. 1984

This paper addresses one of the most important and fundamental decisions in distributed systems design: where to place functionality among the different modules of a system. The authors present the 'End-to-End' argument, which suggests that certain functions should be provided at the higher, application level, since providing them at lower levels may not always be economical and may even be redundant.

The authors use the example of a reliable file transfer protocol to support their argument. To ensure reliability, one approach is to deploy redundancy, recovery and error correction techniques at each level so that the probability of each individual threat is reduced to a negligible value. Another approach is to deploy only a checksum comparison at the "end-to-end" level and, in case of failure, retry the complete transfer. This technique works well when the failure rate is low, since normal error-free transfers do not have to bear the overhead of redundancy and error checks at every level. The argument made here is that even if one reduces the threats at the lower levels to near-negligible values, the application designer still has to provide a check at the application level. The "extra effort" spent on assuring reliability at lower levels may reduce the frequency of retries, but it has no effect on the ultimate correctness of the outcome. So, having extraordinary reliability at lower levels doesn't reduce the burden on the application layer to assure reliability.
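The second approach can be sketched in a few lines of Python. This is only an illustration, not the protocol from the paper: `unreliable_channel` and `end_to_end_transfer` are hypothetical names, the channel is simulated by flipping a byte with some probability, and the sketch assumes the sender's checksum reaches the receiver intact.

```python
import hashlib
import random


def unreliable_channel(data: bytes, corrupt_prob: float, rng: random.Random) -> bytes:
    """Hypothetical lower layer: with probability corrupt_prob, flip one byte in transit."""
    if data and rng.random() < corrupt_prob:
        corrupted = bytearray(data)
        index = rng.randrange(len(corrupted))
        corrupted[index] ^= 0xFF  # XOR with 0xFF always changes the byte
        return bytes(corrupted)
    return data


def end_to_end_transfer(data: bytes, corrupt_prob: float, rng: random.Random,
                        max_attempts: int = 100):
    """Send the whole file, compare checksums at the endpoints, retry on mismatch.

    Assumption: the sender's checksum is delivered to the receiver intact.
    Returns the received bytes and the number of attempts that were needed.
    """
    expected = hashlib.sha256(data).digest()
    for attempt in range(1, max_attempts + 1):
        received = unreliable_channel(data, corrupt_prob, rng)
        if hashlib.sha256(received).digest() == expected:
            return received, attempt
    raise RuntimeError("transfer failed after %d attempts" % max_attempts)


if __name__ == "__main__":
    payload = b"end-to-end " * 1000
    received, attempts = end_to_end_transfer(payload, 0.3, random.Random(0))
    print("transfer verified after", attempts, "attempt(s)")
```

Note that the check sits entirely at the endpoints: a more reliable channel (a smaller `corrupt_prob`) only lowers the number of attempts, while correctness comes from the final checksum comparison.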

That said, unreliable lower levels are a problem too! In the above example, a totally unreliable channel may result in an exponential increase in the number of retries as the length of the file increases. The key idea is that aiming for "perfect reliability" at the lower levels is not needed, but some level of reliability should still be provided. Further, the authors argue that performing a certain function may cost more at lower levels, since the subsystem may be shared by many applications, not all of which need that function. It is pointless to have a slow but very reliable communication system if the applications that run on top of it would rather have a fast and not-so-reliable network (e.g. digitized speech transfer). To further support their argument, the authors take the examples of delivery acknowledgments, secure transmission of data, duplicate message suppression, guaranteeing FIFO message delivery and the SWALLOW distributed data storage system, in each of which it is beneficial to provide the functionality at the higher level.
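The exponential growth in retries can be made concrete with a small calculation. The sketch below rests on a simplifying assumption that is mine, not the paper's exact model: each packet is lost or corrupted independently with the same probability, and the whole file is resent if any packet fails. The function name `expected_attempts` is hypothetical.

```python
def expected_attempts(num_packets: int, packet_failure_prob: float) -> float:
    """Expected number of whole-file transfers until one succeeds.

    Assumption: packets fail independently with probability packet_failure_prob,
    so one attempt succeeds with probability (1 - p) ** n, and the number of
    attempts follows a geometric distribution with mean 1 / (1 - p) ** n.
    """
    if not 0.0 <= packet_failure_prob < 1.0:
        raise ValueError("packet_failure_prob must be in [0, 1)")
    success_prob = (1.0 - packet_failure_prob) ** num_packets
    return 1.0 / success_prob


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, "packets:", expected_attempts(n, 0.001), "expected attempts")
```

Under this model the expected number of attempts grows exponentially with the number of packets, which is why a modest amount of lower-level reliability is still worth having as a performance measure.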

Overall, I felt that the authors made convincing arguments and highlighted the fact that there should be a proper balance between the functionality implemented at various levels. The End-to-End argument is not an absolute rule; it is more a property of specific applications, or rather a guideline that helps in protocol and application design. For instance, different design arguments hold for transmitting voice in real time and for transmitting recorded voice. This balance is obtained by looking carefully at the application so as to minimize redundancy and improve performance. However, since the paper comments on the overall philosophy of system design, I would have liked it to include a more varied set of system design examples (like the RISC analogy) rather than focusing mainly on data communication systems. Moreover, I feel that the design decision is not a function of performance alone. Considerations such as security, modularity and reusability must also be taken into account when choosing where to place functionality.

1 comment:

  1. This paper is considered a classic, often quoted as though it were a religious text. It is good to read it and understand that it really presents a nuanced view of the e2e argument.
