Monday, August 31, 2009

The Design Philosophy of the DARPA Internet Protocols

D. Clark, The design philosophy of the DARPA internet protocols, ACM SIGCOMM Computer Communication Review, August 1988.


This paper explores the underlying design philosophy, motivation and reasoning behind the DARPA Internet protocols, which include the Internet Protocol (IP) and the Transmission Control Protocol (TCP). The paper begins by highlighting that the fundamental goal of the DARPA Internet architecture was to develop an effective technique for multiplexed utilization of existing interconnected networks. Packet switching was chosen over circuit switching as the multiplexing technique for two main reasons: first, applications such as remote login were better served by packet switching, and second, the networks that were to be integrated were themselves packet-switching networks. Moreover, since the technique of store-and-forward packet switching was well understood, it was assumed that the networks would be interconnected by a layer of Internet packet switches called gateways.


Keeping the fundamental goal in mind, several secondary goals were set and ordered by priority. The first goal was that Internet communication must continue despite the loss of networks or gateways. Since the network was designed to operate in a military context, it seems quite reasonable for this to be a high-priority goal. To achieve it, a reliability approach called "fate sharing" was proposed, in which transport-level synchronization information and other state information is kept at the end points. This resulted in stateless packet switches and a "datagram network", an idea that is also in agreement with the end-to-end argument. The second goal was that the Internet must support multiple types of communication service. It was realized that not all applications want bi-directional reliable delivery of data; the examples of XNET, the cross-Internet debugger, and of real-time delivery of digitized speech further strengthen the author's point. Because of this, it became clear that more than one transport service would be required, which led to TCP and IP being split into two layers: TCP provides a reliable data stream while IP provides the basic datagram service. The third goal was that the Internet architecture must accommodate a variety of networks. This was an important goal, as it forced the architecture to make a minimal set of assumptions about the functions of the individual networks, which made it even more useful and robust.
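As a rough illustration of the two services that came out of this split, here is a minimal Python sketch (my own, not from the paper) contrasting a reliable TCP stream with a best-effort UDP datagram, which is the closest user-level analogue of the raw IP datagram service; the host name and address below are just placeholders.

    import socket

    # Reliable, ordered byte stream: the service TCP provides. The kernel
    # takes care of retransmission, ordering and flow control.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect(("example.com", 80))
    tcp.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    reply = tcp.recv(4096)   # bytes arrive in order, or the connection fails
    tcp.close()

    # Best-effort datagram service (UDP), the user-level counterpart of the
    # bare IP datagram: each send is an independent packet that may be lost,
    # duplicated or reordered, and any reliability is left to the application.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"speech frame 42", ("203.0.113.5", 9999))  # placeholder receiver
    udp.close()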


The fourth goal was that the Internet architecture must permit distributed management of resources, which led to several different management centers (ISPs) being deployed in the Internet, each operating a subset of the gateways. The fifth was that the whole Internet architecture must be cost effective, which cannot be claimed to have been perfectly met: the headers of Internet packets are fairly long, which becomes a significant overhead for small packets, and since lost packets are not recovered at the network level, the end points bear the added overhead of retransmitting them. The sixth goal was that the architecture must permit host attachment with a low level of effort. This is another objective that was not perfectly met; the cost of attaching a host is somewhat higher because all the reliability and state mechanisms must be implemented in the host. The final goal was that the resources used in the Internet architecture must be accountable. This goal was not perceived as important earlier, when the end hosts were mostly military nodes; however, now that the Internet has expanded to include all types of users, accountability of packet flows has gained a lot of importance.
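To make the header-overhead point concrete, here is a back-of-the-envelope calculation of my own (not from the paper), assuming the minimum 20-byte IPv4 header and 20-byte TCP header:

    IP_HEADER = 20    # minimum IPv4 header, in bytes
    TCP_HEADER = 20   # minimum TCP header (no options), in bytes

    def header_overhead(payload_bytes):
        # Fraction of each packet taken up by headers.
        total = payload_bytes + IP_HEADER + TCP_HEADER
        return (IP_HEADER + TCP_HEADER) / float(total)

    print(header_overhead(1))     # one-byte remote-login keystroke: ~98% overhead
    print(header_overhead(1460))  # full-sized Ethernet payload: ~2.7% overhead

For small packets such as single remote-login keystrokes, the 40 bytes of headers dominate the packet, which is exactly the inefficiency the author concedes.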


Overall, I really liked this paper as it helped me appreciate the reasons for the numerous design decisions that led to the development of the current Internet protocols. However, I think that many design priorities that were valid at the time may no longer hold today: for example, cost effectiveness and accountability, which ranked low back then, should now be considered two really important goals. In light of this, it would be very interesting to discuss what design approach the protocol designers would have taken with the current scenario in mind.

End-to-End Arguments in System Design

J. H. Saltzer, D. P. Reed, D. D. Clark, End-to-end arguments in system design, ACM Transactions on Computer Systems (TOCS), Nov. 1984.

This paper addresses one of the most important and fundamental decisions in distributed systems design: the placement of functionality among the different modules of the system. The authors present the 'end-to-end' argument, which suggests that certain functions can only be completely and correctly implemented at the higher levels, with the knowledge and help of the application at the end points, and that providing them at lower levels may not always be economical and may even be redundant.

The authors give the example of a reliable file transfer protocol to support their argument. To ensure reliability, one approach is to deploy redundancy, recovery and error-correction techniques at every level so that the probability of each individual threat is reduced to a negligible value. Another approach is to instead deploy only a checksum comparison at the "end-to-end" level and, in case of failure, attempt a complete retry. This technique works well when the failure rate is low, as normal error-free transfers do not have to bear the overhead of redundancy and error checks at every level. The argument made here is that one might reduce the threats at the lower levels to near-negligible values, but the application designer would still have to provide a check at the application level. The "extra effort" of assuring reliability at the lower levels may reduce the frequency of retries, but it has no effect on the ultimate correctness of the outcome. So, having extraordinary reliability at the lower levels doesn't reduce the burden on the application layer to assure reliability.
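As a minimal sketch of that end-to-end check (my own illustration; send() and request_checksum() are hypothetical stand-ins for whatever unreliable transfer machinery sits underneath):

    import hashlib

    def transfer_with_end_to_end_check(data, send, request_checksum, max_retries=5):
        # The only correctness guarantee comes from this endpoint-to-endpoint
        # comparison; the lower layers may or may not do their own checking.
        expected = hashlib.sha256(data).hexdigest()
        for attempt in range(max_retries):
            send(data)                          # may drop or corrupt data silently
            if request_checksum() == expected:  # checksum computed by the receiver
                return True                     # both ends agree on the file contents
        return False                            # give up after repeated failures

Whatever the lower layers do, this final comparison is what actually establishes that the file arrived intact.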

However, that said, unreliable lower levels are a problem too! In the above example, a totally unreliable channel may result in an exponential increase in the number of retries as the length of the file increases. The key idea is that aiming for "perfect reliability" at the lower levels is not needed, but some reliability assurance should be provided. Further, the authors argue that performing a certain function may cost more at the lower levels, since the subsystem is shared by many applications. It is pointless to have a slow but very reliable communication system if the applications that run on top of it would instead prefer a fast and not-so-reliable network (e.g. digitized speech transfer). To further support their argument, the authors take the examples of delivery acknowledgments, secure transmission of data, duplicate message suppression, guaranteed FIFO message delivery and the SWALLOW distributed data storage system, in all of which it is beneficial to provide the functionality at the higher levels.
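The retry cost can be made concrete with a small calculation of my own (not from the paper): if each of the N packets in a file is damaged independently with probability p, and any single error forces a whole-file retry, the expected number of attempts is 1/(1-p)^N, which grows exponentially with N.

    def expected_attempts(p_error, n_packets):
        # Probability that every packet in one attempt gets through undamaged,
        # assuming independent per-packet errors and whole-file retries.
        p_success = (1.0 - p_error) ** n_packets
        return 1.0 / p_success

    print(expected_attempts(0.001, 1000))    # ~2.7 attempts for a 1,000-packet file
    print(expected_attempts(0.001, 10000))   # ~22,000 attempts for a 10,000-packet file

This is why some reliability at the lower levels is still worthwhile, even though it cannot replace the end-to-end check.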

Overall, I felt that the authors made convincing arguments and highlighted the fact that there should be a proper balance between the functionality implemented at the various levels. The end-to-end argument is not an absolute rule but rather a property of specific applications, or a guideline that helps in protocol and application design; different design arguments hold for transmitting voice in real time and for transmitting recorded voice. The balance is obtained by carefully looking at the application so as to minimize redundancy and improve performance. However, since the paper comments on the overall philosophy of system design, I would have liked it to include a more varied set of system design examples (like the RISC analogy) rather than focusing mainly on data communication systems. Moreover, I feel that the placement decision is not a function of performance alone: considerations such as security, modularity and re-usability must also be taken into account when choosing where to place functionality.