Low Latency TCP Protocol for Beowulf Clusters

 

The cluster protocol is a new network protocol intended to improve network performance on a secure Beowulf cluster. Two versions of the cluster protocol are available, named version A and version B.

The Version A protocol doesn't use the TCP send queue, doesn't use ACKs, windows, or sequence numbers, and doesn't compute checksums. It stores messages that are being sent in a ring buffer, whose length is set by the kernel parameter /proc/sys/net/ipv4/cluster_output_ring.
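As a rough illustration, the ring length could be inspected from user space with something like the sketch below. It assumes the parameter is exposed as a single decimal integer, as most /proc/sys entries are; the helper name read_ring_length is hypothetical and is not part of the provided sources.

```c
#include <assert.h>
#include <stdio.h>

/* Path taken from the text above; it only exists on a kernel built
 * with the Version A sources. */
#define CLUSTER_OUTPUT_RING "/proc/sys/net/ipv4/cluster_output_ring"

/* Hypothetical helper: read one integer from a /proc/sys-style file.
 * Assumes the file holds a single decimal value.
 * Returns the value, or -1 if the file is missing or unparsable. */
long read_ring_length(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long value = -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```

Calling read_ring_length(CLUSTER_OUTPUT_RING) on a stock kernel returns -1, because the file is absent there.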

The Version B protocol is very similar to TCP, except that it doesn't compute checksums and doesn't support many of the TCP/IP options.

To use the protocols, the user needs to build a new kernel using the sources provided here. Two user-level header files also need to be modified. The user can then pass SOCK_CLUSTER instead of SOCK_STREAM in calls to socket.
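A minimal sketch of what such a call might look like in a client is shown below. SOCK_CLUSTER is expected to come from the modified user-level headers; the #ifndef fallback to SOCK_STREAM is an assumption added only so the sketch compiles (as plain TCP) on a stock system, and the helper name cluster_connect is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* SOCK_CLUSTER is defined by the modified user-level headers.
 * This fallback is an assumption for illustration only. */
#ifndef SOCK_CLUSTER
#define SOCK_CLUSTER SOCK_STREAM
#endif

/* Hypothetical helper: connect to ip:port using the cluster protocol.
 * Returns a connected socket descriptor, or -1 on any failure. */
int cluster_connect(const char *ip, unsigned short port)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;              /* not a valid IPv4 address */

    /* The only change from an ordinary TCP client is the socket type. */
    int fd = socket(AF_INET, SOCK_CLUSTER, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The rest of the program (read, write, close) would stay as it is for a SOCK_STREAM socket, assuming the cluster protocol keeps the usual stream-socket interface.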

The following subdirectories contain the relevant code.

kernelA - Code required to build a kernel compatible with cluster protocol A. (Or get the Protocol A Kernel patches)
kernelB - Code required to build a kernel compatible with cluster protocol B. (Or get the Protocol B Kernel patches)
user - Header files required to build user level programs using the cluster protocol.
tests - Simple test code, timing the differences between the cluster protocol and TCP/IP.

This work was done as a Master's project in Computer Science by Ira Burton under the supervision of Amit Jain.
The entire project report is available here in PostScript [476KB].