2001-08-14 TJSN [Issue 028] - Multicasting in Java

Author: Paul van Spronsen

If you are not already subscribed to this newsletter, please send an email to subscribe@javaspecialists.co.za. Be warned that if you are a beginner in Java, you will at times struggle to keep up. The archive of past newsletters is kept at http://www.javaspecialists.co.za

This week we again have a guest author, Paul van Spronsen, who owns Blue Label Software, a small software development company in South Africa. I have written a lot of code with Paul and he is the best Java programmer that I've had the pleasure of working with. So, if your project needs some extra horsepower (ok, brain power ;-), send him an email and I'm sure his company will be able to help you.

A special word of thanks also to Robin Beetge for hacking together a dirty Perl script to generate XML tags for the syntax highlighting. Your little script is saving me a lot of time :)


Multicasting in Java

1. Introduction

This article deals primarily with the subject of multicast communication in Java. I have, however, included some background information to refresh the memory of those who have forgotten how much they know about data communications. If the concepts "datagram", "IP fragment", "reliable protocol" or "multicast" are not clear to you, try referring to the appendices. If the appendices appear shrouded in mystery, go back to your data comms lecturer and demand a refund.

2. Sending multicast datagrams

In order to send any kind of datagram in Java, be it unicast, broadcast or multicast, one needs a java.net.DatagramSocket:

  DatagramSocket socket = new DatagramSocket();

One can optionally supply a local port to the DatagramSocket constructor to which the socket must bind. This is only necessary if one needs other parties to be able to reach us at a specific port. A third constructor takes the local port AND the local IP address to which to bind. This is used (rarely) with multi-homed hosts where it is important on which network adapter the traffic is received. Neither of these is necessary for this example.
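The three constructor forms can be sketched as follows. This is only an illustration: the class and method names are invented for the example, port 0 is used so that the operating system picks a free port, and try-with-resources and InetAddress.getLoopbackAddress() need a much newer JDK than the 1.3.1 this article was written against.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class SocketBindingSketch {

    /** Binds to an OS-chosen port on the loopback adapter and returns that port. */
    static int bindToLoopback() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Third form: local port AND local address. Port 0 asks the OS for
        // any free port, so the sketch does not collide with other programs.
        try (DatagramSocket socket = new DatagramSocket(0, loopback)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // No-argument form: the OS picks the port; enough for a pure sender.
        try (DatagramSocket anyPort = new DatagramSocket()) {
            System.out.println("OS-chosen port: " + anyPort.getLocalPort());
        }
        // Port-only form: pass a real port number (e.g. 7777) when other
        // parties must reach us there; 0 is used here to stay collision-free.
        try (DatagramSocket fixedPort = new DatagramSocket(0)) {
            System.out.println("Bound port: " + fixedPort.getLocalPort());
        }
        System.out.println("Loopback-bound port: " + bindToLoopback());
    }
}
```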

This sample code creates the socket and a datagram to send and then simply sends the same datagram every second:

  DatagramSocket socket = new DatagramSocket();

  byte[] b = new byte[DGRAM_LENGTH];
  DatagramPacket dgram;

  dgram = new DatagramPacket(b, b.length,
    InetAddress.getByName(MCAST_ADDR), DEST_PORT);

  System.err.println("Sending " + b.length + " bytes to " +
    dgram.getAddress() + ':' + dgram.getPort());
  while(true) {
    System.err.print(".");
    socket.send(dgram);
    Thread.sleep(1000);
  }

Valid values for the constants are:

  • DGRAM_LENGTH: anything from 0 to 65507 (see section 5), e.g. 32
  • MCAST_ADDR: any class D address (see appendix D), e.g. 235.1.1.1
  • DEST_PORT: an unsigned 16-bit integer, e.g. 7777

It is important to note the following points:

  1. DatagramPacket does not make a copy of the byte-array given to it, so any change to the byte-array before the socket.send() will be reflected in the data actually sent;
  2. One can send the same DatagramPacket to several different destinations by changing the address and/or port using the setAddress() and setPort() methods;
  3. One can send different data to the same destination by changing the byte array referred to using setData() and setLength() or by changing the contents of the byte array the DatagramPacket is referring to;
  4. One can send a subset of the data in the byte array by manipulating offset and length through the three-argument setData(byte[], int, int) method and setLength().
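Points 1 and 4 can be checked without touching the network. The sketch below is only an illustration with invented class and helper names. As far as I know, DatagramPacket exposes the offset for writing only through the three-argument setData(byte[], int, int), with getOffset() to read it back.

```java
import java.net.DatagramPacket;

public class PacketReuseSketch {

    /** Returns the first byte the packet would send, honouring its offset. */
    static byte firstPayloadByte(DatagramPacket p) {
        return p.getData()[p.getOffset()];
    }

    public static void main(String[] args) {
        byte[] buf = {10, 20, 30, 40, 50};
        DatagramPacket dgram = new DatagramPacket(buf, buf.length);

        // Point 1: no copy is made, so a later write to buf shows through.
        buf[0] = 99;
        System.out.println("first byte after edit: " + firstPayloadByte(dgram));

        // Point 4: select the two bytes starting at index 2 without copying.
        dgram.setData(buf, 2, 2);
        System.out.println("offset=" + dgram.getOffset()
                + " length=" + dgram.getLength()
                + " first byte=" + firstPayloadByte(dgram));
    }
}
```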

    3. Receiving multicast datagrams

    One can use a normal DatagramSocket to send and receive unicast and broadcast datagrams and to send multicast datagrams, as seen in section 2. In order to receive multicast datagrams, however, one needs a MulticastSocket. The reason for this is simple: all the protocol layers below UDP need to do additional work to control and receive multicast traffic.

    The example given below opens a multicast socket, binds it to a specific port and joins a specific multicast group:

      byte[] b = new byte[BUFFER_LENGTH];
      DatagramPacket dgram = new DatagramPacket(b, b.length);
      MulticastSocket socket =
        new MulticastSocket(DEST_PORT); // must bind receive side
      socket.joinGroup(InetAddress.getByName(MCAST_ADDR));
    
      while(true) {
        socket.receive(dgram); // blocks until a datagram is received
        System.err.println("Received " + dgram.getLength() +
          " bytes from " + dgram.getAddress());
        dgram.setLength(b.length); // must reset length field!
      }
    

    Values for DEST_PORT and MCAST_ADDR must match those in the sending code for the listener to receive the datagrams sent there. BUFFER_LENGTH should be at least as long as the data we intend to receive. If BUFFER_LENGTH is shorter, the data will be truncated silently and dgram.getLength() will return b.length.

    The MulticastSocket.joinGroup() method causes the lower protocol layers to be informed that we are interested in multicast traffic to a particular group address. One may execute joinGroup() many times to subscribe to different groups. If multiple MulticastSockets bind to the same port and join the same multicast group, they will all receive copies of multicast traffic sent to that group/port.

    As with the sending side, one can re-use one's DatagramPacket and byte-array instances. The receive() method sets length to the amount of data received, so remember to reset the length field in the DatagramPacket before subsequent receives, otherwise you will be silently truncating all your incoming data to the length of the shortest datagram previously received.

    One can set a timeout on the receive() operation using socket.setSoTimeout(timeoutInMilliseconds). If the timeout is reached before a datagram is received, the receive() throws a java.io.InterruptedIOException. The socket is still valid and usable for sending and receiving if this happens.
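A minimal sketch of the timeout behaviour, using a plain DatagramSocket on the loopback adapter so that it runs without a multicast-capable network. The class and method names are invented for the example. On newer JDKs the exception actually thrown is java.net.SocketTimeoutException, which is a subclass of InterruptedIOException, so the catch clause below covers both.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class ReceiveTimeoutSketch {

    /** Waits up to timeoutMillis for one datagram; returns true if the wait timed out. */
    static boolean timesOut(DatagramSocket socket, int timeoutMillis) {
        byte[] b = new byte[32];
        DatagramPacket dgram = new DatagramPacket(b, b.length);
        try {
            socket.setSoTimeout(timeoutMillis);
            socket.receive(dgram);
            return false; // a datagram arrived in time
        } catch (InterruptedIOException e) {
            return true;  // timed out; the socket itself is still usable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Times out on a quiet socket, then re-uses the same socket to send and receive. */
    static boolean[] demo() {
        try (DatagramSocket socket =
                 new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            boolean first = timesOut(socket, 100); // nobody has sent anything yet
            byte[] ping = {1};
            // Send one datagram to ourselves over loopback.
            socket.send(new DatagramPacket(ping, ping.length,
                    socket.getLocalAddress(), socket.getLocalPort()));
            boolean second = timesOut(socket, 2000);
            return new boolean[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("quiet socket timed out: " + result[0]);
        System.out.println("timed out after sending to self: " + result[1]);
    }
}
```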

    4. Multicasting and serialization

    We have seen in the previous sections that we can multicast anything we can fit into a byte array. Conveniently for us, one of those things is a serialized object.

    Object serialization is based on the assumption of a stream (ObjectOutputStream, ObjectInputStream), so we have to do a little massaging to squeeze this into our datagram paradigm. ObjectOutputStream writes a stream header (containing a magic number and version number) to the stream on construction and ObjectInputStream reads and checks this on construction (ever wondered why ObjectInputStream's constructor blocks until the ObjectOutputStream has been constructed on the sending side?). This is the reason one always attaches the ObjectOutputStream to the outgoing side of a socket before attaching the ObjectInputStream to the incoming side.

    In order to multicast objects, we need to arrange that the stream header information is in each datagram. The simplest way to ensure this is to create a new ObjectOutputStream for each datagram we send and a new ObjectInputStream for each one we receive. We could probably avoid these instantiations by extending the two classes in question, but I'm not going into that here.

    On the sending side, we can do something like this:

      ByteArrayOutputStream b_out = new ByteArrayOutputStream();
      ObjectOutputStream o_out = new ObjectOutputStream(b_out);
    
      o_out.writeObject(new Message());
    
      byte[] b = b_out.toByteArray();
    
      DatagramPacket dgram = new DatagramPacket(b, b.length,
        InetAddress.getByName(MCAST_ADDR), DEST_PORT); // multicast
      socket.send(dgram);
    

    In addition, on the receiving side we can do something like this:

      byte[] b = new byte[65535];
      ByteArrayInputStream b_in = new ByteArrayInputStream(b);
      DatagramPacket dgram = new DatagramPacket(b, b.length);
    
      socket.receive(dgram); // blocks
      ObjectInputStream o_in = new ObjectInputStream(b_in);
      Object o = o_in.readObject();
      dgram.setLength(b.length); // must reset length field!
      b_in.reset(); // reset so next read is from start of byte[] again
    

    Note that one can re-use the ByteArray*Streams, byte arrays and DatagramPackets on both sides. Only the Object*Streams need be recreated.
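The round trip can be tried without a network by copying the serialized bytes into a larger receive buffer, which is roughly what receive() leaves behind. This is only a sketch: the helper names are invented, a String stands in for the Message class, and wrapping only the received range of the buffer via getOffset() and getLength() is my own choice rather than something the code above does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;

public class DatagramSerializationSketch {

    /** Serializes one object with a fresh stream so the payload carries its own header. */
    static byte[] toPayload(Serializable message) {
        ByteArrayOutputStream bOut = new ByteArrayOutputStream();
        try (ObjectOutputStream oOut = new ObjectOutputStream(bOut)) {
            oOut.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bOut.toByteArray();
    }

    /** Deserializes one object from the given range of a receive buffer. */
    static Object fromPayload(byte[] buf, int offset, int length) {
        try (ObjectInputStream oIn = new ObjectInputStream(
                new ByteArrayInputStream(buf, offset, length))) {
            return oIn.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = toPayload("hello group");

        // Simulate what receive() leaves behind: data at the start of a
        // larger buffer, with the packet length set to the bytes received.
        byte[] receiveBuffer = new byte[65535];
        System.arraycopy(payload, 0, receiveBuffer, 0, payload.length);
        DatagramPacket dgram = new DatagramPacket(receiveBuffer, receiveBuffer.length);
        dgram.setLength(payload.length);

        Object o = fromPayload(dgram.getData(), dgram.getOffset(), dgram.getLength());
        System.out.println(payload.length + " bytes carried: " + o);
    }
}
```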

    5. Datagram sizes

    The IP spec allows for datagrams up to 65535 bytes in length, including the IP header. If the underlying protocol layers cannot support this size (Ethernet's MTU is 1500 bytes), IP fragments the datagrams into several smaller datagrams. On the receive side, IP reassembles the datagram before delivering it to higher layer protocols, like UDP. If any of the fragments do not arrive at the destination, the entire datagram is discarded, i.e. there is no partial delivery of IP and therefore UDP datagrams.

    Since the normal IP header is 20 bytes long and the UDP header is always 8 bytes long, one would expect the maximum UDP data length to be 65535-8-20 = 65507. Somehow, however, the combination of Win2k and JDK1.3.1 manages to successfully send as much as 65527 bytes per datagram. I would be interested to hear whether users of a real operating system experienced the same.

    It is very important to note that although the IP spec allows for datagrams up to 65535 bytes, it only requires implementations to support IP datagrams of up to 576 bytes, including IP and higher protocol headers. Since the maximum IP header length is 60 bytes and the UDP header length is 8, it is safe to send UDP datagrams of up to 508 bytes and expect the receiving side to handle them (yes, even your Palm Pilot if it has a TCP/IP stack). I have not come across a full size (i.e. non-embedded) system that cannot handle the full 64k-1, though.
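The size arithmetic can be written down as a small helper. This is a sketch with invented names. It assumes the IPv4 header length field is 4 bits wide and counts 32-bit words, which caps the header at 15 x 4 = 60 bytes and gives a worst-case payload of 508 bytes in a 576-byte datagram.

```java
public class UdpPayloadSketch {

    static final int UDP_HEADER = 8;       // always 8 bytes
    static final int MIN_IP_HEADER = 20;   // header without options
    static final int MAX_IP_HEADER = 60;   // 15 words of 4 bytes (assumption stated above)

    /** Payload bytes left in an IP datagram after the IP and UDP headers. */
    static int maxUdpPayload(int ipDatagramSize, int ipHeaderLength) {
        return ipDatagramSize - ipHeaderLength - UDP_HEADER;
    }

    public static void main(String[] args) {
        // Largest datagram the IP spec allows, with a normal 20-byte header.
        System.out.println(maxUdpPayload(65535, MIN_IP_HEADER)); // 65507
        // Smallest datagram every host must accept, with the largest header.
        System.out.println(maxUdpPayload(576, MAX_IP_HEADER));   // 508
    }
}
```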

    6. Effect of fault conditions

    UDP does not guarantee delivery or notification of non-delivery. If you send a unicast packet to a host that does not exist, is down or is not listening on that port, you will not know about it. If you send a broadcast or multicast packet and nobody receives it or is even listening, you will not know about it.

    On Win2k the network adapter settings are reset if it is detected that the link is not available. With Ethernet, for example, if you unplug the LAN cable so that there is no link available, Win2K detects this and effectively shuts down the adapter at the IP level. It clears its IP address and will not attempt to use it. The effect of this is that sockets cannot bind to a port, so all new *Socket calls fail. Sockets that are already created function correctly if you unplug and replug the cable.

    On my notebook, local communication (sender and listener on the same machine) began to fail when I unplugged the LAN cable. It gets nastier than this: a listener started before I unplugged the cable could not hear traffic from a sender started after I had plugged the cable back in. But wait, there's more! When I started another listener after the cable was back in, both it and the listeners started before I unplugged the cable received the multicasts again.

    On WinNT4, my experience has been that the adapter is not "shut down" when the cable is unplugged and one does not have these weird effects.

    7. Multiple listeners and unicast packets

    Since one can send unicast packets using the same MulticastSocket instance as for one's multicasts, it makes sense to mention how unicasts are handled when there is more than one listener, which can only be when they are all on the same machine.

    Unicast traffic sent to the port will be received by only one of the listeners with a socket bound to the port. With my test setup, the last socket to bind to the port receives the unicast traffic. On WinNT4, the first one to bind receives it. I don't know of any rules covering how unicast traffic should be handled in the case of multiple listeners, so don't rely on it being handled in any particular way.

    8. Further reading

    See the RFCs for IP(791), UDP(768) and IP multicasting(1112). Compared to some of the ISO and IEEE stuff I've seen, they're recreational reading material.

    APPENDICES

    A. Protocol "reliability"

    You may have heard TCP described as a "reliable" protocol and UDP as an "unreliable" protocol. It is easy, but dangerous, to jump to conclusions about what this means. Being "reliable" does not mean that TCP will deliver your data under all circumstances (try unplugging the LAN cable for a day and see). Being "unreliable" does not mean UDP will arbitrarily throw away your data. "Unreliable" is a loaded term and I prefer to use "non-reliable", which indicates more that it lacks the guarantees of a "reliable" protocol, rather than labelling it as some sort of untrustworthy servant.

    Enough about what reliability, or lack of it, does not mean. A "reliable" protocol like TCP guarantees that it will deliver your data correctly and in order of transmission or inform you that it could not.

    A "non-reliable" protocol, like UDP, does what is called "best-effort delivery". Essentially, given enough available resources (buffers, bandwidth etc) UDP will deliver your data correctly. It will not deliver incorrect data, but it could deliver data in a different order to which it was sent or not at all.

    The NFS (Network File System) protocol uses UDP to communicate between the server and the client. IMHO, this is a testament to the "reliability" of UDP as a transport. Of course, NFS implements its own reliability mechanisms (timeouts and retransmissions) on top of UDP to be sure.

    B. Stream vs Datagrams

    The differences between TCP and UDP don't end with reliability. They are fundamentally different in their data model. TCP is stream based and UDP is datagram based. This means that with UDP, if data is lost or delivered out of order, it happens with datagram granularity.

    Since TCP is stream based, it does not honour your message boundaries. If you implement your own message passing system using TCP, you will find that doing a send() call of n bytes on one side of the connection does not necessarily result in n bytes being returned by the "corresponding" read() call on the other side. TCP rides on top of IP, which is datagram based, so there is packetizing happening when TCP data is sent, but TCP is at liberty to split your send() up into several actual packets or to coalesce several send() operations into one packet.
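One common way to put message boundaries back on top of a stream is to prefix every message with its length. The sketch below is not from the article: the names are invented, and byte-array streams stand in for the streams one would get from a TCP socket. The important call is readFully(), which keeps reading until the whole message has arrived, however TCP chose to split it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixFramingSketch {

    /** Writes a 4-byte length followed by the message bytes. */
    static void writeMessage(DataOutputStream out, byte[] message) throws IOException {
        out.writeInt(message.length);
        out.write(message);
    }

    /** Reads one length-prefixed message, blocking until all of it has arrived. */
    static byte[] readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] message = new byte[length];
        in.readFully(message);
        return message;
    }

    /** Sends all messages down one in-memory "stream" and reads them back one by one. */
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            for (String m : messages) {
                writeMessage(out, m.getBytes(StandardCharsets.UTF_8));
            }
            out.flush();

            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(wire.toByteArray()));
            List<String> received = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                received.add(new String(readMessage(in), StandardCharsets.UTF_8));
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ab", "", "cdef"));
    }
}
```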

    C. nCasting

    In the case of TCP, the number of intended recipients of transmitted data is always exactly one (like a telephone call). In general, this is not the case. Everybody is aware of broadcast communication (like radio or television) where there is one sender and any number of recipients. As most people know, the same exists in data communications.

    Broadcast communication is frowned upon by network admins because they spend a huge portion of their budget trying to provide bandwidth using network switches, only to have this all defeated by broadcast traffic being delivered to every segment of their LANs. Broadcast communication also causes an interrupt and the associated processing on every node on the connected LAN, always. One's Ethernet hardware, for example, cannot determine whether the host is interested in any particular broadcast packet and must therefore deliver the packet to the upper protocol layers to make the decision. This is the reason Doom 1.1 network games were banned on many LANs: the number of broadcasts the game used caused very high interrupt processing loads on all the hosts on networks where it was played. Thankfully, Doom 1.2 came along to avert boredom during my time at university.

    Where broadcasting is a mechanism intended to deliver data to all hosts on a network or subnetwork, multicasting is a mechanism to deliver data to a group of interested hosts on a network. Many network adapters provide some sort of rudimentary multicast filtering. In many cases, a host not interested in a particular multicast group will not even be interrupted by its network hardware.

    In the TCP/IP protocol family, UDP is used for broadcast and multicast (and some unicast) traffic. As a result, broadcast and multicast traffic is datagram based and non-reliable.

    Reliability, datagram vs stream based and unicast vs multicast/broadcast traffic are all orthogonal concepts. It is not inconceivable to have a reliable, stream based multicast protocol, or any other combination of those features.

    D. IP Multicast addresses

    All class D IP addresses are multicast addresses. Class D IP addresses are those that begin with the bits 1110, that is, all addresses from 224.0.0.0 to 239.255.255.255. Some are pre-assigned for specific applications, but most are available for forming ad hoc multicast groups. There is a mapping between IP multicast addresses and Ethernet addresses, described in RFC1112: "An IP host group address is mapped to an Ethernet multicast address by placing the low-order 23-bits of the IP address into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00 (hex). Because there are 28 significant bits in an IP host group address, more than one host group address may map to the same Ethernet multicast address."
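The mapping quoted from RFC1112 can be sketched in a few lines. The class and method names are invented for the example, and the parsing is deliberately simple. The second address in main() shows two different groups landing on the same Ethernet address, because only the low-order 23 bits are used.

```java
public class MulticastMacSketch {

    /** True if the first octet starts with the bits 1110, i.e. 224 to 239. */
    static boolean isClassD(int firstOctet) {
        return (firstOctet & 0xF0) == 0xE0;
    }

    /** Maps a dotted-quad class D address to its Ethernet multicast address. */
    static String ethernetAddressFor(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted quad: " + dottedQuad);
        }
        int[] o = new int[4];
        for (int i = 0; i < 4; i++) {
            o[i] = Integer.parseInt(parts[i]);
            if (o[i] < 0 || o[i] > 255) {
                throw new IllegalArgumentException("octet out of range: " + parts[i]);
            }
        }
        if (!isClassD(o[0])) {
            throw new IllegalArgumentException("not a class D address: " + dottedQuad);
        }
        // Low-order 23 bits: drop the top bit of the second octet, keep the rest.
        return String.format("01-00-5E-%02X-%02X-%02X", o[1] & 0x7F, o[2], o[3]);
    }

    public static void main(String[] args) {
        System.out.println(ethernetAddressFor("235.1.1.1"));   // 01-00-5E-01-01-01
        System.out.println(ethernetAddressFor("224.129.1.1")); // 01-00-5E-01-01-01
    }
}
```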

    (C)opyright Maximum Solutions, South Africa

    Reprint Rights. Copyright subsists in all the material included in this email, but you may freely share the entire email with anyone you feel may be interested, and you may reprint excerpts both online and offline provided that you acknowledge the source as follows: This material from The Java(tm) Specialists' Newsletter by Maximum Solutions (South Africa). Please contact Maximum Solutions for more information.

    Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries. Maximum Solutions is independent of Sun Microsystems, Inc.