2002-05-24 The Java Specialists' Newsletter [Issue 047] - Lack of Streaming leads to Screaming


Author: Dr. Heinz M. Kabutz

If you are not already subscribed to this newsletter, please send an email to subscribe@javaspecialists.co.za. Be warned that if you are a beginner in Java, you will at times struggle to keep up. The archive of past newsletters is kept at http://www.javaspecialists.co.za

Welcome to the 47th edition of The Java(tm) Specialists' Newsletter, read by over 3600 Java programmers in 82 countries. I have put my Mauritius trip with photos on my website http://www.javaspecialists.co.za, please have a look under "Courses". Warning: prolonged looking at that webpage is known to cause envy - please enter at own risk!

I am planning Java and Design Patterns courses in South Africa for June. Please have a look at our website for more information.

Is your company thinking of venturing into new business but you do not have the necessary resources? Are you scared of subcontracting work to a non-English speaking country due to the communication problems involved? (no offense to my non-English readers - this is my advert, ok? ;-) Then South Africa is your dream come true. In South Africa we speak and write English fluently, so you will not have the typical communication problems that you would find in non-English speaking countries. Our software developers are highly skilled, very good at solving problems and able to pull up their sleeves and get to work. Internationally, South Africans are known for their hard work and dedication to the task at hand. If your company is looking for such resources, please contact me by simply replying to this email, and I will personally see to it that you are contacted within 24 hours.

In the last two weeks I received two questions from readers who ran out of memory when trying to read a big object from the database. In this newsletter I want to explore how you can read a big object from a database without killing your poor JVM, and be scalable as well.

Lack of Streaming leads to Screaming

How do you retrieve big objects from the database in Java? Say you have a database containing previews of movies in DivX format, stored as IMAGE columns. How do you retrieve the 25 megabyte file from the database using JDBC?

Simple. We write a
SELECT statement, execute it, and say result_set.getBytes(1). We run the code and it works well for small movie snippets, but as soon as we have a 25 megabyte file, our poor JVM throws an OutOfMemoryError. What's annoying about an OutOfMemoryError is that the stack trace is not filled in (because hey, you've run out of memory!), so you cannot exactly determine where the error occurred, unless you add trace logging. What makes it even more tricky is that some JDBC drivers try to be too clever, resulting in OutOfMemoryErrors.

Let's look at some test code. I have written two test classes,
TestDatabaseBlobInsert and TestDatabaseBlobFetch. What surprised me was that of the several drivers that I tested (DataDirect, iNet SPRINTA, Avenir, MS SQL Server Type 4, JDBC/ODBC bridge), the JDBC/ODBC bridge was the fastest for inserting big objects into MS SQL Server. For fetching the data it was the slowest.


  import java.sql.DriverManager;
  import java.sql.Statement;
  import java.sql.SQLException;
  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.io.ByteArrayInputStream;

  public class TestDatabaseBlobInsert {
    private static final String TABLE_DROP =
      "DROP TABLE MovieArchive";
    private static final String TABLE_CREATE =
      "Create Table MovieArchive (moviedata image, title varchar(255))";
    private static final String TABLE_INSERT =
      "INSERT INTO MovieArchive (title, moviedata) VALUES (?,?)";

    private static final int size = 25 * 1024 * 1024;
    private final byte[] data = new byte[size];

    private final Connection con;

    public TestDatabaseBlobInsert(String driver, String url,
        String user, String password)
        throws SQLException, ClassNotFoundException {
      Class.forName(driver);
      con = DriverManager.getConnection(url, user, password);
      System.out.println("Driver: " + driver);
      for (int i=0; i<data.length; i++)
        data[i] = (byte)(Math.random()*255);
    }

    public void setUp() throws SQLException {
      Statement st = con.createStatement();
      try {
        System.out.println("Dropping old table");
        st.executeUpdate(TABLE_DROP);
      } catch(SQLException ex) {} // table might not exist
      System.out.println("Creating new table");
      st.executeUpdate(TABLE_CREATE);
      st.close();
    }

    public void testInsertWithBinaryStream() throws SQLException {
      long start = -System.currentTimeMillis();
      System.out.println("Inserting via BinaryStream");
      PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
      ByteArrayInputStream bis = new ByteArrayInputStream(data);
      stmt.setString(1, "Babe");
      stmt.setBinaryStream(2, bis, data.length);
      stmt.executeUpdate();
      start += System.currentTimeMillis();
      System.out.println("That took " + start + "ms");
      stmt.close();
    }

    public void testInsertWithSetBytes() throws SQLException {
      long start = -System.currentTimeMillis();
      System.out.println("Inserting via setBytes()");
      PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
      stmt.setString(1, "On Her Majesty's Secret Service");
      stmt.setBytes(2, data);
      stmt.executeUpdate();
      start += System.currentTimeMillis();
      System.out.println("That took " + start + "ms");
      stmt.close();
    }

    public void testAll() throws SQLException {
      setUp();
      testInsertWithBinaryStream();
      testInsertWithSetBytes();
    }

    public static void main(String[] args) throws Exception {
      if (args.length != 4) usage();
      TestDatabaseBlobInsert test = new TestDatabaseBlobInsert(
        args[0], args[1], args[2], args[3]);
      test.testAll();
    }

    private static void usage() {
      System.out.println(
        "Usage: TestDatabaseBlobInsert driver url username password");
      System.exit(1);
    }
  }

I ran this code by setting up an ODBC source pointing to the MS SQL Server database Movies, and then running it with a maximum heap space of 256MB:


  java -Xmx256m -classpath . TestDatabaseBlobInsert sun.jdbc.odbc.JdbcOdbcDriver jdbc:odbc:Movies sa ""

The result on my little notebook was the following:


  Driver: sun.jdbc.odbc.JdbcOdbcDriver
  Dropping old table
  Creating new table
  Inserting via BinaryStream
  That took 78975ms
  Inserting via setBytes()
  That took 73419ms

Back to the issue at hand - how do we get this data out of the database? The seemingly easiest way is to do the following:


1:  PreparedStatement st = con.prepareStatement(
    "SELECT moviedata FROM MovieArchive WHERE title = ?");
2:  st.setString(1, "Babe");
3:  ResultSet rs = st.executeQuery();
4:  if (rs.next()) {
5:    byte[] data = rs.getBytes(1);
  }

This code can easily cause an OutOfMemoryError if the available heap memory is less than the size of the data that you are reading. Now for the 1'000'000 dollar question: Where does OutOfMemoryError occur? That depends on your driver. If you are using the iNet SPRINTA or the Avenir drivers, then you will run out of memory on line 4, i.e. when you call rs.next(). If you are using the DataDirect, Microsoft or ODBC bridge drivers, you will only get the out of memory error on line 5.

How can we write this so that we won't get an out of memory error? Here is some sample code. It is very important that you read the data blocks of bytes at a time, rather than in one big chunk, otherwise your system will definitely not scale to support many users.


  import java.sql.*;
  import java.io.*;

  public class TestDatabaseBlobFetch {
    private static final String TABLE_SELECT =
      "SELECT moviedata FROM MovieArchive WHERE title = ?";

    private final Connection con;

    public TestDatabaseBlobFetch(String driver, String url,
        String user, String password)
        throws SQLException, ClassNotFoundException {
      Class.forName(driver);
      con = DriverManager.getConnection(url, user, password);
      System.out.println("Driver: " + driver);
    }

    public void testSelectBlocksAtATime() throws SQLException {
      long start = -System.currentTimeMillis();
      System.out.println("SELECT: 64kb blocks at a time");
      PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
      stmt.setString(1, "Babe");
      ResultSet rs = stmt.executeQuery();
      int count=0;
      if (rs.next()) {
        try {
          System.out.println("Retrieving Data");
          OutputStream out = new BufferedOutputStream(
            new FileOutputStream("Data.1"));
          InputStream in = new BufferedInputStream(
            rs.getBinaryStream(1));
          byte[] buf = new byte[65536];
          int i;
          while((i = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, i);
            count += i;
          }
          out.close();
        } catch(IOException ex) { ex.printStackTrace(); }
      }
      System.out.println("fetched " + count + " bytes");
      start += System.currentTimeMillis();
      System.out.println("That took " + start + "ms");
      stmt.close();
    }

    public void testSelectWithGetBytes() throws SQLException {
      long start = -System.currentTimeMillis();
      System.out.println("SELECT: all at once");
      PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
      stmt.setString(1, "Babe");
      ResultSet rs = stmt.executeQuery();
      byte[] data = null;
      if (rs.next()) {
        System.out.println("Retrieving Data");
        data = rs.getBytes(1);
        try {
          FileOutputStream out = new FileOutputStream("Data.2");
          out.write(data, 0, data.length);
          out.close();
        } catch(IOException ex) { ex.printStackTrace(); }
      }
      System.out.println("fetched " + data.length + " bytes");
      start += System.currentTimeMillis();
      System.out.println("That took " + start + "ms");
      stmt.close();
    }

    public void testAll() throws SQLException {
      testSelectBlocksAtATime();
      testSelectWithGetBytes();
    }

    public static void main(String[] args) throws Exception {
      if (args.length != 4) usage();
      TestDatabaseBlobFetch test = new TestDatabaseBlobFetch (
        args[0], args[1], args[2], args[3]);
      test.testAll();
    }

    private static void usage() {
      System.out.println(
        "usage: TestDatabaseBlobFetch driver url username password");
      System.exit(1);
    }
  }

I tried this with several JDBC drivers, the only Type 4 driver that worked correclty was the DataDirect driver (now released under the Microsoft label). I will not go into the differences between the Microsoft driver and the others, that's for another article. If you want to try this out, you can run it like this:


  java -Xmx2m -classpath .;msbase.jar;mssqlserver.jar;msutil.jar
    TestDatabaseBlobFetch com.microsoft.jdbc.sqlserver.SQLServerDriver
    jdbc:microsoft:sqlserver://localhost:1433;DatabaseName=Movies sa ""

Naturally you have to download the Microsoft SQL Server Type 4 driver and put the jar files into the directory from which you are running this code. The output from using the DataDirect Microsoft driver is the following on my machine:


  Driver: com.microsoft.jdbc.sqlserver.SQLServerDriver
  SELECT: 64kb blocks at a time
  Retrieving Data
  fetched 26214400 bytes
  That took 62746ms
  SELECT: all at once
  Retrieving Data
  Exception in thread "main" java.lang.OutOfMemoryError
          <<no stack trace available>>

The iNet SPRINTA driver falls over much sooner - actually when you call rs.next():


  Driver: com.inet.tds.TdsDriver
  SELECT: 64kb blocks at a time
  Exception in thread "main" java.lang.OutOfMemoryError
          <<no stack trace available>>


What is wrong here?

I am finding it very hard to think of a reason to store 25mb files in a database. They are too big to stay in the database's cache for very long. I think that the design is flawed to start with. I would personally rather store the URL to the file in the database, instead of the actual data, and then retrieve the data directly from the file system.

I don't know all the conditions why someone would want to do that, but just remember, you have to stream such big data out of the database chunk by chunk, otherwise you have a serious problem.

Until the next issue ...

Heinz


(C)opyright Maximum Solutions, South Africa

Reprint Rights. Copyright subsists in all the material included in this email, but you may freely share the entire email with anyone you feel may be interested, and you may reprint excerpts both online and offline provided that you acknowledge the source as follows: This material from The Java(tm) Specialists' Newsletter by Maximum Solutions (South Africa). Please contact Maximum Solutions for more information.

Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries. Maximum Solutions is independent of Sun Microsystems, Inc.