Skip to main content

NHibernate - Using ADO to avoid congestion on ingestion

As I've been learning NHibernate over the past week I've found the actually functionality of it to be incredibly clean and simple once the framework is in. For example, I have some of the following functions in my DAO:

  1.         /// <summary>
  2.         /// Gets the customer for a specific ID.
  3.         /// </summary>
  4.         /// <param name="customerId">The ID of the customer to retrieve.</param>
  5.         /// <returns>A customer for a given ID.</returns>
  6.         [Transaction(ReadOnly = true)]
  7.         public CustomerGet(int customerId)
  8.         {
  9.             return CurrentSession.Get<Customer>(customerId);
  10.         }
  11.  
  12.         /// <summary>
  13.         /// Retrieves all customers.
  14.         /// </summary>
  15.         /// <returns>An IList of customers.</returns>
  16.         [Transaction(ReadOnly = true)]
  17.         public IList<Customer> GetAll()
  18.         {
  19.             ICriteria criteria = CurrentSession.CreateCriteria<Customer>();
  20.             return criteria.List<Customer>();
  21.         }

Usage in code to retrieve all Customers:

  1.     IList<Customer> = customerDao.GetAll();

No SQL written whatsoever, NHibernate does it all.

Awesome...but...

Although it's just as insanely easy to do an insert:

  1.     CurrentSession.Save(customer);

This is NOT GOOD for ingesting large amounts of data. Moving on, my current project has to do with tokenizing, parsing, and inserting large amounts of unstructured data. On a test base of 1000 documents I started this puppy up thinking it'd be a bit slower than if I did straight ADO but with using the NHibernate ORM it would still be worth it. I started up my Integration test and left it running overnight. By the time I got back, I was at document #5....5...that's right...5! I was doing a few selects on my ingestion that were taking a long time. So I got rid of those and lowered my test base down to about 100 documents instead of 1000. Still, 3 hours.

So...I poked around looking at ways to optimize NHibernate...setting batches, turning off logging, etc... still, extremely long ingest time. So I started looking for ways of doing some sort of bulk insert. But, I'm not pulling structured data from a CSV or another database...I'm doing tokenizing and parsing per document and reusing some data via foreign keys.

The only choice I saw was ADO, but one of the beauties of NHibernate was that between it and Spring.Net I haven't so much had to worry about transactions or connections. With proper Spring setup most of it's pretty much taken care of. So how was I to deal with properly transactioning ADO and supplying it the connection all in a modular way? I was planning on using Spring and it's a TransactionInterceptor and trying to figure out a way to do all that without interfering with NHibernate's transactions when I stumbled upon a huge breakthrough...

This wonderful blog post here by Joshua Lockwood talks about enlisting ADO calls into NHibernate's transaction and connection. My ADO calls now use the transaction and connection that nhibernate is currently using.

Here's an example of how it works. Take special notice of lines 9 and 13 where I'm getting both the connection and the active Transaction from NHibernate without doing any additional project setup.

  1.         /// <summary>
  2.         /// Saves a particular customer.
  3.         /// </summary>
  4.         /// <param name="entry">The customer to save.</param>
  5.         /// <returns>The id of the saved document.</returns>
  6.         [Transaction]
  7.         public int Save(Customer entry)
  8.         {
  9.             IDbConnection connection = CurrentSession.Connection;
  10.  
  11.             using (IDbCommand cmd = connection.CreateCommand())
  12.             {
  13.                 CurrentSession.Transaction.Enlist(cmd);
  14.                 cmd.CommandText = "INSERT INTO dbo.Customers(UserName, FirstName, LastName) VALUES (@username, @firstname, @lastname); select SCOPE_IDENTITY();";
  15.  
  16.                 IDbDataParameter username= cmd.CreateParameter();
  17.                 cmd.Parameters.Add(username);
  18.                 username.Value = entry.UserName;
  19.                 username.ParameterName = "@username";
  20.  
  21.                 IDbDataParameter firstname= cmd.CreateParameter();
  22.                 cmd.Parameters.Add(firstname);
  23.                 firstname.Value = entry.FirstName;
  24.                 firstname.ParameterName = "@firstname";
  25.  
  26.                 IDbDataParameter lastname= cmd.CreateParameter();
  27.                 cmd.Parameters.Add(lastname);
  28.                 lastname.Value = entry.LastName;
  29.                 lastname.ParameterName = "@lastname";
  30.  
  31.  
  32.                 entry.Id = Convert.ToInt32(cmd.ExecuteScalar());
  33.             }
  34.  
  35.             return entry.Id;
  36.         }

This method took my test ingest of 1000 documents, with around 450,000 entries into the database and took it from HOURS down to 508.6627236 Seconds