Skip to main content

NHibernate - Using ADO to avoid congestion on ingestion

As I've been learning NHibernate over the past week I've found the actually functionality of it to be incredibly clean and simple once the framework is in. For example, I have some of the following functions in my DAO:

  1.         /// <summary>
  2.         /// Gets the customer for a specific ID.
  3.         /// </summary>
  4.         /// <param name="customerId">The ID of the customer to retrieve.</param>
  5.         /// <returns>A customer for a given ID.</returns>
  6.         [Transaction(ReadOnly = true)]
  7.         public CustomerGet(int customerId)
  8.         {
  9.             return CurrentSession.Get<Customer>(customerId);
  10.         }
  11.  
  12.         /// <summary>
  13.         /// Retrieves all customers.
  14.         /// </summary>
  15.         /// <returns>An IList of customers.</returns>
  16.         [Transaction(ReadOnly = true)]
  17.         public IList<Customer> GetAll()
  18.         {
  19.             ICriteria criteria = CurrentSession.CreateCriteria<Customer>();
  20.             return criteria.List<Customer>();
  21.         }

Usage in code to retrieve all Customers:

  1.     IList<Customer> = customerDao.GetAll();

No SQL written whatsoever, NHibernate does it all.

Awesome...but...

Although it's just as insanely easy to do an insert:

  1.     CurrentSession.Save(customer);

This is NOT GOOD for ingesting large amounts of data. Moving on, my current project has to do with tokenizing, parsing, and inserting large amounts of unstructured data. On a test base of 1000 documents I started this puppy up thinking it'd be a bit slower than if I did straight ADO but with using the NHibernate ORM it would still be worth it. I started up my Integration test and left it running overnight. By the time I got back, I was at document #5....5...that's right...5! I was doing a few selects on my ingestion that were taking a long time. So I got rid of those and lowered my test base down to about 100 documents instead of 1000. Still, 3 hours.

So...I poked around looking at ways to optimize NHibernate...setting batches, turning off logging, etc... still, extremely long ingest time. So I started looking for ways of doing some sort of bulk insert. But, I'm not pulling structured data from a CSV or another database...I'm doing tokenizing and parsing per document and reusing some data via foreign keys.

The only choice I saw was ADO, but one of the beauties of NHibernate was that between it and Spring.Net I haven't so much had to worry about transactions or connections. With proper Spring setup most of it's pretty much taken care of. So how was I to deal with properly transactioning ADO and supplying it the connection all in a modular way? I was planning on using Spring and it's a TransactionInterceptor and trying to figure out a way to do all that without interfering with NHibernate's transactions when I stumbled upon a huge breakthrough...

This wonderful blog post here by Joshua Lockwood talks about enlisting ADO calls into NHibernate's transaction and connection. My ADO calls now use the transaction and connection that nhibernate is currently using.

Here's an example of how it works. Take special notice of lines 9 and 13 where I'm getting both the connection and the active Transaction from NHibernate without doing any additional project setup.

  1.         /// <summary>
  2.         /// Saves a particular customer.
  3.         /// </summary>
  4.         /// <param name="entry">The customer to save.</param>
  5.         /// <returns>The id of the saved document.</returns>
  6.         [Transaction]
  7.         public int Save(Customer entry)
  8.         {
  9.             IDbConnection connection = CurrentSession.Connection;
  10.  
  11.             using (IDbCommand cmd = connection.CreateCommand())
  12.             {
  13.                 CurrentSession.Transaction.Enlist(cmd);
  14.                 cmd.CommandText = "INSERT INTO dbo.Customers(UserName, FirstName, LastName) VALUES (@username, @firstname, @lastname); select SCOPE_IDENTITY();";
  15.  
  16.                 IDbDataParameter username= cmd.CreateParameter();
  17.                 cmd.Parameters.Add(username);
  18.                 username.Value = entry.UserName;
  19.                 username.ParameterName = "@username";
  20.  
  21.                 IDbDataParameter firstname= cmd.CreateParameter();
  22.                 cmd.Parameters.Add(firstname);
  23.                 firstname.Value = entry.FirstName;
  24.                 firstname.ParameterName = "@firstname";
  25.  
  26.                 IDbDataParameter lastname= cmd.CreateParameter();
  27.                 cmd.Parameters.Add(lastname);
  28.                 lastname.Value = entry.LastName;
  29.                 lastname.ParameterName = "@lastname";
  30.  
  31.  
  32.                 entry.Id = Convert.ToInt32(cmd.ExecuteScalar());
  33.             }
  34.  
  35.             return entry.Id;
  36.         }

This method took my test ingest of 1000 documents, with around 450,000 entries into the database and took it from HOURS down to 508.6627236 Seconds

Comments

Awesome!

I love crazy performance improvements like that. Did you try out the bulk data copier? I'd be interested in seeing if that
yielded any further improvements. You'd have to change your Save method so that it accepted a collection of domain objects thought.

Thought about it but didn't.

Thought about it but didn't. The first couple rounds of inserts I do required that I get their primary key id back so I can set the foreign keys of other objects since I'm not using NHibernate for the inserts any more. I thought about using it only for a portion of the inserts where I already have all the data and don't need it back. It could still possibly get large improvement. I'll probably look into it sometime.

NHibernate

I don't have an idea about the NHibernate and I have searched it in essay-writing-services-online.blogspot.com. But I didn't get the idea about the NHibernate. After reading this post, I got a clear idea about this. Thank for sharing this information.

Being one of the world’s top

Being one of the world’s top 3 best search engines, We can expect an excellent service from Yahoo. Few of the best service are like Flickr, which is an image and video hosting site which was owned by Yahoo on March 20th, 2005. And similarly, forgot yahoo password

Do you have your own

Do you have your own transportation vehicles and necessary equipments?
http://topaussiereviews.com/
The good moving company has its own especially designed goods carriers and moving trucks for safe vehicles of goods from one location to another. A good company has its own necessary equipments for easy packing, loading, unloading and undoing of goods.

You can watch your favorite

You can watch your favorite videos or movies on the internet but, there are very limited websites that have the movies of your choice. It might be possible that you don’t get the link for watching your favorite movie on the internet as well. In that case, vivavideo app for pc

I am really enjoying reading

I am really enjoying reading your well written articles.
I think you spend numerous effort and time updating your site. I have bookmarked it and I am taking a look ahead to reading new articles.

Paul

Ben benieuwd hoe het met je gaat. Je log lijkt een tijdelijke break ondergaan te hebben. Ben zelf ook ruim 4 maanden niet actief geweest, maar heb de draad weer opgepakt.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • You can enable syntax highlighting of source code with the following tags: <asp>, <c>, <cpp>, <cs>, <css>, <drupal5>, <drupal6>, <html4strict>, <java>, <javascript>, <jquery>, <php>, <python>, <ruby>, <sql>, <xml>. The supported tag styles are: <foo>, [foo]. PHP source code can also be enclosed in <?php ... ?> or <% ... %>.

More information about formatting options

By submitting this form, you accept the Mollom privacy policy.