/ Connection pool leak using Hibernate 3.2.3 and Weblogic 10.3 ~ Java EE Support Patterns

3.02.2011

Connection pool leak using Hibernate 3.2.3 and Weblogic 10.3

This case study describes the complete steps from root cause analysis to resolution of a JDBC connection pool leak problem experienced with Oracle Weblogic 10.3 and using the open source Java persistence framework Hibernate 3.2.3.

This case study will also demonstrate the importance of best coding practices when using Hibernate and how a simple code problem can have severe consequences on a production environment.

Environment specifications
 
·         Java EE server: Oracle Weblogic Portal 10
·         OS: AIX 5.3 TL9 64-bit
·         JDK: IBM JRE 1.6.0 SR2 64-bit
·         RDBMS: Oracle 11g
·         Persistence API: Hibernate 3.2.3
·         Platform type: Portal application


Monitoring and troubleshooting tools
·         Enterprise internal remote JMX client monitoring tool

Problem overview
 
·         Problem type: JDBC Connection Pool Leak

JDBC Data Source Connection Pool depletion was observed in our production environment following an upgrade from Weblogic Portal 8.1 to Weblogic Portal 10.3.

Initial problem mitigation did involve restarting all the Weblogic managed servers almost every hour until decision was taken to rollback to the previous Weblogic Portal 8.1 environment.

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering of technical and non technical facts so we can either derived other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·         What is the client impact? HIGH
·         Recent change of the affected platform? Yes, the application was migrated recently from Weblogic Portal 8.1 to Weblogic Portal 10.3. Also, the application code was migrated from EJB Entity Beans to Hibernate for the read and write operations to the Oracle database
·         Any recent traffic increase to the affected platform? No
·          Since how long this problem has been observed?  Right after the upgrade project deployment
·          Is the JDBC Connection Pool depletion happening suddenly or over time? It was observed via our internal enterprise JMX client monitoring tool that the connection pool is increasing over time at a fast rate. The current pool capacity is 25
·         Did a restart of the Weblogic server resolve the problem? No, rollback to the previous environment was required

·         Conclusion #1: The problem is related to a JDBC Connection Pool leak of the primary application JDBC data source
·         Conclusion #2: This problems correlates with the Weblogic upgrade and migration of our application code to Hibernate

JDBC Pool utilization

The history of JDBC Pool utilization was captured using our internal enterprise remote JMX monitoring tool. Such monitoring is achieved by connecting remotely to the Weblogic server and pull detail from the runtime Weblogic MBeans:

StringBuffer canonicalObjNameBuffer = new StringBuffer();

// MBean query
canonicalObjNameBuffer.append("com.bea:Name=");
canonicalObjNameBuffer.append(jdbcDSName);
canonicalObjNameBuffer.append(",ServerRuntime=");
canonicalObjNameBuffer.append(instanceName);

canonicalObjName = CanonicalObjectNameObjectCacheFactory.getInstance().getCanonicalObjectName(canonicalObjNameBuffer.toString(), null);

// JDBC data source MBean metrics extraction
int activeConnectionCount = (Integer)adapter.getJMXService().getJMXConnection().getMBeanAttribute(canonicalObjName, "ActiveConnectionsCurrentCount");
int leakedConnectionCount = (Integer)adapter.getJMXService().getJMXConnection().getMBeanAttribute(canonicalObjName, "LeakedConnectionCount");
long reservedRequestCount = (Long)adapter.getJMXService().getJMXConnection().getMBeanAttribute(canonicalObjName, "ReserveRequestCount");
int connectionDelayTime = (Integer)adapter.getJMXService().getJMXConnection().getMBeanAttribute(canonicalObjName, "ConnectionDelayTime");

The graph below represents a history of the ActiveConnectionsCurrentCount MBean metric. This corresponds to your current connection pool utilization.  


The results were quire conclusive as it did reveal such connection pool leak and some sudden surge leading to full depletion.

Error log review

The log review did reveal the following error during these 2 episodes of connection leak surge. There was a very good correlation with # of errors found in these logs and the # of leaked connections increase observed from graph. The error was thrown during execution of the Hibernate Session.flush() method due to null value injection.

Unexpected exception has occured: org.hibernate.PropertyValueException: not-null property references a null or transient value: app.AppComponent.method UnExpected Exception has been occured:
org.hibernate.PropertyValueException: not-null property references a null or transient value: app.AppComponent.method
               at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
               at org.hibernate.event.def.DefaultFlushEntityEventListener.scheduleUpdate(DefaultFlushEntityEventListener.java:263)
               at org.hibernate.event.def.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:121)
               at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:196)
               at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:76)
               at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:26)
               at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
               at app.DaoComponent.insert

Weblogic Profile Connection Leak

The next step was to enable to Profile Connection Leak; similar to what we did with another JDBC connection pool leak using Weblogic 9.0.


The exercise did reveal the source of the leak from one of our application DAO using Hibernate; same DAO component which was generating errors in our log due to null value injection.

Hibernate best practices and code review

Proper coding best practices are important when using Hibernate to ensure proper Session / Connection resource closure. Any Hibernate Session along with JDBC resource must be closed in a finally{} block to properly handle any failure scenario. Such finally{} block must also be shielded against any failure condition in order to guarantee closure of the Hibernate session by adding proper try{} catch{} block when applicable.

The code analysis did reveal a problem with the Hibernate Session / Connection closure code; bypassing the execution of the Hibernate session.close() in the scenario an Exception is thrown during execution of session.flush().

finally {
       if (session != null) {
       session.flush();  
       session.close();
       }
}

Root cause

The root cause of the connection leak was concluded as a combination code defect within our DAO component. The code portion that is taking care of the Hibernate session flush was not embedded in a try{} catch{} block. Any failure of the Session.flush() was bypassing closure of the Hibernate session; triggering a leak of the associated JDBC connection.

Solution and results

A code fix was applied to the problematic DAO component as per below and deployed to our production environment following proper testing.

BEFORE
Hibernate Session Closure Code Problem
AFTER

Hibernate Session Closure Code Fix

The results were quite conclusive and did confirm a complete resolution of the JDBC connection pool leak.

Conclusion and recommendations

·         Perform regular code walkthrough of any new code with an increased focus for any new code dealing with JDBC and third party API implementation such as Hibernate. Always ensure that your JDBC and Hibernate related resource closure code is bullet proof against any negative scenario.
·          Perform regular monitoring of your production Weblogic JDBC Pool utilization, ensure no JDBC Pool leak is present (proactive monitoring). Same exercise should be done in your load testing environment before deploying a major release.

2 comments:

Great article! and right to the point!
this helped me understand the problems we're facing on the same context after upgrading from WL8 -> 10.3 and EJB to Hibernate..
but my only comment is that with WL10, the server should've being able to resolve this problem irregardless. This does show that the server isn't that promising!

Thanks Joey for your comments,

Actually Weblogic 10.3 has a workaround tuning you can implement at the JDBC Connection Pool level allowing it to get back control of leaking JDBC Connections. It can be used as a workaround while investigating the root cause of a JDBC Connection Pool leak.

Please let me know if you need further detail and help on this problem.

Thanks.
P-H

Post a Comment