Recently I was asked to look into the stability aspects of a J2EE application. It had started to crash taking the Oracle DB server along with it during peak production load. An analysis of the Java Stack Trace indicated that almost all the hung threads were performing searches. Application was not able to take a load of 12 search requests per minute. As search request came into the system they kept on accumulating and degraded the performance of the system even more. Some of the searches ran for as long as 20 minutes. The search queries were using Oracle text search using the context index where the number of documents was around 900,000 and the size of the context index around 900MB. After a few load tests and analysis of the stats pack report what we found was that the search queries were spending more than 95% of time in I/O. So I focused the tuning exercise to reduce the disk I/O by reducing the buffer gets and increasing the buffer hit rate. This essentially means: –
- Reduce the number of rows on which oracle will operate to execute the query
- Reduce the size of each row
- Keep the most frequently accessed rows in memory
To achieve this; I did the following optimizations: –
1. Added the “FIRST_ROWS(n)” Hint to the query.
This documentation from oracle suggests that it should improve performance.
2. Generated a report on the Context index to discover any scope of optimizations
The report can be generated as described in the following section:-
This report gave very useful insights; particularly
- Number of $I rows, which was very large
- Most frequent tokens
- Fragmentation level of the index
Many of our queries searched on codes within the documents which have “/” as a part of the code. From this report, I found that Oracle by default breaks this one code into two tokens. For e.g. “A/B” which we intended to be stored as a single token was broken into two tokens “A” & “B”. So now, oracle has to work on a lot more number of $I rows when we search on “A/B”. Taking a lead from this I defined a printjoin with a “/” and a custom lexer to use the printjoin and rebuild the index.
3. Disabled “storage in row” for the CLOB
By default oracle stores CLOB data in-line; i.e. in the row if its size is less than 4K. So, if oracle fetches the row it fetches all the CLOB data as well. In our case most of the search text was less than 4k and so most of it was stored inline. This was one of the major contributors to the I/O. I disabled storage in row which greatly helped in reducing the size of individual row as well as the amount of I/O required. This is suggested in the following section of Oracle text search documentation.
4. Partitioned the table by Range and created a partitioned context index (search index)
I partitioned the table and the index by range as suggested by the following documentation.
This helped in reducing the number of rows on which oracle operates by leveraging partition pruning. Also, it reduced the effective size of the search index as we limited searches only to one partition.
5. Defined a keep pool equal to the size of the documents table & configured the documents table to be kept in the keep pool.
Most of the previous optimizations helped in greatly reducing the amount of I/O by reducing the size of the table as well as by breaking the table & search index into multiple partitions. Now, since we have much less amount of working data, if we can cache all of it in memory then the performance can improve significantly.
This was achieved by defining a keep pool big enough to store the latest partitions of the table & the index. More details on how this can be done as well as how the index can be preloaded can be found in the following article.
The results were impressive. We did a load test with twice the production load i.e. 24 search requests per minute. Before the optimizations the system crashed within 500 search requests and the successful requests had an average response time of 5 minutes. After the final optimizations; searches completed with an average response time of 350 ms.