Friday, September 28, 2012

Virtual Directory Performance Tuning Guidelines

In its simplest possible deployment, a virtual directory has a listener, a server component and an adapter that talks to a backend target. In such a deployment, the virtual directory acts purely as a proxy: it receives a request, forwards it to the target and sends the target’s response back to the client.

In such a deployment, one can still encounter performance issues if OVD isn’t tuned adequately.

Performance of OVD depends on the following factors:

• OS tuning
• Server Processor cores
• JVM tuning
• OVD server configuration (threads, work queue capacity)
• Data size of requests issued to Target
• The performance of backend systems (directories, DBs, proprietary stores) that OVD is virtualizing.

Before you conduct any tuning, gather baseline performance metrics for the overall solution. Follow these steps to gather the baseline numbers:

1.     Start with the official documentation. It is a good reference for tuning OVD.
2.     Collect a sampling of requests that are likely to be sent to OVD by your intended client applications.
3.     Test OVD by manually issuing each of these requests to confirm that the wiring to the Target is proper and that there are no functional issues with OVD or the Target it is talking to.
4.     Disable TRACE level logging for the OVD server.
5.     Install a load generation tool such as Slamd on a server other than the one hosting OVD. (I have seen situations where Slamd was installed on the same host as OVD and consumed the CPU capacity there, leaving OVD gasping for CPU.)
6.     Configure scripts in Slamd to execute the sampling of requests you collected in step 2 with an adequate number of clients.
7.     Gather the OVD access logs and mark down the request/response times.

See the OVD Access log section below for information on how to parse the OVD access logs. Before you tune OVD, if you notice that the Target itself is taking a long time to respond, work on improving the performance of the Target first. Usually, responses from OVD should take less than a second, but this can vary drastically depending on the performance of the Target data source; there is no single right number. Keep in mind that the performance of the Target data source directly impacts the performance of OVD, because OVD is just a proxy in this case.

OS tuning

Allocate an adequate number of file handles to the user who owns the OVD process so that OVD can open the required number of connections with clients and Targets. The number of concurrent client connections OVD can support depends directly on this limit. On Unix platforms, I recommend starting with a ulimit of 8192 in the OS user’s environment settings.
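As a quick check, the following Python sketch reads the open-file limit of the current process and compares it with the 8192 starting point. It is only an illustration under a few assumptions: it needs a Unix platform (Python’s resource module is not available on Windows), it is only meaningful when run as the OS user that owns the OVD process, and the helper name nofile_limit_ok is hypothetical.

```python
import resource

RECOMMENDED_NOFILE = 8192  # starting point suggested in this article


def nofile_limit_ok(soft_limit, recommended=RECOMMENDED_NOFILE):
    """Return True when the soft open-file limit meets the recommended value."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # "unlimited" always satisfies the recommendation
    return soft_limit >= recommended


# Limits that apply to the current process (and the user running it).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} ok={nofile_limit_ok(soft)}")
```

If the check fails, raise the limit for that user through your platform’s usual mechanism before starting OVD.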

Server Processor cores

OVD is multi-threaded by design: it can receive multiple requests and process them simultaneously via worker threads. Therefore, the more processor cores you have, the better it is for OVD.

OVD can be configured to take advantage of these cores by allocating 10 to 20 threads per core. For example, if you have 10 cores, configure 100 to 200 threads in the OVD configuration.
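The arithmetic is simple enough to script. Here is a minimal Python sketch that turns the 10 to 20 threads per core rule of thumb into a range; the function name listener_thread_range is hypothetical, and os.cpu_count() reports logical CPUs, which may be higher than the number of physical cores on hosts with hyper-threading.

```python
import os


def listener_thread_range(cores, per_core_min=10, per_core_max=20):
    """Return the (low, high) thread counts for the given number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * per_core_min, cores * per_core_max


# os.cpu_count() can return None, so fall back to a single core.
cores = os.cpu_count() or 1
low, high = listener_thread_range(cores)
print(f"{cores} cores: configure between {low} and {high} threads")
```

Treat the result as a starting range for your load tests rather than a final setting.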

JVM tuning

Update your JVM to the latest minor version. As of this writing, the latest JDK 6 release is 1.6.0_35, that is, update 35.

The default heap size for OVD is 512 MB upon installation. For a production environment, configure a heap size of at least 1 GB and make sure that the minimum and maximum heap sizes are set to the same value. On a 64-bit OS, you can increase the heap size beyond 3.6 GB, but full GCs can then cost you significant performance. Unless your request sizes are big or OVD is running out of memory, I do not recommend increasing the heap size beyond 2 GB.

Make sure the JVM is configured to start with the -server option. Otherwise, OVD runs in client mode.

Here is a snippet of opmn.xml

<ias-component id="ovd1">
   <process-type id="OVD" module-id="OVD">
     <module-data>
       <category id="start-options">
           <data id="java-bin" value="$ORACLE_HOME/jdk/bin/java"/>
           <data id="java-options" value="-server -Xms512m -Xmx512m
-Dvde.soTimeoutBackend=0
-Didm.oracle.home=$ORACLE_HOME
-Dcommon.components.home=$ORACLE_HOME/../oracle_common
-Doracle.security.jps.config=$ORACLE_INSTANCE/config/JPS/jps-config-jse.xml"/>
           <data id="java-classpath" value="$ORACLE_HOME/ovd/jlib/vde.jar$:$ORACLE_HOME/jdbc/lib/ojdbc6.jar"/>
        </category>
      </module-data>
     <stop timeout="120"/>
    </process-type>
 </ias-component>

I recommend not setting any specific size for the PermGen space or for the young or old generation. I also recommend that you not specify a particular garbage collector. I will publish another blog post about GC issues and how to resolve them, based on a real customer situation I dealt with recently.
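To check a java-options value against the JVM recommendations in this section, a small script helps. The Python sketch below takes the options as a plain string (for example, copied from the java-options entry in opmn.xml) and returns a list of warnings. The function names and warning texts are hypothetical, and the sketch only understands the -server, -Xms and -Xmx flags.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(options, flag):
    """Return the size given by -Xms or -Xmx in bytes, or None if absent."""
    match = re.search(rf"(?:^|\s)-{flag}(\d+)([kKmMgG]?)(?=\s|$)", options)
    if not match:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def check_java_options(options, min_heap=1024 ** 3):
    """Return warnings for the heap and -server recommendations."""
    warnings = []
    if "-server" not in options.split():
        warnings.append("missing -server")
    xms, xmx = heap_bytes(options, "Xms"), heap_bytes(options, "Xmx")
    if xms is None or xmx is None:
        warnings.append("heap size not set explicitly")
    else:
        if xms != xmx:
            warnings.append("-Xms and -Xmx differ")
        if xmx < min_heap:
            warnings.append("heap below 1 GB")
    return warnings


# The first options from the opmn.xml snippet above (installation defaults).
print(check_java_options("-server -Xms512m -Xmx512m -Dvde.soTimeoutBackend=0"))
```

With the installation defaults shown in the snippet, the only warning is the heap size, which matches the advice to raise it to at least 1 GB for production.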

The OPMN (Oracle Process Monitoring and Notification) server monitors OVD. If you notice that your OVD instance is being restarted abruptly, it means that OPMN is pinging OVD and OVD is not responding. Try increasing the polling interval; this is covered in the official documentation mentioned earlier.

While you can increase the polling interval, it is better to investigate why OVD is not responding and if there is a problem that is preventing OVD from responding.

OVD server tuning

There are three files of significance for OVD tuning: listeners_os.xml, adapters_os.xml and server_os.xml, all located in the directory $ORACLE_INSTANCE/config/OVD/ovd<number>/conf.

listeners_os.xml

<anonymousBind>deny</anonymousBind>

I recommend turning off anonymous binds. While OVD supports such binds, it is a bad habit to allow anyone to bind to OVD without a proper userid and password, and it is an unnecessary waste of OVD resources. Even if you have a load balancer, configure the LBR to bind to OVD with a real userid and password. This lets you permit only authorized clients to connect to OVD.

<threads>100</threads>

Set this to 10 to 20 times the number of CPU cores available on the server hosting OVD. If you have 10 cores, set this value to 100, up to a maximum of 200.

<useNIO>false</useNIO>

At this time, OVD provides only partial support for non-blocking I/O (NIO) in Java. Turn off this parameter.

<workQueueCapacity>8096</workQueueCapacity>

This parameter sets how many requests the server holds in a queue when all the threads of the given listener are busy. If there are more requests than threads, the extra requests wait in this queue until a worker thread becomes available. Set this to a value between 4K and 8K. I would adjust this parameter only if you see OVD denying requests (other than anonymous binds, of course).
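To make the relationship between threads and the work queue concrete, here is a toy model in Python. It is not how OVD is implemented internally; it only assumes the behaviour described above (requests beyond the free threads wait in a bounded queue) and additionally assumes that requests arriving at a full queue are denied. The function name submit_all is hypothetical.

```python
import queue


def submit_all(requests, free_threads, capacity):
    """Model a burst of requests hitting a listener.

    The first `free_threads` requests start at once, the next `capacity`
    requests wait in the bounded work queue, and the rest are rejected.
    `capacity` must be positive (queue.Queue treats 0 as unbounded).
    """
    running = list(requests[:free_threads])
    backlog = queue.Queue(maxsize=capacity)
    rejected = []
    for request in requests[free_threads:]:
        try:
            backlog.put_nowait(request)
        except queue.Full:
            rejected.append(request)
    return running, backlog.qsize(), rejected


# A burst of 10 requests against 2 free threads and a queue of 5.
running, queued, rejected = submit_all(list(range(10)), free_threads=2, capacity=5)
print(len(running), queued, rejected)
```

In this model, raising the capacity only postpones the rejections; if the threads cannot drain the queue fast enough, the real fix is a faster Target or more threads.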

<socketOptions>
  <tcpNoDelay>true</tcpNoDelay>
  ...
 </socketOptions>

By default, this is set to true. With tcpNoDelay enabled, the socket sends data immediately instead of buffering small writes (Nagle’s algorithm is disabled), which ensures that OVD responds as soon as the Target responds to a given request. Unless you are specifically advised to do so, do not set this parameter to false.

<socketOptions>…
  <keepAlive>false</keepAlive>
  ...
 </socketOptions>

Turn off keepAlive. This parameter is only needed if you want TCP keep-alive probes sent to the client to verify that the connection the client opened to OVD is still valid. On Linux, the timing of these probes is controlled by the OS parameter net.ipv4.tcp_keepalive_time, which is specified in seconds.

server_os.xml

<inactiveConnectionTimeout>5</inactiveConnectionTimeout>
                                                                                                                
By default, OVD does not close connections to a client no matter how long they are idle. This parameter is specified in minutes; I recommend a value of 5 so that idle connections are closed automatically. When the timeout expires, OVD closes the connection and a FIN is sent to the client to signal that the server closed the connection. The client can then send an ACK and terminate its side of the connection.

adapters_os.xml

<referals>false</referals>

Turn off referrals. Even if your Target supports referrals, configure OVD not to follow referrals because a request issued to OVD can take far longer than the connection timeout period specified on the client side. In such cases, OVD will still be busy processing the request, while the client is no longer willing to wait for the response. In the worst case, the client decides to reissue the same request on a new connection and that just bogs down OVD by consuming thread after thread for a request that no client is willing to wait for.

<initialPoolSize>50</initialPoolSize>
<maxPoolSize>100</maxPoolSize>

Your maxPoolSize should equal the maximum number of concurrent clients you expect OVD to serve at any given time. However, I do not recommend setting initialPoolSize to a high value, because it can result in a significant number of connections being opened to the Target. If you are using SSL, this is a significant burden on both OVD and the Target.


Experiment with these parameters, and once you have a good idea of the performance of your OVD deployment, adjust them to your specific needs. It is important to note that OVD will never perform faster than the Target it is wired to. If the Target takes 10 milliseconds to respond to a query, OVD will take 10+x milliseconds to respond to the client, where x is OVD’s own processing overhead. In this setup OVD does not cache results, and you should not assume that caching will improve your performance: it can, but it can also create problems with stale data.

It is easy to acquire things and store them, but it is much harder to know when and how to get rid of them. What is true in life also applies to caching in OVD. Just don’t assume that OVD will perform better than the Target data source, such as an LDAP server or a database.

How many OVD instances should you deploy?

That is anyone’s guess. Without a good understanding of your performance requirements and the measured performance of a pair of OVD instances (set up for HA), you have no way to find out.

If you have followed the recommendations in this article and are still short of performance, and the physical server hosting OVD is more than 50% idle, I would install an additional OVD instance on the same host and configure your LBR to load balance to this instance as well. That should enhance your performance.

How to keep your OVD healthy and happy?

Those of you who are LDAP administrators know very well that everything gets blamed on the LDAP server. There used to be a time when the database was always the culprit when it came to performance. Apparently, no application developer ever writes poorly designed code and no client is ever misconfigured, until proven otherwise. Often, LDAP administrators bear the burden of proving that their LDAP server is indeed responding properly and that it is the client application that is at fault.

Well, it starts with some investment on your part. If you, as an LDAP administrator, want to deal with such allegations effectively and decisively, start by monitoring the following:

a)    Connections opened by each client to OVD
b)   Queries issued by the client to OVD and the corresponding response times
c)    Hardware server capacity utilization (on Linux, the vmstat command is a good starting point)
d)   A sampling of the items above during your peak and off-peak hours

Oracle Directory Server Enterprise Edition (formerly Sun DSEE) and Oracle Internet Directory (OID) are excellent at providing such metrics, so you can easily isolate the cause of poor performance. OVD access logs provide similar information. Write some scripts to parse this data and generate a summary report.

And look at these reports over a period of time to ensure that you have a proper understanding of your client applications and the behaviours that are normal vs abnormal. In the end, a healthy OVD instance can only deliver good performance to your enterprise applications if you give it the attention needed.

OVD Access log

Access logs for OVD tell you when a request was received and when the response was sent back to the client. This is OVD’s perspective of the request/response times. A typical access log looks as follows:


[2012-09-17T11:11:19.259-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 66] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109818:2] conn=41 op=2,628 SRCH base=dc=myorg,dc=mycompany scope=2 filter=(&(uid=userA)(objectclass=inetorgperson))
[2012-09-17T11:11:19.263-07:00] [octetstring] [NOTIFICATION] [OVD-20044] [com.octetstring.accesslog] [tid: 66] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109818:2] conn=41 op=2,628 RESULT err=0 tag=0 nentries=1 etime=4 dbtime=0 mem=1,564,201/2,232,496
[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 29] [ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] conn=43 op=2,399 SRCH base=dc=myorg,dc=mycompany scope=2 filter=(&(uid=userB)(objectclass=inetorgperson))
[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 20] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109819:2] conn=44 op=1,309 SRCH base=dc=myorg,dc=mycompany scope=2 filter=(&(uid=userC)(objectclass=inetorgperson))
[2012-09-17T11:11:19.270-07:00] [octetstring] [NOTIFICATION] [OVD-20044] [com.octetstring.accesslog] [tid: 29] [ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] conn=43 op=2,399 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 mem=1,564,201/2,232,496
[2012-09-17T11:11:19.270-07:00] [octetstring] [NOTIFICATION] [OVD-20044] [com.octetstring.accesslog] [tid: 20] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109819:2] conn=44 op=1,309 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 mem=1,564,201/2,232,496
[2012-09-17T11:11:19.271-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 28] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109818:4] conn=492 op=2,629 SRCH base=dc=myorg,dc=mycompany scope=2 filter=uniquemember=uid=userA,cn=users,dc=myorg,dc=mycompany


Each request and its corresponding response from OVD can be matched using these fields:

“conn=<connection number> op=<operation number since last start>” and “tid: <thread id>”

For example, a request issued with the filter (&(uid=userB)(objectclass=inetorgperson)) was processed by thread id 29. The connection number is 43 and the operation number is 2399 (logged as op=2,399).


[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 29] [ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] conn=43 op=2,399 SRCH base=dc=myorg,dc=mycompany scope=2 filter=(&(uid=userB)(objectclass=inetorgperson))
[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] [com.octetstring.accesslog] [tid: 20] [ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109819:2] conn=44 op=1,309 SRCH base=dc=myorg,dc=mycompany scope=2 filter=(&(uid=userC)(objectclass=inetorgperson))
[2012-09-17T11:11:19.270-07:00] [octetstring] [NOTIFICATION] [OVD-20044] [com.octetstring.accesslog] [tid: 29] [ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] conn=43 op=2,399 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 mem=1,564,201/2,232,496



The corresponding result was sent back to the client successfully, as shown by the string “RESULT err=0 … etime=5 …”.

This entry has the same connection number, operation number and thread id as the request.
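The matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full log parser: it assumes every line follows the layout of the sample log, treats any operation other than RESULT as a request, and uses hypothetical names (LINE, match_requests). The three sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Three lines copied from the sample access log above.
SAMPLE = [
    "[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] "
    "[com.octetstring.accesslog] [tid: 29] "
    "[ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] "
    "conn=43 op=2,399 SRCH base=dc=myorg,dc=mycompany scope=2 "
    "filter=(&(uid=userB)(objectclass=inetorgperson))",
    "[2012-09-17T11:11:19.265-07:00] [octetstring] [NOTIFICATION] [OVD-20043] "
    "[com.octetstring.accesslog] [tid: 20] "
    "[ecid: b9af6bb1052db062:da036ce:139c63d427d:-8000-000000000000001c,1:109819:2] "
    "conn=44 op=1,309 SRCH base=dc=myorg,dc=mycompany scope=2 "
    "filter=(&(uid=userC)(objectclass=inetorgperson))",
    "[2012-09-17T11:11:19.270-07:00] [octetstring] [NOTIFICATION] [OVD-20044] "
    "[com.octetstring.accesslog] [tid: 29] "
    "[ecid: ac28ba5c19ecb1ba:1155e622:139c63d109b:-8000-0000000000000017,1:108956:2] "
    "conn=43 op=2,399 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 "
    "mem=1,564,201/2,232,496",
]

# Timestamp, thread id, connection number, operation number, operation type.
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?\[tid: (?P<tid>\d+)\].*? "
    r"conn=(?P<conn>\d+) op=(?P<op>[\d,]+) (?P<kind>[A-Z]+)\b"
)


def match_requests(lines):
    """Pair each request with its RESULT using (conn, op, tid) as the key.

    Returns (pairs, unanswered): pairs holds (key, elapsed milliseconds),
    unanswered holds the keys of requests that never got a RESULT line.
    """
    pending = {}
    pairs = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # not an access log entry in the expected layout
        key = (int(m["conn"]), int(m["op"].replace(",", "")), int(m["tid"]))
        stamp = datetime.fromisoformat(m["ts"])
        if m["kind"] == "RESULT":
            start = pending.pop(key, None)
            if start is not None:
                pairs.append((key, (stamp - start) // timedelta(milliseconds=1)))
        else:
            pending[key] = stamp
    return pairs, sorted(pending)


pairs, unanswered = match_requests(SAMPLE)
print(pairs)
print(unanswered)
```

In this three-line excerpt the userC request shows up as unanswered only because its RESULT line falls outside the excerpt; over a complete log, the unanswered list is where to start looking for requests that never got a response.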

It is easy to write a simple parser using something like Python or Perl to generate request/response timings. I use one to identify exceptions, determine requests that exceeded a given threshold and those requests that never got a response back etc.

In these sample log statements, the string etime=<time in milliseconds> is the time consumed by OVD and the Target data source together to process the given query issued to OVD.
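Since etime is already in each RESULT line, flagging slow or failed operations does not require any timestamp arithmetic. The Python sketch below is one possible way to do it; the names RESULT and flag_results and the threshold values are illustrative assumptions. The sample strings are the tails of the three RESULT lines from the log above, with the bracketed prefix left out only to keep the lines short.

```python
import re

# Tails of the three RESULT lines from the sample access log above.
SAMPLE = [
    "conn=41 op=2,628 RESULT err=0 tag=0 nentries=1 etime=4 dbtime=0 mem=1,564,201/2,232,496",
    "conn=43 op=2,399 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 mem=1,564,201/2,232,496",
    "conn=44 op=1,309 RESULT err=0 tag=0 nentries=1 etime=5 dbtime=0 mem=1,564,201/2,232,496",
]

# Connection number, operation number, error code and etime of a RESULT entry.
RESULT = re.compile(r"conn=(\d+) op=([\d,]+) RESULT err=(\d+).*?\betime=(\d+)")


def flag_results(lines, threshold_ms):
    """Return (conn, op, err, etime) for results that failed or were too slow."""
    flagged = []
    for line in lines:
        m = RESULT.search(line)
        if not m:
            continue  # request lines and unrelated lines are ignored
        conn = int(m.group(1))
        op = int(m.group(2).replace(",", ""))
        err = int(m.group(3))
        etime = int(m.group(4))
        if err != 0 or etime > threshold_ms:
            flagged.append((conn, op, err, etime))
    return flagged


# With a deliberately low threshold of 4 ms, the two 5 ms results are flagged.
print(flag_results(SAMPLE, threshold_ms=4))
```

Because the pattern is applied with search, the same function also works on complete log lines; pick a threshold that reflects what your client applications can tolerate.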

I will discuss stuck threads and how to diagnose them in a future article.
