Wednesday, April 10, 2013

Don’t Be that Guy – Part 2: Avoiding Outages Due to Full Disks and Partitions

A while back, I wrote about how many customers experience severe outages with their Fusion Middleware products when they let the digital certificates used for the SSL connections in their deployments expire.

To be fair, certificates are often “out of sight and out of mind” and indeed many system administrators don’t have much experience managing certificates.  However, the same cannot be said about disk space.  We all deal with managing disk space on multiple systems including our desktop clients, home PCs, and even phones. 

Today, as a public service announcement, I’d like to discuss the dangers of not paying attention to whether you have adequate disk space on the dev, test, and production machines running your middleware software.

I’ll be honest: I see a surprising number of customers experience everything from long delays in their dev and QA cycles to real production outages because of instability caused by running out of disk space. So, size your machines with adequate disk space, monitor your disk usage, and be aware of the logger and auditing configurations in your Fusion Middleware products.
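
As a rough illustration of the "monitor your disk usage" advice, here is a minimal Python sketch that flags file systems at or above a usage threshold. It relies only on the standard library's shutil.disk_usage. The 85 percent threshold and the list of paths are assumptions chosen for the example, not Oracle recommendations; in practice you would point it at the mount points that hold your middleware homes and logs, and run it from whatever scheduler or monitoring tool you already use.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


def paths_over_threshold(paths, threshold_percent=85.0):
    """Return (path, percent_used) pairs for paths at or above the threshold.

    The default threshold is an arbitrary example value, not a recommendation.
    """
    flagged = []
    for path in paths:
        percent = disk_usage_percent(path)
        if percent >= threshold_percent:
            flagged.append((path, percent))
    return flagged


if __name__ == "__main__":
    # Hypothetical list of paths to watch; replace with your own mount points.
    watched_paths = ["."]
    for path, percent in paths_over_threshold(watched_paths):
        print(f"WARNING: {path} is {percent:.1f}% full")
```

A check like this is only useful if something acts on the warning, so treat it as a starting point for an alert rather than a complete monitoring solution.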

Most Fusion Middleware / IAM products, including OAM and OIM, log to the standard Java/WLS logs (the .out and .log files) as well as to the Oracle diagnostic log (the file ending in -diagnostics.log). The standard logs can be configured in the WLS console, while the diagnostic log can be configured by editing the logging.xml file, through WLST, or in EM.

Most customers who use our auditing capabilities log directly to a database. However, the default storage is “bus-stop files,” which reside on the local file system and therefore take up disk space.
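
To see which log files are actually consuming space, a small script can walk a directory and list the largest ones. The sketch below is only an illustration: the directory you pass in is up to you (the location of your server logs depends on your domain layout), and matching on ".log" or ".out" anywhere in the file name is an assumption meant to also catch rotated files, not a rule taken from the product documentation.

```python
import os


def largest_log_files(directory, markers=(".log", ".out"), top_n=5):
    """Return up to top_n (size_in_bytes, path) pairs, largest first.

    A file counts as a log if its name contains any of `markers`; this
    substring match is an assumption intended to include rotated files.
    """
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if any(marker in name for marker in markers):
                full_path = os.path.join(root, name)
                try:
                    sizes.append((os.path.getsize(full_path), full_path))
                except OSError:
                    # The file may have been rotated or removed mid-scan.
                    continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Hypothetical starting point; replace with your server's log directory.
    for size, path in largest_log_files("."):
        print(f"{size / (1024 * 1024):10.1f} MB  {path}")
```

If the same few files keep showing up at the top, that is usually a hint to revisit the rotation and retention settings mentioned above rather than to keep adding disk.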
 

Speaking of databases, I see a fair amount of similar pain caused by databases running up against size limits such as tablespace or data file limits. So, make sure you are also actively managing data size limits on the DB.