Monday, April 26, 2010

Custom caching for OES attributes

First, in the interest of full disclosure: this wasn't my idea. It's exactly the sort of thing I usually come up with, but in this case someone else suggested it to me.


One of the most powerful things about OES is the ability to write policies on attributes. I talked about this in a post about writing OES policies back in December, but for those of you who missed that post, here's the CliffsNotes version:

Objects can have attributes hanging off of them
So can Roles, Groups, and Users.
The best way to write policies that make sense and scale is often to use a constraint to compare the values of attributes.

I gave a decent example in that post, so I'm not going to go back into the details.

One of the things people ask about OES is whether it caches the values it retrieves. The answer is "of course!", and then we get into a whole bunch of questions about how the attribute cache works. Yes, there are TTLs on the attributes. Yes, they are configurable per attribute. Yes, the caches get flushed automatically at appropriate times. Yes, there's an API to flush attributes out of the cache.

Even so, sometimes there are cases where customers need to do something slightly different.

First, an example policy:
Deny access if AccountStatus is set to Disabled

In OES's internal format this looks like:
deny( any, //app/policy, //role/Everyone ) if AccountStatus="Disabled"
If getting AccountStatus is "cheap", you can just fetch it from the database whenever you need it and not bother caching it at all; then if the account gets disabled, you can deny access to everything immediately. If getting AccountStatus is "expensive" (e.g. it requires a call back to your mainframe), you don't want to make that call on every access attempt, so you'll want to cache the value for a while. But caching it in a regular LRU cache with a TTL of something like an hour means you might keep granting the user access to resources for up to an hour after their account has been disabled. Set the TTL too high and you have a security problem; set it too low and you have a cost problem. I'm using cost here in the most generic sense possible: it could mean compute cycles, CPU time, clock time, or literally dollars and cents.
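To make the staleness window concrete, here is a minimal sketch of a plain TTL cache in Java. The class name `TtlCache`, the `alice|AccountStatus` key, and the explicit `nowMillis` clock parameter are hypothetical choices for illustration (the clock is passed in so the example is deterministic); this is not how OES's own attribute cache is implemented, and LRU eviction is left out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical TTL cache, for illustration only; it is not OES's attribute cache.
 * Time is passed in explicitly so the staleness window is easy to see.
 */
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, String value, long nowMillis) {
        entries.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    /** Returns the cached value, or null if it is missing or has expired. */
    String get(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis >= e.expiresAtMillis) {
            entries.remove(key); // expired: the caller must do a fresh (expensive) lookup
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long oneHour = 60L * 60 * 1000;
        TtlCache cache = new TtlCache(oneHour);

        cache.put("alice|AccountStatus", "Active", 0);
        // Ten minutes later the account is disabled in the system of record,
        // but nothing tells the cache, so it keeps answering "Active"...
        System.out.println(cache.get("alice|AccountStatus", 10 * 60 * 1000L)); // Active
        // ...until the TTL runs out and the next lookup has to go back to the source.
        System.out.println(cache.get("alice|AccountStatus", oneHour));          // null
    }
}
```

Shrinking `ttlMillis` narrows the window in which a disabled user is still treated as active, but every expiry is another expensive trip to the source; that is exactly the trade-off described above.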

Customers usually have to figure out how to balance these two requirements. It's one of my maxims that all security eventually comes down to this very trade-off.

What if you didn't have to make that trade-off?

Oracle Coherence lets you build a cache that runs on individual machines, in the same JVM as the rest of your code, but automatically syncs up across your network. Coherence's Java API, at its simplest, is a Map you can put stuff into and get stuff out of. If the data you want is already held by some member of the cluster, your get() call returns it seamlessly. Similarly, when you put something into the cache, anyone else can get it out. And if someone removes data from the cache, it gets removed everywhere in the grid. Coherence handles all of the painful stuff like node failure and recovery, as well as nodes being added to the grid. Coherence can also persist the data or go fetch it for you if you want, but to keep this generic I'm going to ignore those features and treat Coherence as a simple data storage cache and nothing more.

Steve Poz blogged about using a Coherence cache for OES attribute data back in September, and I'm going to build on that work. Because if there's one thing I know, it's that good programmers write good code, but great programmers know where to steal good code.

OES ships with a bunch of attribute retrievers out of the box for sources like databases and LDAP directories. If you're like me, you don't want to build that code again: configuration strings, connection pools, handling data retrieval failures, and logging are all solved problems, so why waste your time reinventing the wheel?

Back to my example policy:
deny( any, //app/policy, //role/Everyone ) if AccountStatus="Disabled"
AccountStatus comes from an existing OES attribute retriever. Let's leave that as it is and create a new attribute, CachedAccountStatus. We'll write an OES attribute retriever that looks for any attribute whose name begins with the keyword Cached and simply layers a cache on top of the existing attribute data.
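The naming convention is simple enough to pull into its own helper. This is a minimal sketch assuming the literal prefix `Cached` from the post; the class and method names (`CachedAttributeNames`, `underlyingName`) are hypothetical. An empty `Optional` plays the role of "pass" for names the retriever doesn't handle.

```java
import java.util.Optional;

/** Hypothetical helper for the "Cached" attribute naming convention. */
final class CachedAttributeNames {
    static final String PREFIX = "Cached";

    private CachedAttributeNames() {}

    /**
     * Maps e.g. "CachedAccountStatus" to "AccountStatus".
     * Returns empty for names without the prefix, and for the bare prefix itself,
     * meaning the retriever should pass on them.
     */
    static Optional<String> underlyingName(String requested) {
        if (requested == null
                || !requested.startsWith(PREFIX)
                || requested.length() == PREFIX.length()) {
            return Optional.empty();
        }
        return Optional.of(requested.substring(PREFIX.length()));
    }

    public static void main(String[] args) {
        System.out.println(underlyingName("CachedAccountStatus")); // Optional[AccountStatus]
        System.out.println(underlyingName("AccountStatus"));       // Optional.empty
    }
}
```

One consequence of this convention: any attribute whose real name happens to start with Cached will be claimed by this retriever, so the prefix is best reserved for this purpose.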

To do that, we import com.bea.security.providers.authorization.asi.AttributeRetrieverV2 and implement the AttributeRetrieverV2 interface.

The two methods are getHandledAttributeNames() and getAttributeValue(). For the former we just return null, so that we get called for every attribute and have an opportunity to either resolve the data or pass. Here's what the code looks like:


public String[] getHandledAttributeNames() {
    // Returning null means we get called for every attribute
    return null;
}

public Object getAttributeValue(String name,
                                RequestHandle requestHandle,
                                Subject subject,
                                Map roles,
                                Resource resource,
                                ContextHandler contextHandle) {

    // Default value; returning null means we pass on this attribute
    String attrValue = null;

    if (name.startsWith("Cached")) {

        LOGGER.debug("Request received for Cached attribute.");
        LOGGER.debug("Requested attribute name '" + name + "'");

        // Strip the "Cached" prefix (6 characters) to get the real attribute's name
        String uncachedName = name.substring("Cached".length());

        LOGGER.debug("Uncached attribute name '" + uncachedName + "'");

        try {
            AttributeElement ae = requestHandle.getAttribute(uncachedName, false);

            if (null == ae) {
                LOGGER.error("Failed to get attribute named '" + uncachedName + "'");
            } else {
                // This code only "does" Strings. Real code would need to be smarter
                attrValue = (String) ae.getValue();

                LOGGER.debug("Uncached attribute '" + uncachedName + "' => '" + attrValue + "'");
            }
        } catch (Exception ex) {
            LOGGER.error("Exception caught getting attribute", ex);
        }
    }

    return attrValue;
}
Then the only thing left is to wire in the actual caching, and Steve showed us how to do that on his blog. Basically, add the necessary Coherence jars to your classpath and add a couple of -D options to the command line that starts your JVM. Then tweak the code above so that the constructor connects to the cache:

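// At the top of MyAttributeRetriever.java:
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

// In the class body:
private NamedCache myCache = null;

public MyAttributeRetriever() {
    // Join (or start) the Coherence cluster, then get a handle to the shared cache
    CacheFactory.ensureCluster();
    myCache = CacheFactory.getCache("oesattributecache");
}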
Then add a line to try to get the data from the grid first, and another to insert the value into the cache once you find it.

I'll leave the copy/pasting to you. :-)
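For reference, those two lines amount to the cache-aside pattern. Here is a minimal sketch under stated assumptions: the class and method names are hypothetical, the cache is any `java.util.Map` (Coherence's `NamedCache` extends `Map`, so `myCache` from the constructor could be passed in), and the `underlying` function stands in for the `requestHandle.getAttribute(...).getValue()` call. The cache key includes the user as well as the attribute name, because AccountStatus differs per subject; how you derive a stable user name from the `Subject` is left open here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical cache-aside lookup; not part of the OES or Coherence APIs. */
final class CacheAsideLookup {
    private final Map<String, String> cache;                     // e.g. the Coherence NamedCache
    private final BiFunction<String, String, String> underlying; // (user, attribute) -> value or null

    CacheAsideLookup(Map<String, String> cache, BiFunction<String, String, String> underlying) {
        this.cache = cache;
        this.underlying = underlying;
    }

    /** Assumed key format; the key must identify the user, not just the attribute. */
    static String cacheKey(String user, String attributeName) {
        return user + "|" + attributeName;
    }

    String get(String user, String attributeName) {
        String key = cacheKey(user, attributeName);
        String value = cache.get(key);                     // 1. try the grid first
        if (value == null) {
            value = underlying.apply(user, attributeName); // 2. miss: ask the real retriever
            if (value != null) {
                cache.put(key, value);                     // 3. share it with the cluster
            }
        }
        return value;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CacheAsideLookup lookup = new CacheAsideLookup(new ConcurrentHashMap<>(),
                (user, attr) -> { calls[0]++; return "Active"; });
        lookup.get("alice", "AccountStatus");
        lookup.get("alice", "AccountStatus");
        System.out.println("underlying calls: " + calls[0]); // underlying calls: 1
    }
}
```

Null results are deliberately not cached, so a failed lookup (the `LOGGER.error` branch in the retriever) is retried on the next request instead of being pinned in the grid. Two nodes that miss at the same moment may both call the underlying source; that is harmless here because both would store the same value.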

The only other bit we need to solve is how to remove the cached value when necessary. Basically you have the system that changes the value or the one that stores the value call the coherence API to remove the cached value or at least kick that process off. For example, if you disable users through OIM, you could have OIM treat the Coherence cache as a provisioned endpoint. Or you could use a database trigger. Or you could listen for changes to your LDAP directory. Or any of an infinite number of other ways.
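Whichever system ends up doing it, the step itself is small. This is a minimal sketch of write-then-evict in which everything is hypothetical: `directory` stands in for the system of record (database, LDAP, or an OIM-managed target) and `cache` for the Coherence `NamedCache`, which extends `java.util.Map`, so in real code the eviction is a `remove(key)` on that cache. The `user|attribute` key format is an assumption and must match whatever key the attribute retriever uses when it populates the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical invalidation hook run by whatever system disables accounts. */
final class AccountDisabler {
    private final Map<String, String> directory; // stand-in for the system of record
    private final Map<String, String> cache;     // stand-in for the Coherence NamedCache

    AccountDisabler(Map<String, String> directory, Map<String, String> cache) {
        this.directory = directory;
        this.cache = cache;
    }

    /** Assumed key format; must match the key the retriever caches under. */
    static String cacheKey(String user, String attributeName) {
        return user + "|" + attributeName;
    }

    void disable(String user) {
        // 1. Change the value in the system of record first...
        directory.put(user, "Disabled");
        // 2. ...then evict the stale copy so the next lookup goes back to the source.
        cache.remove(cacheKey(user, "AccountStatus"));
    }

    public static void main(String[] args) {
        Map<String, String> directory = new ConcurrentHashMap<>();
        Map<String, String> cache = new ConcurrentHashMap<>();
        directory.put("alice", "Active");
        cache.put(cacheKey("alice", "AccountStatus"), "Active");

        new AccountDisabler(directory, cache).disable("alice");

        System.out.println(cache.containsKey(cacheKey("alice", "AccountStatus"))); // false
        System.out.println(directory.get("alice"));                               // Disabled
    }
}
```

Evicting before updating would leave a window in which another node could miss, read the old value, and put it straight back, so the update comes first. Even in this order, a lookup that read the old value just before the update could still re-insert it after the eviction, so a modest TTL on the cached entries remains a sensible backstop.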

Let us know if you try this out on your own!

2 comments:

  1. How does this solve the problem? Coherence merely makes the cache "consistent" (quotes intentional) across the cluster. If your identity store where the attribute lives has been changed by another application such as a provisioning tool, Coherence won't be magically aware of it unless you implement a custom eviction policy that hooks into the guts of the identity store. (The provisioning tool could also flush the cache and/or evict cache items; again a custom task).

  2. I talked about that in the last graf - "The only other bit we need to solve is how to remove the cached value when necessary. Basically you have the system that changes the value or the one that stores the value call the coherence API to remove the cached value or at least kick that process off."

    So yes, you need to have something tell Coherence to toss the now invalid value from the cache.

    You might do it as part of an OIM workflow, or use a trigger in your database, or by monitoring LDAP changelogs. But that's just a wiring problem. ;-)

