Monday, January 24, 2011

Using the X.509 Attribute Sharing profile responsibly

I'm back, rested and I've had some time to think about the crazy (clever?) OVD adapter I wrote for last week's PoC. You know, the one that lets you do a search for certdn= the user's certificate DN and it makes a SOAP call over to OIF to get the user info?

I've talked to a few people internally about how this thing works and at first everyone has had the same reaction - that's kinda cool. Then we get to talking about the fine print warning:

Before you go further a warning: If you're going to try this at home make sure you test it for scalability. It's not entirely clear that this will scale up to thousands of concurrent users. OAM and OVD will easily support that, as will OIF (as both the SP and IdP). But the entire architecture relies on SOAP calls over the Internet and those are notoriously latency heavy. As a result the initial access by a user will be relatively slow and that could cause any number of issues. If a large percentage of your users visit for only a short time those problems will be worse.

Not everyone I've talked to is as concerned. At least not before seeing it run. Perhaps I'm a little more conservative about performance in large scale deployments. Or maybe I've seen one too many super clever solutions fall flat on their faces when faced with the real world. But in any case I cooked this crazy idea up and am still not convinced it's a good one.

Read on for why I'm a little gun shy.

Imagine that you have a web server that's seeing one of these special user logins per second (I'm using stupid numbers as examples here, bear with me). Everything works just fine as long as the back end (the OAM server) can handle that authentication - again just one special authentication per second. Each of those funky authentication requests coming in requires OAM to do a search for certdn= and the user's certificate DN. If the user isn't in cache already my custom OVD adapter has to make a SOAP call over to the SAML Service Provider (OIF in this case) which in turn makes a SOAP call over the Internet to the IdP. The local call is probably pretty safe and fast - it's on your own infrastructure after all; the other call is riskier since it goes out over the Internet (via SOAP over an HTTPS connection). As long as those calls over the Internet can keep up things will go fine. But what happens if the IdP has some problem or can't keep up?
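Here's a rough sketch of that lookup path in Python, just to make the fast path and the slow path explicit. This is not the adapter code (the real one is an OVD plug-in); the names `lookup_by_certdn`, `cache` and `fetch_from_idp` are made up for illustration, and `fetch_from_idp` simply stands in for the two SOAP hops (OVD to OIF, then OIF to the IdP).

```python
from typing import Callable, Dict, Optional

UserRecord = Dict[str, str]


def lookup_by_certdn(
    cert_dn: str,
    cache: Dict[str, UserRecord],
    fetch_from_idp: Callable[[str], Optional[UserRecord]],
) -> Optional[UserRecord]:
    """Return the user for cert_dn, going remote only on a cache miss."""
    user = cache.get(cert_dn)
    if user is not None:
        # Fast path: the user is already cached, no SOAP calls at all.
        return user

    # Slow path: hypothetical stand-in for the SOAP call to the local SP,
    # which in turn calls the IdP over the Internet. The caller blocks here.
    user = fetch_from_idp(cert_dn)
    if user is not None:
        cache[cert_dn] = user
    return user
```

Every first-time user takes the slow path exactly once, and the LDAP client on the other end waits the whole time.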

Let's say 9:00 AM rolls around and a bunch of users read the email someone sent them about our site. Each of those users immediately clicks on the link, pops their smart cards into their machine and a whole bunch of "new" users hit OAM. OAM makes the LDAP calls over to OVD as normal. The LDAP calls for these users, whom the plug-in has never seen before, are going to take time, so the call from OAM to OVD "hangs" waiting for a response while our plug-in is waiting for the SOAP call to complete.

And that's the dangerous bit.

If the rate of new users coming in gets too large the whole site starts to strain under the load. It's not any one component's fault - the SOAP calls just take time.
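To put some back-of-the-envelope numbers on that, Little's Law says the average number of requests in flight equals the arrival rate times the time each one takes. The sketch below applies it to the hanging LDAP calls. The function name and the example figures are made up for illustration, not measurements from the PoC.

```python
def blocked_connections(new_users_per_second: float, soap_seconds: float) -> float:
    """Little's Law: average requests in flight = arrival rate * time in system.

    Here "in flight" means LDAP calls from OAM to OVD that are stuck
    waiting for the SOAP round trip to finish.
    """
    return new_users_per_second * soap_seconds


# Quiet day: one new user per second, half a second per SOAP round trip.
quiet = blocked_connections(1, 0.5)   # 0.5 calls waiting on average

# 9:00 AM rush: 50 new users per second, 3 seconds per round trip.
rush = blocked_connections(50, 3)     # 150 calls waiting on average
```

If that second number is bigger than whatever connection or thread pool sits between OAM and OVD, requests start queuing up behind it, including requests for users who are already in cache.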

I'm not sure I've explained it well enough, so if it's not clear someone let me know in the comments.

So what do I think is a better solution?

I'm so glad you asked!

I think a better solution would be to take out all of the XASP (X.509 Attribute Sharing Profile) code from OVD and instead use a conventional database store. The first time OAM searches for the user with the same search I discussed above (certdn equal to the user's certificate DN) OAM won't find the user. We take advantage of this and configure OAM to redirect the user to another really simple application that pulls the certificate DN out of the HTTP header, makes the XASP call, inserts the user into the user cache database, and then redirects the user back to their original URL. The revised architecture looks something like this:

You'd probably deploy both applications on the same web server so that the user doesn't get two certificate prompts, but that's easy enough. In fact the application might even be deployed in the same WebLogic server as the rest of your applications - it's all behind a Web Server and WebGate anyway.

Despite having more boxes in the diagram this architecture is actually simpler. You can use an out of the box OVD adapter and you only need to write a small chunk of a conventional web application.
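For the curious, here's a minimal sketch of what that small application might do, written in Python with SQLite standing in for the user cache database. Everything specific here is an assumption for illustration: the header name `X-Client-Cert-DN`, the `user_cache` table and its columns, and the `xasp_lookup` callable that stands in for the actual XASP call. A real version would live in whatever web framework you already use and would validate the return URL.

```python
import sqlite3
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical header name; use whatever your web server actually sets.
CERT_DN_HEADER = "X-Client-Cert-DN"


def create_user_cache(db: sqlite3.Connection) -> None:
    """Create the (hypothetical) user cache table if it doesn't exist."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS user_cache ("
        "certdn TEXT PRIMARY KEY, mail TEXT)"
    )
    db.commit()


def enroll_and_redirect(
    headers: Mapping[str, str],
    original_url: str,
    db: sqlite3.Connection,
    xasp_lookup: Callable[[str], Optional[Dict[str, str]]],
) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, response headers) for one enrollment request."""
    cert_dn = headers.get(CERT_DN_HEADER)
    if not cert_dn:
        return 400, {}  # no certificate DN was passed along

    # Stand-in for the XASP call; the slow part happens here,
    # in the web tier, instead of inside an LDAP search.
    attrs = xasp_lookup(cert_dn)
    if attrs is None:
        return 403, {}  # the IdP doesn't know this user

    db.execute(
        "INSERT OR REPLACE INTO user_cache (certdn, mail) VALUES (?, ?)",
        (cert_dn, attrs.get("mail", "")),
    )
    db.commit()

    # Send the user back where they came from. A real application
    # should check original_url to avoid an open redirect.
    return 302, {"Location": original_url}
```

The point of the design is that the wait moves out of the LDAP path: the user's browser waits on the slow call, but OAM's searches against the database always come back quickly, hit or miss.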

I'm curious about what others think of my two alternative solutions so please comment below!


  1. Chris -

    Have you worked with XASP and OIF? I am curious about the performance via OIF vs OVD as you discussed here.


  2. I have used it a few different times and a few different ways.

    In general in the server to server world LDAP clients expect their server to respond on the order of milliseconds. What I mean is that something like OAM, architected as above, expects OVD to come back really quickly because its client (a WebGate) is expecting an answer quickly because the web browser is expecting a response quickly.

    XASP requires two hops - one from OVD to the local SAML SP and then another from the SAML SP to the IdP. When you wire up something like I described in the earlier post and send both of those SOAP calls over an intranet link to an internal server it should be able to get a response in the high millis. Change the second of those calls to go over the Internet and the calls are more likely to take seconds than millis; plus you've added the risks of Internet communication snafus.

    That's why I have a feeling the above might be just a bit too clever.

    Why do you ask?

  3. Oh Chris you forgot another requirement of that OIC we built together. "There are to be no HTTP 403 redirects in the architecture."

    I agree that there are some definite major concerns with the performance of X.509 and SOAP that are still being played out. So far the caching in OVD seems to be enough. So far the majority of the users are local to their own domain so this hasn't really been tested at monster scale.

