In this post, we investigate a complication that can occur if you require a firewall between your WebGate agents and your OAM 11g servers within your deployment topology. We provide some guidance related to how to configure your WebGates in this case. This post is part of a larger series on Oracle Access Manager 11g called Oracle Access Manager Academy. An index to the entire series with links to each of the separate posts is available.
Imagine a fairly typical scenario in which an organisation has a number of web servers within a demilitarized zone (DMZ) that it wants to protect with OAM. WebGate plugins will, of course, need to be deployed to these web servers, and those plugins will need to establish and maintain Oracle Access Protocol (OAP) connections back to one or more OAM 11g Servers. It is quite likely that these OAP connections will need to pass through a firewall (or two) on their journey from the WebGates to the OAM servers, which is not, in itself, too much of a problem.
An issue frequently does occur, though, in the case that the firewall imposes either a maximum connection TTL (time to live) or idle timeout - and to be honest, most firewalls will do this as a matter of course. OAP, as has already been widely discussed, is a long-lived protocol and as such, the standard behaviour of an OAP client (such as WebGate) is to initially establish a number of connections to a server and then use those connections repeatedly over a long period of time. OAP clients in general (and WebGates in particular) typically do not react well to an established OAP session being "torn down" ungracefully - and this is exactly what a typical firewall will do if, according to its configuration, a connection has exceeded either its maximum TTL, or its inactive timeout.
The term "ungracefully" above simply means the scenario where the firewall does not send a TCP connection reset message to the client when invalidating the connection. Should you have a well-mannered firewall that does notify the WebGate in this way once it does close the connection, all should be fine and you can probably stop reading at this point. The reality, though, is that most firewalls will not fit into this "polite" category.
When a firewall does invalidate or tear down an OAP connection that WebGate "thinks" is still good, the first request that the WebGate sends over that connection will fail. Depending on various configuration parameters and the number of connections available, the WebGate may well recover from this situation without any major impact to end users (apart from somewhat degraded performance as the WebGate hunts around for a good connection within its pool). It is always possible, though, that all connections could be invalid at once. Consider a system that sees little traffic outside of office hours and has all of its connections timed out overnight due to inactivity. In this event, there will be a definite impact to end users, with requests failing until WebGate has managed to re-establish its connection pool.
The way to avoid this problem is to ensure that the firewall is never given cause to close a WebGate connection - in other words, ensuring that WebGate connections never exceed the configured TTL or inactivity timeout as defined at the firewall. This is achieved by configuring a maximum connection lifespan, or TTL, at the WebGate side that is less than the firewall's maximum TTL or idle timeout.
As an example, let's assume that our firewall imposes an idle timeout of 30 minutes for TCP connections. In this case, we would need to configure WebGate to automatically re-establish any connection older than, say, 25 minutes in order to ensure that the firewall would never need to time out one of its OAP connections. This is done by altering a WebGate setting called "Max Session Time".
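The arithmetic in this example can be written down as a tiny helper. This is only a sketch: the function name and the default five-minute safety margin are my own assumptions for illustration, not anything defined by OAM.

```python
def choose_connection_lifetime(firewall_timeout_minutes: int,
                               margin_minutes: int = 5) -> int:
    """Pick a WebGate connection lifetime (in minutes) that stays safely
    below the firewall's timeout.

    Hypothetical helper: the margin is an assumed safety buffer.
    """
    if margin_minutes <= 0:
        raise ValueError("margin must be positive")
    lifetime = firewall_timeout_minutes - margin_minutes
    if lifetime < 1:
        raise ValueError("firewall timeout is too short for this margin")
    return lifetime


# The example from the text: a 30 minute firewall timeout
# leads to a 25 minute connection lifetime.
print(choose_connection_lifetime(30))
```

The only point being made is that the value configured at the WebGate must be strictly smaller than whatever the firewall enforces.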
Now, we need to have a bit of a discussion about this particular setting, for a number of reasons. The first is that it really isn't very well named, considering what it does; it has nothing to do with sessions, but everything to do with connections back to the OAM server and how long they will be allowed to last before being re-established. It should, correctly, be called something like "Max Connection Time" and perhaps in a later version of OAM it will be. As of the time of writing, though (when OAM 11.1.2.1 is the most recent version), we will have to live with the current name.
Perhaps more confusing, though, is the fact that, over the various incarnations of the 11g OAM product, the OAM Console page that allows this WebGate parameter to be defined has been changed repeatedly - consider the screenshots below:
As we can see above, the OAM Console UI, across several releases, has changed the expected unit of time in which this parameter is specified, starting with no unit at all, then moving to hours and then to seconds. What's more, the default value tends to vary as well, depending on the version you are using and the mechanism that was used to create the initial WebGate profile. The reality of the situation, though, is that you can and should ignore the unit of time that is reflected in the UI, because the default unit for this setting is (and always has been) hours. That's probably worth repeating and highlighting, just to be completely clear:
In all OAM 11g versions up through the current release, 11.1.2.1, the default unit for Max Session Time is hours, regardless of what is reflected in the OAM Console UI.
This means that the default maximum TTL for a WebGate connection in OAM 11.1.2.0 and 11.1.2.1 is, in fact, 3600 hours (150 days)! We did say it was meant to be a long-lived connection...
Understanding the default value and the default unit is great, of course, provided that your firewall is (or can be) configured to allow connections to last (or remain idle) for at least an hour. This is often not the case, though.
The good news is that OAM 11g WebGates support a user-defined parameter that can be used to change the unit used for Max Session Time. In order to change the unit from "hours" to "minutes", add the following to the "User Defined Parameters" section in the WebGate profile:
maxSessionTimeUnits=minutes
Once you've done that, whatever number you've entered in the "Max Session Time" box will be interpreted in minutes, rather than hours (again, regardless of what the UI label tells you). When the change is reflected in the WebGate's ObAccessClient.xml file, you should see entries similar to those in the ObAccessClient.xml excerpt under "But what if my firewall timeout is less than an hour?" (these reflect the correct settings for our "25 minute" example above).
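To make the unit rule concrete, here is a minimal sketch of how the number in the "Max Session Time" box translates into an effective connection lifetime. The function name is hypothetical, and it models only the two cases described in this post: no maxSessionTimeUnits parameter (hours) and maxSessionTimeUnits=minutes. Any other unit value is rejected, because this post does not cover it.

```python
from typing import Optional


def effective_lifetime_seconds(max_session_time: int,
                               max_session_time_units: Optional[str] = None) -> int:
    """Return the connection lifetime in seconds implied by the two settings.

    Illustrative model only: None means the user-defined parameter is absent,
    in which case the value is read as hours.
    """
    if max_session_time_units is None:
        return max_session_time * 3600   # default unit: hours
    if max_session_time_units == "minutes":
        return max_session_time * 60
    raise ValueError("unit not covered by this sketch: %r" % max_session_time_units)


print(effective_lifetime_seconds(3600))            # default value read as hours
print(effective_lifetime_seconds(25, "minutes"))   # the "25 minute" example
print(effective_lifetime_seconds(2, "minutes"))    # 120, the value seen in the log
```

Working in seconds is convenient here because that is the unit in which the WebGate log reports the maximum session time.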
Once you've made the appropriate changes, it's always a good idea to verify that things are working as expected. In order to do this, you should increase the log level of your WebGate to at least "INFO" and then filter the WebGate log file (oblog.log) for lines containing the string "CONN_MGMT".
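A simple text search is all that is needed for this filtering. As a sketch, the same thing can be done in a few lines of Python. The function names are my own, and the log path is whatever applies to your installation.

```python
from typing import Iterable, Iterator


def conn_mgmt_lines(lines: Iterable[str], marker: str = "CONN_MGMT") -> Iterator[str]:
    """Yield only the log lines that contain the connection-management marker."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def print_conn_mgmt(path: str) -> None:
    """Print the matching lines from a WebGate log file at the given path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in conn_mgmt_lines(handle):
            print(line)
```

Calling print_conn_mgmt with the full path to your oblog.log prints only the connection management entries, which makes it much easier to see connections being expired and reopened.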
That will allow you to monitor the connections that are opened and closed by WebGate over time. The log snippet under "Seeing the effect" comes from my own system (with the Max Session Time value set to 2 minutes) and highlights the messages to look out for. Note that, just to increase confusion further, the timeout value in the log is printed in seconds, rather than minutes or hours.
As a closing note, remember to reduce the log level of your production WebGates again once you've verified that the correct connection time setting is in force.
The problem we are trying to solve
How to prevent the firewall closing connections
But what if my firewall timeout is less than an hour?
...
<SimpleList>
  <NameValPair ParamName="maxSessionTime" Value="25"></NameValPair>
</SimpleList>
...
<userDefinedParameters>
  <name>maxSessionTimeUnits</name>
  <value>minutes</value>
</userDefinedParam>

Remember again - if the maxSessionTimeUnits parameter is not specified, then maxSessionTime will be interpreted in hours.
Seeing the effect
2013/12/05@18:30:59.01090 15928 15999 CONN_MGMT INFO 0x00001C04 /ade/brmohant_17700080/ngamac/src/palantir/aaa_client/src/watcher_thread.cpp:504 "Session expired" Connection^object{ObConnectionAAA:0x7F2390019D80{_socket=object{ObSocket:0x0241C020{_sock=17}{_my_addr=}{_my_port=0}{_remote_addr=192.168.56.245}{_remote_port=5576}{_use_blocking_calls=false}{_timeout=10000}{_req_pending=0}}}{_state=ObConnUp}{_priority=1}{_debug=false}{_host=oamr2ps1.oracle.com}{_port=5576}{replyMapSize=0}{_seqno=9}{_isSpare=false}{_createTime=1386268079}{_closedTime=0}{_retries=0}} Maximum Session Time^120 Current Time^1386268259

2013/12/05@18:30:59.01573 15928 15999 CONN_MGMT INFO 0x00001C02 /ade/brmohant_17700080/ngamac/src/palantir/aaa_client/src/watcher_thread.cpp:474 "New connection opened to Access Server" Connection^object{ObConnectionAAA:0x7F2390306270{_socket=object{ObSocket:0x023E8390{_sock=19}{_my_addr=}{_my_port=0}{_remote_addr=192.168.56.245}{_remote_port=5576}{_use_blocking_calls=false}{_timeout=10000}{_req_pending=0}}}{_state=ObConnUp}{_priority=1}{_debug=false}{_host=oamr2ps1.oracle.com}{_port=5576}{replyMapSize=0}{_seqno=0}{_isSpare=true}{_createTime=1386268259}{_closedTime=0}{_retries=0}}

2013/12/05@18:30:59.01584 15928 15999 CONN_MGMT INFO 0x00001C04 /ade/brmohant_17700080/ngamac/src/palantir/aaa_client/src/watcher_thread.cpp:504 "Session expired" Connection^object{ObConnectionAAA:0x7F239028F770{_socket=object{ObSocket:0x023D3000{_sock=18}{_my_addr=}{_my_port=0}{_remote_addr=192.168.56.245}{_remote_port=5576}{_use_blocking_calls=false}{_timeout=10000}{_req_pending=0}}}{_state=ObConnUp}{_priority=1}{_debug=false}{_host=oamr2ps1.oracle.com}{_port=5576}{replyMapSize=0}{_seqno=9}{_isSpare=false}{_createTime=1386268079}{_closedTime=0}{_retries=0}} Maximum Session Time^120 Current Time^1386268259

2013/12/05@18:31:59.10580 15928 15999 CONN_MGMT INFO 0x00001C02 /ade/brmohant_17700080/ngamac/src/palantir/aaa_client/src/watcher_thread.cpp:474 "New connection opened to Access Server" Connection^object{ObConnectionAAA:0x7F239028F770{_socket=object{ObSocket:0x0241C020{_sock=17}{_my_addr=}{_my_port=0}{_remote_addr=192.168.56.245}{_remote_port=5576}{_use_blocking_calls=false}{_timeout=10000}{_req_pending=0}}}{_state=ObConnUp}{_priority=1}{_debug=false}{_host=oamr2ps1.oracle.com}{_port=5576}{replyMapSize=0}{_seqno=0}{_isSpare=false}{_createTime=1386268319}{_closedTime=0}{_retries=0}}
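If you want to check the numbers in entries like these rather than read them by eye, a small parser helps. The sketch below assumes that "Session expired" entries always carry the _createTime, Maximum Session Time and Current Time fields in the layout shown above. That layout is inferred from this one log excerpt, not from any documented format. The sample string is the first entry above with its middle portion replaced by "...".

```python
import re

# Field layout inferred from the "Session expired" entries shown above.
EXPIRED = re.compile(
    r'"Session expired".*?\{_createTime=(\d+)\}.*?'
    r'Maximum Session Time\^(\d+) Current Time\^(\d+)',
    re.DOTALL,
)


def expired_connection_ages(log_text: str) -> list:
    """Return (age_seconds, max_session_seconds) for each expired connection."""
    return [
        (int(now) - int(created), int(limit))
        for created, limit, now in EXPIRED.findall(log_text)
    ]


# First entry from the log above, shortened with "..." for readability.
SAMPLE = (
    '2013/12/05@18:30:59.01090 15928 15999 CONN_MGMT INFO 0x00001C04 ... '
    '"Session expired" Connection^object{...{_createTime=1386268079}'
    '{_closedTime=0}{_retries=0}} Maximum Session Time^120 '
    'Current Time^1386268259'
)

for age, limit in expired_connection_ages(SAMPLE):
    print("connection age %d s, configured maximum %d s" % (age, limit))
```

For this entry the connection was 180 seconds old when it was expired, against a configured maximum of 120 seconds. Connections are therefore replaced at some point after the limit is reached rather than at the exact moment, presumably because the check runs periodically, which is one more reason to leave a margin below the firewall's timeout.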