Most of us are well versed in the Fault Management Framework in 11g, where one of the generic features we implement is the retry mechanism. Recently I heard about the ‘Auto Recovery’ feature in BPEL and took part in a discussion on when we should (or should not) rely on it during BPEL process execution. This was a new feature for me: I had heard of manual recovery, but I had never explored or considered auto recovery during development. It made me realize that I am still a novice :).
So the purpose of this post is to explore ‘Auto Recovery’ in BPEL, covering the topics listed below. It does not discuss the configuration required in a clustered environment, startup-related configuration or Callback Recovery.
- Configuration
- BPEL Recovery Console
- Auto Recovery Behavior
- Auto Recovery in Action
Configuration:
‘Auto Recovery’ is configured by setting a few MBean properties in the EM console. Navigate to soa-infra -> SOA Administration -> BPEL Properties -> More BPEL Configuration Properties -> RecoveryConfig. This brings up the following screen showing the default parameters. BPEL Auto Recovery is enabled by default.

The properties startWindowTime and stopWindowTime specify the period during which Auto Recovery is active. By default the feature is active from 12AM to 4AM every day (remember that this is SOA server time), as shown in the screenshot above. We can change these settings by updating the time values in 24-hour format and clicking Apply.
The property maxMessageRaiseSize specifies the number of messages to be submitted in each recovery attempt, so in effect it is the batch size.
The property subsequentTriggerDelay specifies the interval between consecutive auto recovery attempts. The default value is 300 sec.
The property threshHoldTimeInMinutes specifies how long the BPEL engine waits after a recoverable fault occurs before marking the instance as eligible for auto recovery. The default value is 10 min.
If we look closely, none of these properties specifies the number of recovery attempts to be made. That is a separate MBean property altogether. To set it, navigate to soa-infra -> SOA Administration -> BPEL Properties -> More BPEL Configuration Properties -> MaxRecoverAttempt. The default value is 2.

To disable ‘Auto Recovery’, set the maxMessageRaiseSize property value to 0.
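To make the roles of these properties concrete, here is a minimal Python sketch that mirrors them in a plain data class. It is only an illustration and not an Oracle API: the class and function names are hypothetical, the value 50 for maxMessageRaiseSize is simply the one used later in this post, and the handling of a window that wraps past midnight and of the window boundaries is my assumption.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class RecoveryConfig:
    """Illustrative mirror of the MBean values (hypothetical, not an Oracle API)."""
    start_window: time = time(0, 0)         # startWindowTime, 12AM by default
    stop_window: time = time(4, 0)          # stopWindowTime, 4AM by default
    max_message_raise_size: int = 50        # maxMessageRaiseSize (value used later in this post)
    subsequent_trigger_delay_s: int = 300   # subsequentTriggerDelay, in seconds
    threshold_minutes: int = 10             # threshHoldTimeInMinutes
    max_recover_attempt: int = 2            # MaxRecoverAttempt (separate MBean property)


def auto_recovery_enabled(cfg: RecoveryConfig) -> bool:
    # Setting maxMessageRaiseSize to 0 disables auto recovery.
    return cfg.max_message_raise_size > 0


def in_recovery_window(cfg: RecoveryConfig, now: time) -> bool:
    # 'now' is SOA server time. Start is treated as inclusive and stop as
    # exclusive, which is an assumption of this sketch.
    if cfg.start_window <= cfg.stop_window:
        return cfg.start_window <= now < cfg.stop_window
    # Assumption: a window such as 22:00 to 02:00 wraps past midnight.
    return now >= cfg.start_window or now < cfg.stop_window


if __name__ == "__main__":
    cfg = RecoveryConfig()
    print(auto_recovery_enabled(cfg))              # True
    print(in_recovery_window(cfg, time(2, 30)))    # True
    print(in_recovery_window(cfg, time(5, 0)))     # False
```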
BPEL Recovery Console:
Navigate to soa-infra -> Service Engines -> BPEL -> Recovery to view the recoverable instances. Note that the console shows all recoverable instances irrespective of whether ‘Auto Recovery’ is enabled or disabled. When Auto Recovery is not enabled, we can manually recover the faulted instances from this console.

Auto Recovery Behavior:
Whenever a recoverable fault occurs during BPEL processing, the instance becomes visible in the Recovery console. (The term "recoverable fault" is rather abstract; I verified this behavior with Remote, Binding and User Defined Faults.) If Auto Recovery is enabled, the BPEL runtime tries to auto recover the instance after threshHoldTimeInMinutes. If that attempt is not successful, further attempts are made, up to the number given for MaxRecoverAttempt, at the interval given for subsequentTriggerDelay. If the instance still fails after the maximum number of recovery attempts, it is marked as exhausted (it can be queried in the Recovery console using the message state ‘Exhausted’). We can use the ‘Reset’ button to make these instances eligible for Auto Recovery again.
Note that we observe this behavior only when the fault is thrown back to the BPEL runtime, that is, when the fault is not caught within the BPEL process or is re-thrown after being caught.
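The attempt counting described above can be pictured as a small state machine. The Python sketch below is only my simplified model of the behavior observed in this post, not the engine's implementation; the class and method names are hypothetical, and only the state names come from the Recovery console.

```python
from dataclasses import dataclass


@dataclass
class RecoverableMessage:
    """Simplified, hypothetical model of a message listed in the Recovery console."""
    max_recover_attempt: int          # value of MaxRecoverAttempt
    attempts: int = 0                 # failed auto recovery attempts so far
    state: str = "undelivered"        # message state shown in the console

    def record_failed_attempt(self) -> None:
        # Exhausted messages are no longer picked up by auto recovery.
        if self.state != "undelivered":
            return
        self.attempts += 1
        if self.attempts >= self.max_recover_attempt:
            self.state = "exhausted"

    def reset(self) -> None:
        # Models the 'Reset' button: the message becomes eligible again.
        self.attempts = 0
        self.state = "undelivered"


if __name__ == "__main__":
    msg = RecoverableMessage(max_recover_attempt=2)
    msg.record_failed_attempt()
    msg.record_failed_attempt()
    print(msg.state)   # exhausted
    msg.reset()
    print(msg.state)   # undelivered
```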
Auto Recovery in Action:
I developed a simple one-way BPEL process for the demonstration. It has an invoke activity that results in a RemoteFault, followed by a dehydrate activity.
Scenarios Verified:
- No Catch -> Got Remote Fault -> Auto Recovery happened.
- Catch All -> Got Remote Fault -> Auto Recovery did not happen.
- Catch All (Scope level) -> Got Remote Fault -> Re-throw Remote Fault -> Auto Recovery happened.
- Catch All (Scope level) -> Got Remote Fault -> Re-throw User Defined Fault -> Auto Recovery happened.
- Catch All (Scope level) -> Got Binding Fault -> Re-throw User Defined Fault -> Auto Recovery happened.
- Catch All (Scope level) -> Got User Defined Fault -> Re-throw User Defined Fault -> Auto Recovery happened.
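The scenarios above reduce to one rule: auto recovery was observed whenever the fault left the BPEL process, either because it was never caught or because it was re-thrown. The following Python sketch just encodes that observation and checks it against the list; the function name and the shortened scenario labels are mine.

```python
def auto_recovery_expected(caught: bool, rethrown: bool) -> bool:
    """Encodes only what was observed in this post: recovery happened when
    the fault reached the BPEL runtime (not caught, or caught and re-thrown)."""
    return (not caught) or rethrown


# (label, caught, rethrown, auto recovery observed in the post)
scenarios = [
    ("No Catch", False, False, True),
    ("Catch All, no re-throw", True, False, False),
    ("Catch All, re-throw Remote Fault", True, True, True),
    ("Catch All, re-throw User Defined Fault", True, True, True),
]

if __name__ == "__main__":
    for label, caught, rethrown, observed in scenarios:
        assert auto_recovery_expected(caught, rethrown) == observed
        print(f"{label}: auto recovery = {observed}")
```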
Configuration Used:
startWindowTime – 0.00
stopWindowTime – 7.00
maxMessageRaiseSize – 50
subsequentTriggerDelay – 300 (sec)
threshHoldTimeInMinutes – 5 (min)
MaxRecoverAttempt – 4
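With these values, the earliest times at which the attempts could be expected are easy to work out. The Python sketch below is a rough calculation under the assumptions that the first attempt is made threshHoldTimeInMinutes after the fault and each following one subsequentTriggerDelay later; the function name and the fault time are hypothetical, and the real engine schedule may lag behind these times.

```python
from datetime import datetime, timedelta


def planned_attempt_times(fault_time, threshold_minutes, trigger_delay_s, max_attempts):
    """Earliest expected attempt times under the simplified model above."""
    first = fault_time + timedelta(minutes=threshold_minutes)
    return [first + timedelta(seconds=i * trigger_delay_s) for i in range(max_attempts)]


if __name__ == "__main__":
    fault = datetime(2024, 1, 1, 1, 0, 0)   # hypothetical fault at 01:00 server time
    for t in planned_attempt_times(fault, 5, 300, 4):
        print(t.strftime("%H:%M:%S"))
    # 01:05:00
    # 01:10:00
    # 01:15:00
    # 01:20:00
```

All four times fall inside the 0.00 to 7.00 window configured above, so in this model none of the attempts would be held back by the window.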
Invoke Auto Recovery in Action:
The instance is faulted with remote fault.

The BPEL process instance is visible in Recovery console as ‘Undelivered’.

The ‘BPEL Message Recovery Required’ notification becomes visible once the time given for the property threshHoldTimeInMinutes has elapsed.

After the first auto recovery attempt made by the BPEL engine. Observe that the retry re-initiated the process from the start, as there is no dehydration point before the faulted invoke.

After the 2nd recovery attempt. Observe the time difference between the successive recovery attempts.

After the 4th and final recovery attempt.

Now this BPEL process can be seen in the Recovery console with the message state ‘Exhausted’ (shown below), as all 4 recovery attempts have been made. We can either recover this BPEL process manually by clicking the ‘Recover’ button, or click the ‘Reset’ button to make it eligible for auto recovery again.

Clicking the ‘Reset’ button makes this process eligible for auto recovery again, and the BPEL engine restarts the recovery attempts (shown below).

Activity:
To demonstrate Activity auto recovery, I modified the BPEL process to add a Dehydrate and an Assign activity before the faulted invoke. This case also demonstrates that auto recovery resumes from the last dehydration point. The highlighted part below shows the difference from the previous scenario: a Dehydrate activity is present, along with the remote fault at the invoke activity.

In the BPEL Recovery console, we can search for the activities that are marked for recovery. Assign3 is the first activity after the Dehydrate activity, so the recovery should resume from this activity.

The following screenshots show the flow trace after the first auto recovery attempt made by the BPEL engine. Observe the difference from the previous run: the BPEL process is not restarted from the beginning but resumes from the Assign3 activity, as expected.
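The resume point seen in both runs can be expressed as a small rule: go back from the faulted activity to the nearest dehydration point and resume right after it, or restart the process if there is none. The Python sketch below is a simplified model of that observation only; the function name is hypothetical, and apart from Dehydrate, Assign3 and the invoke, the activity names are made up for illustration.

```python
def resume_index(activities, faulted_index):
    """Index of the activity from which recovery resumes, under the
    simplified model: first activity after the last dehydrate that
    precedes the faulted activity, else the start of the process."""
    for i in range(faulted_index - 1, -1, -1):
        name, kind = activities[i]
        if kind == "dehydrate":
            return i + 1
    return 0


if __name__ == "__main__":
    # (name, kind); names other than Assign3 are hypothetical.
    with_dehydrate = [
        ("Receive1", "receive"),
        ("Dehydrate1", "dehydrate"),
        ("Assign3", "assign"),
        ("Invoke1", "invoke"),      # faults with a RemoteFault
    ]
    idx = resume_index(with_dehydrate, faulted_index=3)
    print(with_dehydrate[idx][0])   # Assign3

    without_dehydrate = [("Receive1", "receive"), ("Invoke1", "invoke")]
    print(resume_index(without_dehydrate, faulted_index=1))   # 0
```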


After the 4th recovery attempt.



Now this BPEL process can be seen in the Recovery console with the message state ‘Exhausted’ (shown below), as all 4 recovery attempts have been made. We can recover it manually by clicking ‘Recover’. Observe that the ‘Reset’ button is not available in this case, so manual recovery is required.

Other Observations:
- The behavior described above was observed only for Async BPEL processes. For Sync BPEL processes (Transient Sync), no auto recovery is performed. I have not yet verified the behavior for Durable Sync BPEL processes.
Sample code can be downloaded from here.
References:
http://docs.oracle.com/cd/E17904_01/integration.1111/e10226/bp_config.htm