Describe the bug
In a highly concurrent distributed environment, updates to a resource may be processed on different nodes. If the clocks on those nodes drift apart, version N+1 can end up with an earlier lastUpdated timestamp than version N.
It is desirable for the lastUpdated time to follow the same natural ordering as versionId for a given logical resource.
Environment
Which version of IBM FHIR Server? 4.11.0
To Reproduce
Steps to reproduce the behavior:
- Deploy a multi-node cluster in which the server instances reside on different physical nodes (ideally in different data centers, to increase the likelihood that the clocks differ slightly)
- Generate a large number of parallel updates for a single resource
- Compare the lastUpdated timestamp of each version and see if it follows the same order as the versionId
Expected behavior
Ideally, where N is the version number, we want the following to hold: lastUpdated(N) > lastUpdated(N-1).
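The comparison described in the reproduction steps can be automated. The following is a minimal sketch, not part of the server: it assumes the (versionId, lastUpdated) pairs for one logical resource have already been collected (for example from a history response), and the function names `parse_instant` and `find_out_of_order` as well as the sample timestamps are made up for illustration.

```python
from datetime import datetime


def parse_instant(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as a FHIR lastUpdated value."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def find_out_of_order(versions):
    """Return (N-1, N) version pairs where lastUpdated(N) <= lastUpdated(N-1).

    `versions` is an iterable of (version_id, last_updated) string pairs
    belonging to ONE logical resource.
    """
    ordered = sorted(
        ((int(version_id), parse_instant(ts)) for version_id, ts in versions),
        key=lambda pair: pair[0],
    )
    violations = []
    for (prev_v, prev_ts), (cur_v, cur_ts) in zip(ordered, ordered[1:]):
        if cur_ts <= prev_ts:
            violations.append((prev_v, cur_v))
    return violations


# Made-up sample data: version 3 was stamped by a node whose clock lags.
sample = [
    ("1", "2021-11-01T10:00:00.000Z"),
    ("2", "2021-11-01T10:00:00.500Z"),
    ("3", "2021-11-01T10:00:00.200Z"),
]
print(find_out_of_order(sample))
```

An empty result means the lastUpdated values follow the same ordering as versionId for that resource.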
Additional context
This is important when using the whole-system history endpoint to ensure that resource version changes are returned in the expected order.
The logic has been updated to tolerate a clock drift of less than 2 seconds, which is reasonable for a cluster with properly configured network time synchronization. If the drift is 2 seconds or more, the request is rejected with a 500 Server Error, because something is critically wrong with the server environment.
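The tolerance rule can be illustrated with a small sketch. This is not the server's actual code: `ClockDriftError` and `next_last_updated` are hypothetical names, and the way the timestamp is chosen when the drift is within tolerance (nudging it 1 millisecond past the current value) is an assumption made here so that the ordering invariant holds; the issue text only states the 2-second limit and the rejection.

```python
from datetime import datetime, timedelta, timezone

MAX_DRIFT = timedelta(seconds=2)  # tolerance stated in the issue


class ClockDriftError(Exception):
    """Hypothetical error; a server would map this to a 500 response."""


def next_last_updated(current_last_updated: datetime, now: datetime) -> datetime:
    """Pick the lastUpdated value for the next version of a resource."""
    if now > current_last_updated:
        # Normal case: this node's clock is ahead of the stored timestamp.
        return now
    drift = current_last_updated - now
    if drift >= MAX_DRIFT:
        # The clocks disagree by 2 seconds or more: reject the update.
        raise ClockDriftError(f"clock drift of {drift} exceeds tolerance")
    # ASSUMPTION: within tolerance, move just past the current timestamp
    # so that lastUpdated(N) > lastUpdated(N-1) still holds.
    return current_last_updated + timedelta(milliseconds=1)


stored = datetime(2021, 11, 1, 10, 0, 0, tzinfo=timezone.utc)
print(next_last_updated(stored, stored - timedelta(seconds=1)))
```

With the clock set back by 1 hour, as in the manual test in this issue, the same call raises `ClockDriftError`.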
You can artificially trigger the issue if you are able to manually adjust the clock when running a local instance of the FHIR server:
- Insert a patient p1
- Set the system clock to manual and move it back by 1 hour
- Update patient p1, making sure that at least one field in the resource is different (otherwise the update will be skipped)
- The update should be rejected because the current time comes before the current lastUpdated time of the resource
- Reset clock to automatic time sync.
- Update patient p1. The update should succeed.