Optimize the Database Search API #2668

ndegwamartin · 2024-09-04T13:46:45Z

Is your feature request related to a problem? Please describe.
I profiled the DatabaseImpl.search method, which contains the code below:

  override suspend fun <R : Resource> search(
    query: SearchQuery,
  ): List<ResourceWithUUID<R>> {
    return db.withTransaction {
      resourceDao.getResources(SimpleSQLiteQuery(query.query, query.args.toTypedArray())).map {
        ResourceWithUUID(it.uuid, iParser.parseResource(it.serializedResource) as R)
      }
    }
  }

The mapping block dominates the run time because each serialized resource is deserialized sequentially, one element at a time, for the whole list returned from the database. The cost grows linearly with the number of rows, so the more results the query returns, the longer the API takes to return.

Describe the solution you'd like
It should be possible to optimize this block with a parallelized implementation, e.g. by launching an async coroutine for each element and then awaiting and collecting the results. This should give a significant improvement.
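A minimal sketch of what this could look like, assuming kotlinx.coroutines is on the classpath. `pmap` and `decode` are hypothetical names used only for illustration; `decode` stands in for the `iParser.parseResource` call and is not the real deserializer.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: starts one coroutine per element and returns the
// results in the same order as the input.
suspend fun <A, B> Iterable<A>.pmap(transform: suspend (A) -> B): List<B> = coroutineScope {
  map { item -> async { transform(item) } }.awaitAll()
}

// Stand-in for the expensive deserialization step (assumption).
fun decode(serialized: String): Int = serialized.length

fun main(): Unit = runBlocking {
  val rows = listOf("a", "bb", "ccc")
  // async inherits the caller's dispatcher, so run on one backed by several threads.
  val decoded = withContext(Dispatchers.Default) { rows.pmap { decode(it) } }
  println(decoded) // [1, 2, 3]
}
```

Because `pmap` is a suspend function built on `coroutineScope`, the caller decides which dispatcher the work runs on, and a failure in one element cancels the remaining ones.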

Describe alternatives you've considered
Instead of letting async launch on the current dispatcher (probably IO at that point), we could switch to the Default dispatcher, since deserialization is a computationally expensive task. This would, however, require creating a new JSONParser object per iteration, since the parser is not thread safe.
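A rough sketch of this alternative, assuming kotlinx.coroutines. `StatefulParser` and `decodeAll` are hypothetical stand-ins invented for the example; the real parser and its factory would come from the FHIR library.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Stand-in for a parser that keeps mutable state and is therefore not
// safe to share between threads (hypothetical, for illustration only).
class StatefulParser {
  private val buffer = StringBuilder()

  fun parse(serialized: String): String {
    buffer.setLength(0)
    buffer.append(serialized.trim().uppercase())
    return buffer.toString()
  }
}

// Decodes rows on Dispatchers.Default, building a fresh parser per element
// so that no parser instance is used by more than one coroutine.
suspend fun decodeAll(
  rows: List<String>,
  newParser: () -> StatefulParser,
): List<String> = withContext(Dispatchers.Default) {
  rows.map { row -> async { newParser().parse(row) } }.awaitAll()
}

fun main(): Unit = runBlocking {
  println(decodeAll(listOf(" a ", "b"), ::StatefulParser)) // [A, B]
}
```

Passing the parser as a factory keeps the thread-safety decision in one place: if parser creation turns out to be costly, the factory could hand out one parser per worker instead.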

Additional context

Would you like to work on the issue?
Yeah

ndegwamartin · 2024-09-05T10:14:40Z

The DatabaseImpl.searchForwardReferencedResources and DatabaseImpl.searchReverseReferencedResources functions can also benefit from the same optimization and should be included.

FikriMilano · 2024-09-06T06:36:04Z

Very excited to have this merged in the future :D
I'll review once a PR is ready.

joiskash · 2024-09-17T09:06:17Z

This seems like an interesting approach. What are the memory implications of creating a new parser per iteration?
Regarding the dispatcher, I feel that this should still be launched on the IO dispatcher, since the core functionality is an IO-bound database read. There are also some interesting implications of using this with runBlocking, as mentioned in this blog post: https://jivimberg.io/blog/2018/05/04/parallel-map-in-kotlin/

By default runBlocking uses a dispatcher that is confined to the invoker thread. Which means we are forcefully blocking the thread until the execution of pmap finishes, instead of letting the caller decide how the execution should go.

I also have a few questions about the use case:

How many resources are we talking about here? 100s, 1000s or more?
Do these resources only contain text or does this also contain binary data like base64 encoded images?
Is there any scope for chunking instead of fetching all resources?
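To make the chunking question concrete, here is one possible shape for it, assuming kotlinx.coroutines. `decodeInChunks` is a hypothetical helper, not an existing SDK function. It only bounds how many decode tasks are in flight at once; it does not change how rows are fetched from the database.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: decodes `rows` in batches of `chunkSize`, so at most
// `chunkSize` decode tasks are alive at the same time. Order is preserved.
suspend fun <A, B> decodeInChunks(
  rows: List<A>,
  chunkSize: Int,
  decode: (A) -> B,
): List<B> = withContext(Dispatchers.Default) {
  rows.chunked(chunkSize).flatMap { chunk ->
    // Each chunk is decoded concurrently; the next chunk starts only
    // after the current one has finished.
    chunk.map { row -> async { decode(row) } }.awaitAll()
  }
}

fun main(): Unit = runBlocking {
  println(decodeInChunks(listOf(1, 2, 3, 4, 5), chunkSize = 2) { it * 2 }) // [2, 4, 6, 8, 10]
}
```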

ndegwamartin · 2024-09-17T16:33:56Z

@joiskash I've added the results from the benchmarking to the PR - see link here

The resources do not contain binary data.

We can investigate the chunking approach. However, since the objective is to reduce the performance hit of mapping serialized JSON to the corresponding FHIR Resource object, chunking alone might not give an improvement.

ndegwamartin · 2024-09-17T16:38:43Z

That said, if the idea is to use chunking to change the overall approach and improve the UX, then yes, that works. We have already implemented batching on our record fetches (10 records at a time), using pagination for registers and infinite scrolling for searches, for example.

ndegwamartin · 2024-09-18T10:13:00Z

It looks like creating a new JsonParser for each iteration should not be a concern after all (provided the FHIR Context that creates it already exists), as mentioned in the comment here.

Performance Note: This method is cheap to call, and may be called once for every message being processed without incurring any performance penalty

I will go ahead and update the PR with this variant of the optimization.

ndegwamartin linked a pull request Sep 7, 2024 that will close this issue

Implement Parallelized map and optimize Database search API #2669

Open

7 tasks

jingtang10 assigned ndegwamartin Sep 16, 2024

ndegwamartin mentioned this issue Sep 11, 2024

Transition experimental performance optimizations to FHIR Core opensrp/fhircore#3358

Open

8 tasks
