
Speeding Up Elasticsearch: Lessons in Chunking and Parallel Requests


Introduction #

Over the last couple of months, I’ve found myself thrust deep into the world of Elasticsearch. The journey has been nothing short of illuminating. To give you some context, I’ve been working on a legacy enterprise system that, like most overgrown tech stacks, had started to slow down significantly with time.

One of the biggest issues? Search performance. Finding records in the system had become painfully slow, partly because of the size of the data and the number of filters that had to be applied. With some requests taking minutes to return results, it was becoming a real problem. Let’s be honest: if you’ve used a search box, you know how long even one second feels.

That’s where Elasticsearch came in. We designed a solution that not only solved the sluggish search experience but also enabled powerful cross-entity querying.

In this article, I’ll walk you through one of the key optimizations we made: using chunking and parallel requests to significantly speed up complex Elasticsearch queries. It wasn’t perfect out of the box, but once we tuned it, the performance gains were game-changing.

The Challenge: Searching Over Large Datasets #

When we first implemented Elasticsearch, everything was going great: response times were blazing fast. Our MVP seemed to be a success, and by all metrics, it was. But the real challenge showed up when we opened the floodgates and exposed the implementation to real-world, large datasets.

Elasticsearch has a couple of limitations, one of them being the default max_result_window of 10,000 hits. Granted, this value can be increased… but it comes at a performance cost. So instead of bumping the limit, we chose search_after, a popular pattern for paging through a large result set in batches.

Then came problem number two: massive terms queries with thousands of IDs. These became a bottleneck when we had to filter one entity using IDs returned from a previous query, essentially chaining queries across entities. The solution we landed on was chunking. Let me walk you through how that worked.
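
To make the search_after pattern concrete, here's a minimal sketch of the paging loop, assuming the NEST client (client is an IElasticClient) and an indexed Document type with a sortable Id field. The names and page size are illustrative, not our actual code:

var allDocuments = new List<Document>();
IReadOnlyCollection<object> searchAfter = null;
while (true)
{
    var response = await client.SearchAsync<Document>(s =>
    {
        s.Index("documents")
         .Size(9000)                           // page size, safely under max_result_window
         .Sort(so => so.Ascending(d => d.Id)); // search_after requires a deterministic sort
        if (searchAfter != null)
            s.SearchAfter(searchAfter);        // cursor from the previous page
        return s;
    });

    if (!response.Hits.Any())
        break;

    allDocuments.AddRange(response.Documents);
    searchAfter = response.Hits.Last().Sorts;  // sort values of the last hit become the next cursor
}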

Example: Chunking + Parallel Requests #

// Step 1: query the primary entity; bail out early if nothing matched.
var primaryData = await _searchService.GetPrimaryEntityData(searchRequest);
if (!primaryData.Ids.Any())
    return Json(new { total = 0, results = new List<string>() }, JsonRequestBehavior.AllowGet);

// Step 2: fan out the secondary filter queries, with at most 6 in flight at once.
var semaphore = new SemaphoreSlim(6);
var filteredIdTasks = ChunkList(primaryData.Ids, 9000).Select(async idChunk =>
{
    await semaphore.WaitAsync();
    try
    {
        // Each chunk becomes one terms-filtered query against the secondary entity.
        return await _searchService.ApplySecondaryFilters(searchRequest, idChunk);
    }
    finally
    {
        semaphore.Release();
    }
}).ToList();
var filteredIdChunks = await Task.WhenAll(filteredIdTasks);

// Step 3: merge the per-chunk results, de-duplicating IDs along the way.
var filteredIds = new HashSet<string>();
foreach (var chunk in filteredIdChunks)
{
    filteredIds.UnionWith(chunk);
}

// Step 4: fetch the final result set using the filtered ID set.
var results = await _searchService.GetFinalResults(primaryData.Documents, filteredIds);
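
The ChunkList helper isn't shown above; assuming it simply splits a list into fixed-size batches, a minimal version could look like this:

private static IEnumerable<List<T>> ChunkList<T>(List<T> source, int chunkSize)
{
    // Yield successive batches of at most chunkSize items.
    for (var i = 0; i < source.Count; i += chunkSize)
        yield return source.GetRange(i, Math.Min(chunkSize, source.Count - i));
}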

How It Works #

The initial request retrieves documents for the primary entity. The result set is then split into chunks of 9,000 IDs (to stay safely under the terms query limit), and each chunk is used to run a secondary filter on another entity. Initially, these requests ran sequentially, which worked but was slow. So naturally, we “optimized” by running them in parallel, using SemaphoreSlim to cap concurrency and avoid overwhelming the server.
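
For illustration, each per-chunk call inside ApplySecondaryFilters might boil down to a single terms-filtered query like the sketch below, again assuming NEST; SecondaryEntity, the index name, and PrimaryEntityId are hypothetical stand-ins for whatever the service actually issues:

// One chunk's worth of IDs becomes a single terms filter.
var response = await client.SearchAsync<SecondaryEntity>(s => s
    .Index("secondary-entities")
    .Size(idChunk.Count)
    .Query(q => q
        .Bool(b => b
            .Filter(f => f
                .Terms(t => t
                    .Field(e => e.PrimaryEntityId)
                    .Terms(idChunk))))));

// Return only the IDs that survived the secondary filter.
return response.Documents.Select(d => d.Id).ToList();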

Chunking at Scale: Making It Work in Production #

Sure, our first stab at parallel chunking had its issues, like overloading the server and introducing debugging complexity. But instead of scrapping the approach entirely, we doubled down and improved it.

Here’s what helped:

  • Controlling concurrency: Tuning SemaphoreSlim to control how many requests we send at once was crucial. Too low, and we lose the speed gains; too high, and the server chokes.
  • Batch size tuning: We experimented with different chunk sizes (e.g., 5000, 9000, 10000) to find the sweet spot between performance and query limits.
  • Fallback strategies: For especially large queries, we added fallback logic to split or retry failed chunks, improving resilience (a sketch of the idea follows this list).
  • Caching and query reuse: We introduced basic in-memory caching for repetitive filters and reused intermediate results where possible.
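
To sketch the fallback idea under the same assumptions as before (FilterChunkWithFallback, minChunkSize, and the catch-all exception handling are illustrative, not our production code): if a chunk's query fails, split it in half and retry each half recursively until a minimum size is reached.

private async Task<List<string>> FilterChunkWithFallback(
    SearchRequest searchRequest, List<string> idChunk, int minChunkSize = 500)
{
    try
    {
        return await _searchService.ApplySecondaryFilters(searchRequest, idChunk);
    }
    catch (Exception) when (idChunk.Count > minChunkSize)
    {
        // Assumed recovery strategy: halve the chunk and retry each half.
        var mid = idChunk.Count / 2;
        var left = await FilterChunkWithFallback(searchRequest, idChunk.GetRange(0, mid), minChunkSize);
        var right = await FilterChunkWithFallback(searchRequest, idChunk.GetRange(mid, idChunk.Count - mid), minChunkSize);
        return left.Concat(right).ToList();
    }
}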

After a few iterations, we hit a point where our chunked parallel strategy was fast, reliable, and scalable, all without needing to refactor everything around _msearch or pagination tokens.

TL;DR #

If you’re working with complex, filter-heavy Elasticsearch queries and large datasets, chunking and parallel requests can be your best friends, as long as you treat them with respect.