Elasticsearch Isn’t the Definitive Scalable Search Engine

Elasticsearch is a widely adopted search engine used by enterprise companies and small startups alike. Because Elasticsearch is free and easy to get started with, engineers sometimes jump right in without reviewing their actual use cases or how Elasticsearch’s architecture supports or frustrates those use cases.

My prior experience with Elasticsearch was at a video entertainment company. We implemented a data ingestion tool using Logstash to pipeline video catalog data into Elasticsearch for search and storage. It began as a small piece of software for converting data formats and validating the quality of incoming data. However, as the dataset grew from a couple of hundred live channels to thousands, we ran into ingestion issues and had a tough time keeping our cluster healthy.

Eventually, we realized that Elasticsearch isn’t nearly as simple to maintain at scale as it is when you are starting out. While Logstash worked well for us, Elasticsearch wasn’t the tool we needed. In this article, I’d like to share a few points you should consider before choosing Elasticsearch as your primary search engine.

Indexing isn’t as valuable as you think

You can index any data you want in Elasticsearch, but the type of data you manage will heavily influence your index configuration. Historically, indexing technology was designed for static data that grows or changes slowly, typically large bodies of text such as blog posts or news articles. Ranking is another crucial feature of indexing: it lets users find the documents that best match each search query.

It quickly became evident that indexing didn’t bring much value to my application. The metadata for each channel or live program was small, and ranking wasn’t a requirement. We also knew the exact query to use to check whether a specific episode of “Game of Thrones” was available in the system. We never needed to enter a fragment like “Game of” or a phrase like “The Red Wedding”, as you might when exploring data without knowing exactly what you are looking for. Finally, we had no need for relevance: a missing episode on the third page of results is just as important as one on the first page, so relevance ranking added nothing to our application.
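For illustration, an exact lookup like ours can be expressed as a `bool` query whose clauses all sit in filter context, where Elasticsearch only decides match or no match and computes no relevance score. The sketch below only builds the request body as a Python dict; the field names `series_title`, `season`, and `episode` are hypothetical and assume `keyword` and numeric mappings.

```python
import json


def episode_lookup_query(series_title: str, season: int, episode: int) -> dict:
    """Build a search body that checks whether one specific episode exists.

    Field names are hypothetical. Every clause is in the bool query's
    `filter` context, so matching is yes/no and no relevance score is computed.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"series_title": series_title}},  # exact keyword match
                    {"term": {"season": season}},
                    {"term": {"episode": episode}},
                ]
            }
        },
        "size": 1,  # we only need to know whether the episode exists
    }


if __name__ == "__main__":
    # Body you could send to GET /<index>/_search
    print(json.dumps(episode_lookup_query("Game of Thrones", 3, 9), indent=2))
```

A non-empty `hits` array answers the question; the order of results never matters.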

Our use case was troubleshooting catalog data quality, but the same reasoning applies to troubleshooting production issues. For instance, relevance ranking won’t help you identify the runtime exceptions that cause your application to crash. Configuring analyzers and scoring for each document field is therefore overhead that brings little to no value in typical SRE and DevOps use cases.
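If you do run this kind of workload on Elasticsearch, one way to avoid paying for analysis you never use is an explicit mapping that stores identifiers as `keyword` (no analyzer, no tokenization) instead of `text`. Below is a minimal sketch of an index-creation body for `PUT /<index>`; the field names are hypothetical and chosen for a log-style workload.

```python
def log_event_mapping() -> dict:
    """Hypothetical index-creation body with no full-text analysis.

    `keyword` fields are indexed verbatim, so they support exact term
    filters and aggregations but not analyzed full-text search.
    """
    return {
        "mappings": {
            "dynamic": "strict",  # reject documents containing unmapped fields
            "properties": {
                "@timestamp": {"type": "date"},
                "service": {"type": "keyword"},
                "exception_class": {"type": "keyword"},
                # Kept in _source for display, but not indexed, so not searchable.
                "stack_trace": {"type": "text", "index": False},
            },
        }
    }
```

Queries against such an index would use `term` filters on `service` or `exception_class`, which is all a "which exception crashed this service" question needs.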

Today it is the rule rather than the exception for enterprise companies to produce large amounts of data every day, so ops engineers who run Elasticsearch clusters must actively manage indices to keep performance optimal. This brings up my second point: do you have a plan, with people and time set aside, to manage your indices?

Managing indices isn’t a trivial task

For time-series datasets that generate large ingestion volumes, Elasticsearch users can choose time-based rolling indices. A time-based index separates active data from old data, so obsolete documents can be dropped to cut storage costs. The conventional approach is to set up an index template that defines the settings and mappings of new indices, so that new indices are created whenever they are needed and old ones are deleted after a fixed retention period. This isn’t difficult to set up at first, but using it efficiently is quite complex.
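As a sketch of that conventional setup, the two request bodies below pair an index lifecycle management (ILM) policy with a composable index template. The names, sizes, and retention period are assumptions, and `max_primary_shard_size` requires a reasonably recent Elasticsearch (7.13 or later); older clusters would use `max_size` instead.

```python
def retention_policy(rollover_age: str, max_shard_size: str, retention: str) -> dict:
    """Body for PUT _ilm/policy/<name>: roll over while hot, delete later."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index when either condition is met.
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": max_shard_size,
                        }
                    }
                },
                "delete": {
                    # For rolled-over indices, min_age counts from rollover time.
                    "min_age": retention,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def rolling_index_template(prefix: str, policy_name: str, shards: int, replicas: int) -> dict:
    """Body for PUT _index_template/<name>; applies to every <prefix>-* index."""
    return {
        "index_patterns": [f"{prefix}-*"],
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": prefix,
            }
        },
    }
```

With this approach you would also bootstrap the first index yourself, for example `PUT catalog-000001` with the `catalog` alias marked `is_write_index: true`, so rollover has a write alias to act on. Newer deployments often use data streams instead, which handle that bootstrapping automatically.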

To achieve the best ingestion rate, you should spread the active index’s shards across as many nodes as possible. However, too many shards can drive up resource usage for searches, so you might decide to maintain separate indices for queries, which is additional work you need to plan ahead of time.
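One common heuristic for the write index, sketched below under our own assumptions, is one primary shard per data node plus a cap on how many of the index’s shards a single node may hold. `index.routing.allocation.total_shards_per_node` is a real index setting, but it is a hard limit, so the sketch adds a hypothetical `headroom` parameter to avoid leaving shards unassigned when a node goes down.

```python
import math


def active_index_shard_settings(data_nodes: int, replicas: int = 1, headroom: int = 1) -> dict:
    """Heuristic settings that spread the write index evenly across data nodes.

    One primary per data node lets every node share the indexing load.
    total_shards_per_node caps this index's shards (primaries + replicas)
    on any one node; `headroom` leaves slack for reallocation after a node loss.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    primaries = data_nodes
    total_copies = primaries * (1 + replicas)
    per_node = math.ceil(total_copies / data_nodes) + headroom
    return {
        "number_of_shards": primaries,
        "number_of_replicas": replicas,
        "index.routing.allocation.total_shards_per_node": per_node,
    }
```

On six data nodes with one replica, this yields six primaries and a cap of three shards per node, so the twelve shard copies still fit if one node drops out.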

The other consideration is your daily ingestion volume. If you don’t have enough shards to absorb ingestion spikes, indexing requests can fail and you can lose data. On the other hand, if you have too many shards, you run into cost and search performance issues. The ops team needs to actively monitor the cluster’s health and roll over the index when it becomes too big or too old. While you can use the Elasticsearch APIs to automate some of these steps, you will always need someone to step in when things don’t go as predicted. That takes us to my last point: have you budgeted for the effort and cost of hosting your own Elasticsearch cluster?
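The arithmetic behind that balancing act can be sketched as follows. The 40 GB target shard size is an assumption picked from the 10 to 50 GB range often cited in Elastic’s sizing guidance, and `WriteIndexStats` and the helper functions are hypothetical. The rollover check mirrors how the rollover API treats `max_*` conditions: the index rolls over as soon as any one of them is met.

```python
import math
from dataclasses import dataclass


@dataclass
class WriteIndexStats:
    """Hypothetical snapshot of the current write index."""
    age_hours: float
    largest_primary_shard_gb: float


def primary_shards_for_volume(peak_daily_gb: float, target_shard_gb: float = 40.0) -> int:
    """Size primaries for the peak day, not the average, so spikes still fit."""
    return max(1, math.ceil(peak_daily_gb / target_shard_gb))


def should_roll_over(stats: WriteIndexStats, max_age_hours: float = 24.0,
                     max_shard_gb: float = 50.0) -> bool:
    """Roll over as soon as ANY condition is met, like max_* rollover conditions."""
    return stats.age_hours >= max_age_hours or stats.largest_primary_shard_gb >= max_shard_gb


def rollover_request_body(max_age: str = "1d", max_primary_shard_size: str = "50gb") -> dict:
    """Body for POST /<alias>/_rollover; Elasticsearch applies the same any-of check."""
    return {"conditions": {"max_age": max_age, "max_primary_shard_size": max_primary_shard_size}}
```

For example, a 300 GB peak day at a 40 GB target works out to 8 primary shards, while a 10 GB day still gets one.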

It takes a team to run Elasticsearch

I hope by now I’ve convinced you that running Elasticsearch isn’t something you do because it’s easy. I learned that lesson the hard way. My primary job was to build an integration tool to ingest video catalog data, and Elasticsearch just happened to be the platform the team chose. But the team didn’t have the background to run a production-grade Elasticsearch cluster, and I wound up spending more and more of my time just keeping the cluster up and running. And since our use cases didn’t benefit from the strengths that Elasticsearch’s architecture provides, it proved to be a bad investment.

If you decide to go with Elasticsearch, your team needs experienced Elasticsearch operators who know how to configure the right number of shards for each index on your cluster. You’ll also need to set appropriate heap sizes and node types before moving the cluster to production. Those configurations all depend on your use case and ingestion volume, so don’t dive in headfirst without preparing to invest the time and resources.
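For heap size, a widely repeated rule of thumb from Elastic’s documentation is to give the JVM no more than half of the machine’s RAM (leaving the rest for the filesystem cache), to stay below the roughly 32 GB compressed-object-pointer threshold, and to set the minimum and maximum heap to the same value. The sketch below encodes that heuristic; the 31 GB cutoff is an assumption, and recent Elasticsearch versions size the heap automatically unless you override it.

```python
def recommended_heap_gb(total_ram_gb: float, compressed_oops_limit_gb: float = 31.0) -> int:
    """Heuristic heap size: at most half of RAM, and below the compressed-oops cutoff.

    The 31 GB default is an assumed safe value; the exact cutoff depends on the JVM.
    """
    return max(1, int(min(total_ram_gb / 2, compressed_oops_limit_gb)))


def jvm_heap_options(heap_gb: int) -> list:
    """Lines for a custom file in config/jvm.options.d/, with min and max heap equal."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]
```

A 64 GB data node would get a 31 GB heap under this heuristic, and a 16 GB node an 8 GB heap.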

In summary, Elasticsearch can still be a good tool in some cases, but it isn’t a silver bullet for every complicated search problem. Since this experience, I’ve moved to a new role at Scalyr, which provides a cloud-based log management platform for hosting event data, exactly the kind of system I spent so much time trying to improve. Knowing now how much time and energy that cluster required, I feel my time could have been better spent improving the product.

A little more about Scalyr: it uses a unique, index-free architecture designed to optimize ingestion and search performance. Our SRE team manages the cluster 24/7, and search performance at scale is significantly better than with Elasticsearch. Want to see what it looks like? Here’s access to a free 30-day trial.