Elasticsearch: Death by a Hundred Forks

Last week, Elastic.co made their changes to the Elasticsearch and Kibana licensing model and as expected we already have software forks from Amazon and Logz.io. The community is now definitely fractured.

I love freedom, but it can get confusing quickly. A great example of this is the BSD project. While the BSD community is led by some real heroes of open-source, the most popular offering is still from Apple. While having the freedom to expand and re-purpose code is amazing, there are definitely some projects that take more than they give.

Excerpt of Unix/BSD Family
Source: https://www.levenez.com/unix/unix.png

For the Elasticsearch project, even if Logz.io decides to contribute to Amazon’s codebase, you now have at least 2 diverging codebases competing to be the best. A strange irony also occurs as Elastic.co is now able to take from other codebases without giving back. This will give  Elastic.co a competitive advantage while disincentivizing other communities.  Amazon’s codebase is likely to have feature gaps from Elastic.co simply because it has to re-implement to keep up.

Promises

Amazon, like Elastic.co, has committed to always keeping the codebase as Apache 2.0.

That promise from Elastic.co has not been kept as they have modified the license. Yes, Apache 2.0 is still an option, but it is now heavily constrained. As an engineer, trust is a big part of my belief structure, and I have friends who I trust that work at both companies. Both of these promises were likely made to be kept. The challenge is not what the engineers at these companies will do, but what the company will do to meet investor expectations. The lesson for me is that promises made in blogs, even by the CEO, are easy to change. Never doubt that companies will react to threats even if it breaks a promise.

Challenges

While I can appreciate the need for immediate forks for business continuity, I also believe this is a great opportunity to re-evaluate where we go from here. Before joining Scalyr, I was pretty deep in the Elasticsearch community speaking at Elastic.co conferences and MeetUps. My experiences with Elasticsearch led me to believe that the long-term answer is not Elasticsearch. My challenges mostly stemmed from the design and age of Elasticsearch and I know others have experienced the same. When Elasticsearch was launched, production, storage and analysis of data was in a very different place. The recommended shard size for Elasticsearch is around 40GB depending on your reference. When you realize that companies are fast approaching the petabyte scale of daily data ingestion, there simply is no way to manage 25,000 new daily shards in Elasticsearch even if you could afford the infrastructure and make it work (which I highly doubt). Now your company’s needs may be much smaller, but I see no reason to invest in technologies with expiration dates when better solutions exist.

Moving Forward

So if not Elasticsearch then what? I personally needed something that was between Elasticsearch and a data lake. Something that would allow you to afford all of your data in a sustainable way while still being durable and returning answers in real-time. Only a few offerings can meet this today and I landed on Scalyr being the right design. I believed in it so much that I joined the company.
I am now 6 months into my “What’s Next” journey. I joined Scalyr because it solves the problems that I could not solve with Elasticsearch. After joining Scalyr, we launched a product for seamlessly integrating Scalyr into existing Elasticsearch clusters. This provides a clean path for engineers to augment and ultimately move away from Elasticsearch. We have lots of documentation on how Scalyr is different, but just know that Scalyr is a next generation data architecture built for Petabyte daily ingestion with forever retention.

I discussed this a bit on my webinar, “Is Elasticsearch keeping you up at night?“, watch the replay below.