How Decades of Open Source Contribution Prepared Me to Create My SDK

Years before I created MultiCloudJ, an open-source Java SDK for multi-cloud development, I spent a lot of time studying and submitting patches to projects I didn’t own.
My first patch to a large open source database took a few days to design and a few weeks to land. Three maintainers reviewed it. Two asked me to rethink how it would stay safe for existing users. The patch fixed Apache HBase’s WAL replication pipeline, where ChainWALEntryFilter was passing along empty entries after all of their cells had been filtered out, causing useless data to be replicated across clusters. The reviewers pushed back on my original approach of adding a boolean flag to the existing core filter. They pointed out that the flag would require rebuilding HBase to change the behavior and, more importantly, that changing the default behavior of a shared class could silently break existing replication setups. The design evolved through multiple iterations over a 19-day review cycle, eventually settling on a separate entry filter class (ChainWALEmptyEntryFilter) that operators can configure per replication peer without touching the core code.
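The shape of that outcome can be sketched in a few lines of Java. This is a simplified illustration and not HBase’s actual API: the Entry, EntryFilter, CellTrimmingFilter, and EmptyEntryFilter types, the "keep:" prefix rule, and the two chain helpers are all hypothetical stand-ins. The sketch only aims to show why a separate, opt-in filter leaves existing chains untouched while still letting an operator who wants the new behavior add it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified, hypothetical types; this is not the HBase replication API.
final class OptInFilterSketch {

    /** Stand-in for a replicated log entry: just a list of cell names. */
    static final class Entry {
        private final List<String> cells;
        Entry(List<String> cells) { this.cells = cells; }
        List<String> cells() { return cells; }
    }

    /** Returns the (possibly trimmed) entry, or null to drop it entirely. */
    interface EntryFilter {
        Entry filter(Entry entry);
    }

    /** Existing shared behavior: removes cells, but still passes an entry left empty. */
    static final class CellTrimmingFilter implements EntryFilter {
        private final Predicate<String> keep;
        CellTrimmingFilter(Predicate<String> keep) { this.keep = keep; }

        @Override
        public Entry filter(Entry entry) {
            List<String> kept = new ArrayList<>();
            for (String cell : entry.cells()) {
                if (keep.test(cell)) {
                    kept.add(cell);
                }
            }
            return new Entry(kept);
        }
    }

    /** New, separate class: drops entries that have no cells left. */
    static final class EmptyEntryFilter implements EntryFilter {
        @Override
        public Entry filter(Entry entry) {
            return entry.cells().isEmpty() ? null : entry;
        }
    }

    /** Applies filters in order and stops as soon as one drops the entry. */
    static Entry applyChain(List<EntryFilter> chain, Entry entry) {
        Entry current = entry;
        for (EntryFilter f : chain) {
            if (current == null) {
                return null;
            }
            current = f.filter(current);
        }
        return current;
    }

    /** Chain for peers that never opted in: behavior is unchanged. */
    static List<EntryFilter> defaultChain() {
        return Arrays.asList(new CellTrimmingFilter(c -> c.startsWith("keep:")));
    }

    /** Chain for a peer whose operator opted in to dropping empty entries. */
    static List<EntryFilter> optInChain() {
        return Arrays.asList(
            new CellTrimmingFilter(c -> c.startsWith("keep:")),
            new EmptyEntryFilter());
    }

    public static void main(String[] args) {
        Entry allFiltered = new Entry(Arrays.asList("skip:a", "skip:b"));
        // Default chain still emits the entry, now with zero cells.
        System.out.println(applyChain(defaultChain(), allFiltered).cells().isEmpty());
        // Opt-in chain drops the entry instead of shipping it.
        System.out.println(applyChain(optInChain(), allFiltered) == null);
    }
}
```

The design choice this illustrates is that the new behavior lives in its own class, so nothing changes for anyone who does not add it to a chain.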
An Education in Writing Code
That experience set the tone for everything that followed. Over the years, I contributed to projects throughout the Apache ecosystem and other open source efforts, including distributed databases, query engines, data processing frameworks, and cloud abstraction libraries. What I learned in those years had less to do with algorithms and more to do with the discipline of writing code that thousands of people rely on, most of whom will never file a bug report; they will simply stop using it.
Backward compatibility is the most underappreciated constraint in software engineering. Inside a company, you can coordinate a breaking change across teams with an email and a migration guide. In open source, you can’t. Large Apache projects classify APIs by audience and stability level and require that stable interfaces remain compatible across releases. An API deprecated in one version cannot be removed until the next major release. That kind of discipline forced me to stop asking which API is clean and start asking which API I can live with for the next five years. This hit home during my first major contribution to Google’s go-cloud, adding atomic writes to the docstore interface. Because I was changing a core interface, I came to the review with three different proposals for the write semantics. Once released, those semantics are almost impossible to change without breaking every downstream consumer. I iterated with the project’s creator, and we converged on a path that favored long-term sustainability over short-term aesthetics.
Open source code reviews operate on a different wavelength than corporate reviews. Inside a company, reviewers focus on correctness, style, and whether the change meets the immediate requirement. Open source committers probe how the change interacts with features planned for the next two releases. They ask about edge cases in deployment topologies you have never encountered. Research on failures in large distributed systems has found that upgrade failures, partial network partitions, and crash recovery bugs cause severe outages. Open source reviewers think about those failure modes by default. They flag what could break two releases from now, not just what is failing today. One review of my HBase pull request taught me exactly that lesson. I had made the same call twice on the assumption that it had no side effects, which was true of the implementation at the time. The reviewer pointed out that the assumption was a dependency on internal behavior that I did not control. If a future contributor changed that implementation, my code would silently break. The fix was minimal. The change in how I think about implicit contracts has stayed with me.
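The implicit-contract problem from that review can be illustrated with a small, hypothetical Java example. None of these types come from HBase: Batch, PeekingBatch, and ConsumingBatch are invented for the sketch, which assumes an interface that never promises its read method is free of side effects. The fragile caller reads twice and only works while the implementation happens to be side-effect free; the robust caller reads once and keeps working if the implementation changes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical types for illustration; not taken from HBase.
final class ImplicitContractSketch {

    /** The interface does not say whether head() may safely be called repeatedly. */
    interface Batch {
        String head();
    }

    /** Today's implementation: head() only looks, so calling it twice is harmless. */
    static final class PeekingBatch implements Batch {
        private final Deque<String> items;
        PeekingBatch(String... items) { this.items = new ArrayDeque<>(Arrays.asList(items)); }
        @Override public String head() { return items.peekFirst(); }
    }

    /** A plausible future implementation: head() now consumes the element. */
    static final class ConsumingBatch implements Batch {
        private final Deque<String> items;
        ConsumingBatch(String... items) { this.items = new ArrayDeque<>(Arrays.asList(items)); }
        @Override public String head() { return items.pollFirst(); }
    }

    /** Fragile: calls head() twice and assumes both calls return the same value. */
    static String fragileDescribe(Batch batch) {
        if (batch.head() == null) {
            return "empty";
        }
        return "first=" + batch.head();
    }

    /** Robust: calls head() once, so it does not depend on repeatability. */
    static String robustDescribe(Batch batch) {
        String head = batch.head();
        return head == null ? "empty" : "first=" + head;
    }

    public static void main(String[] args) {
        System.out.println(fragileDescribe(new PeekingBatch("a", "b")));   // first=a
        System.out.println(fragileDescribe(new ConsumingBatch("a", "b"))); // first=b (silently wrong)
        System.out.println(robustDescribe(new ConsumingBatch("a", "b")));  // first=a
    }
}
```

Nothing fails loudly in the fragile version: it simply reports the wrong element once the implementation changes, which is what makes this kind of assumption expensive to find later.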
ASF’s approach to ‘lazy consensus’
The governance model of these communities has also shaped the way I approach technical decisions. The Apache Software Foundation works by lazy consensus: a few affirmative votes with no vetoes are enough to proceed, but a dissenting vote must come with an alternative. No boss can overrule it. You have to persuade people working at different companies, in different time zones, with competing priorities. That habit has stayed with me: when I disagree with a group’s technical direction, I write down what I would do instead and why before raising the objection.
Developers who want to contribute to a large project often ask where to start. My advice: read the issue tracker for a month before writing any code. Watch how committers review patches. The Apache Incubator’s guide to community management lays out how these communities work: decisions happen on mailing lists, merit is earned through sustained contribution, and vendor neutrality is deliberately enforced. Understanding that culture before you submit your first patch saves months of frustration.
When you start, choose a problem that you have actually encountered. My most productive contributions came from the walls I hit while running distributed databases at scale. I understood the failure path because I had traced it through the production logs. That context gave my patches a credibility that a cold submission could not have had. One production problem I still remember involved a set of MapReduce jobs handling an HBase data migration: they were running out of memory during business-critical operations, and the retries failed too. The cause was unnecessary memory usage in the job pipeline, and the fix was submitted as HBASE-24859. Another contribution came from fixing a bug in Spark jobs that were silently hanging, timing out, and retrying, which was adding millions of dollars in cloud waste before anyone noticed. That fix shipped as SPARK-39283.
A decade of contributing to open source infrastructure projects taught me how to think about ambiguity, communication latency, and systems that fail badly when the assumptions they are built on turn out to be wrong. Those years also gave me the foundation to start MultiCloudJ, an open source Java SDK for multi-cloud development that I now help maintain. Designing portable APIs across AWS, GCP, and other providers required every habit I had picked up over years of review cycles, lazy consensus debates, and arguments about backward compatibility. The education in contributing came first. The SDK is what it built. You gain those skills one rejected patch at a time.



