Is zero copy ‘freedom’ for data teams?

BearingPoint’s Shruti Goyal talks about zero-copy architecture and why it’s ultimately changing the game for data teams.

The world of data architecture, according to Shruti Goyal, has been defined by a single process for the past decade: extract, transform and load (ETL).

ETL is a three-stage process in which data is extracted from transactional or real-time source systems, transformed (i.e. cleaned, optimized and standardized) into an analytical format, and loaded into a data hub or warehouse for reporting and analysis.
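The three stages can be illustrated with a minimal sketch. Everything here is illustrative only (hard-coded rows stand in for a real transactional source, and a plain list stands in for the warehouse); it is not code from any of the tools named below.

```python
# Minimal sketch of the three ETL stages: extract, transform, load.
# The source and warehouse are stand-ins, not real systems.

def extract():
    # Pull raw rows from a transactional source (hard-coded here).
    return [
        {"id": 1, "amount": "100.50", "region": " eu "},
        {"id": 2, "amount": "80.00", "region": "US"},
    ]

def transform(rows):
    # Clean and standardize into an analytical format:
    # parse amounts as numbers, normalize region codes.
    return [
        {"id": r["id"],
         "amount": float(r["amount"]),
         "region": r["region"].strip().upper()}
        for r in rows
    ]

def load(rows, warehouse):
    # Store the transformed rows in the "warehouse".
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0])  # {'id': 1, 'amount': 100.5, 'region': 'EU'}
```

Note that the data is physically duplicated: after the run, the same rows exist in both the source and the warehouse, which is exactly the copying that zero-copy architectures set out to eliminate.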

“Basically, this meant building complex pipelines using tools like SQL Server Integration Services (SSIS), Azure Data Factory (ADF) and Microsoft Data Pipelines,” explains Goyal, manager of data analytics and AI at BearingPoint.

“ETL ensures that data is reliable, consistent, and ready for analysis and decision making.”

However, Goyal believes that after a decade of data dominance, ETL may be on the way out due to the rise of zero-copy architectures – a method “where data is used where it already resides, without physically copying it to downstream systems”.

“Data is no longer physically moved – rather, it’s accessed,” he said.

What is zero copy?

As Goyal explained to SiliconRepublic.com, zero-copy architecture allows users to query, share and access data directly at the source, rather than through an ad-hoc ETL process.

Zero copy enables this by using metadata, permissions and query pushdown “without duplicating the underlying data”.
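A toy sketch of the idea: the shortcut below holds only metadata (a reference to the source plus a permission list) and evaluates queries against the live source rows, so nothing is duplicated. All class and parameter names here are hypothetical, invented for illustration; they are not part of any real platform's API.

```python
# Hypothetical contrast with ETL: a metadata-only "shortcut" that
# queries live source data in place instead of copying it downstream.

class Source:
    """Stand-in for a transactional source system."""
    def __init__(self):
        self.rows = [{"id": 1, "amount": 100.0}]

class Shortcut:
    """Metadata-only reference: a pointer to the source plus
    permissions, with no duplicated rows."""
    def __init__(self, source, allowed_roles):
        self.source = source              # reference, not a copy
        self.allowed_roles = allowed_roles

    def query(self, role, predicate):
        if role not in self.allowed_roles:
            raise PermissionError(role)
        # Pushdown: the filter runs where the data lives.
        return [r for r in self.source.rows if predicate(r)]

src = Source()
sc = Shortcut(src, allowed_roles={"analyst"})

# A change lands at the source and is visible through the
# shortcut immediately -- no pipeline run, no duplication.
src.rows.append({"id": 2, "amount": 250.0})
result = sc.query("analyst", lambda r: r["amount"] > 200)
```

The design point is that freshness comes for free: because the shortcut never materializes a copy, there is no batch window during which analytics lags the source.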

Goyal says that driving this change is Microsoft’s Fabric analytics platform, specifically its OneLake unified data lake.

“Fabric presents a unified data core that makes traditional data replication obsolete,” he explains. “Two key approaches are Mirroring, which keeps source systems in sync in near real time, and Shortcuts, which allow entire multi-terabyte databases to be exposed in an analytics environment in seconds without physical copying.

“While ADF still works in complex orchestration scenarios, it is no longer the backbone of data movement – OneLake is.”

‘Long awaited freedom’

Big changes in any industry can be met with joy or disdain depending on the circumstances, but Goyal says that in data teams, the so-called ‘death of ETL’ has been described as a “long-awaited freedom”.

“Years spent configuring SSIS packages and mapping ADF data flows give way to metadata management and governance policies instead,” he says. “The onus shifts from responding to pipeline failures to maintaining stable, governed shortcuts.

“The skillset is changing accordingly – the focus is shifting from pipeline engineering to data management, metadata management and strategic architecture, representing a significant rise in the data management role.”

But why exactly is zero copy being adopted over ETL?

First, Goyal says that zero copy is replacing ETL because it is faster, cheaper and “more reliable”.

“Zero-copy architecture replaces ETL by allowing analytics and AI to access live data at its source – eliminating duplication, latency and administrative complexity while reducing costs.

“In short, ETL is expensive, slow and buggy; zero copy is lean, agile and autonomous.”

Why is it important?

Goyal believes the shift away from ETL is important because it “represents a fundamental architectural change”, allowing teams to manage metadata and governance instead of fragmented data copies and “fragile pipelines”.

“The move is from a reactive, maintenance-heavy model – characterized by nightly pipeline-failure alerts – to a live, governed view of the business.

“In the long run, this means that organizations can make decisions with today’s data rather than yesterday’s batch, significantly reduce infrastructure and redirect skilled data teams away from firefighting and toward strategic work.”

Goyal adds that from a data strategy perspective, zero copy “changes what’s possible”.

“When the analytics framework reflects the business in real time rather than hours after the fact, decisions can be made on the ground truth of the moment,” he said. “Eliminating duplication means that strategies can scale without costs growing in step.

“The built-in governance and persistence of metadata also means that organizations can trust their data deeply – enabling AI workloads, reporting and applications to confidently coexist in one well-managed data environment.”
