Published March 22, 2023

It’s Always Tax Season at Hyperscale: How AdTech Teams Can Solve the Top 3 Hidden Costs in Handling Growing Datasets

By Jonathan Kelley, Director of User Experience at Ocient

In the past few years, many factors have spurred a wave of growth in advertising, marketing, and e-commerce. The COVID-19 pandemic accelerated this growth. According to the U.S. Census Bureau, over $570 billion was spent on e-commerce in 2019. Fast forward to 2022, and $295 billion was spent in Q3 alone in the e-commerce space. More recent projections of ad spend show more modest growth, but even in the face of economic headwinds, digital advertising is expected to grow at an annual rate of 5.2% to 7.3% for the next 5 years, compounding to between 28% and 42% total growth.
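The compounding figure above can be checked with a few lines of arithmetic. This is a minimal sketch for illustration only; the function name `compound_growth` is hypothetical, and the rates are the 5.2% and 7.3% figures quoted in the paragraph.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Total fractional growth after compounding annual_rate for the given years."""
    return (1 + annual_rate) ** years - 1


# Low and high ends of the projected annual growth range, over 5 years.
low = compound_growth(0.052, 5)
high = compound_growth(0.073, 5)

print(f"{low:.1%} to {high:.1%}")  # prints "28.8% to 42.2%"
```

Both ends land inside the 28% to 42% range cited above.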

While the world was busy fighting the pandemic, the steady march of technological progress drove increases in the amount of data flowing into the AdTech ecosystem. Between network improvements from 5G rollouts, an increasing array of connected devices like smart TVs, and new advertising mediums like digital out-of-home displays, AdTech firms had to handle both a higher volume and a larger variety of data streaming into their advertising systems.

The industry also shows no signs of slowing the rate of change for standards around privacy, consumer and brand protection, and advertising transparency. In the coming years, AdTech firms will see the shift from third-party cookies to first-party data sources, the development of robust clean room technologies for privacy-safe data collaboration, and advertising supply chain transparency that traces advertisers’ dollars throughout the ecosystem.

These forces, combined with millions of bid opportunities per second, make innovation in AdTech notoriously expensive. Yet, delivering new features to the market is essential to the survival of AdTech firms on the demand side and supply side. It requires time and talent to engineer high-performance data systems, and large IT budgets to operate these systems at scale. 

It is clear the world is not returning to pre-pandemic levels of online activity, nor is the industry going to stand still. 

So how are we to proceed? How are we to ensure our companies continue to succeed in the face of certain change? AdTech firms need to embrace data systems that help maintain development agility with excellent ROI to stay relevant in the years to come and maintain healthy profit margins.  

After working with AdTech firms and other leading hyperscale industries for the last few years, we have observed and solved some of the hidden costs that tax companies working with the world’s biggest data sets. Here are 3 of the top hidden data costs and what you can do to conquer them: 

Hidden Cost #1: The Time-to-Market Tax 

In AdTech, time to market is critical. New opportunities to innovate in AdTech arrive with bewildering frequency, and the leaders in AdTech must move fast to bring their advertisers and publishers tools to leverage innovative technologies that extend reach and ROI. Otherwise, ad budgets will shift to the competition. 

Unfortunately, most big data stacks require a vast array of finicky technologies, staffed by expert talent who must walk a tightrope of tradeoffs. Friction exists everywhere in the process, increasing costs and slowing new product releases. Engineering complexity balloons and time to market suffers.

One example where we see this friction impact AdTech is in loading and cleaning data sources and preparing aggregations for downstream systems. Demand Side Platforms and Supply Side Platforms routinely process over 1 billion records per hour. At this scale, small inefficiencies compound, so engineers have needed to create complex custom systems or invest in various specialized systems to solve scale challenges on a budget. Data system sprawl makes adding every new feature slower and more costly. The costs of using brittle and complex data processing pipelines can put your team at a competitive disadvantage. Pipeline development, deployment, monitoring, and maintenance levy a hidden cost that is easy to overlook, but the delays in new feature releases are seen and felt by everyone in the organization.  

Rather than accepting that complex and slow data prep on many systems is “just the way big data works,” choose a modern, unified data platform. Ensure your analytics platform makes ingesting structured and semi-structured data effortless without the need for separate ETL frameworks. Ideally, your data warehouse handles transformation of data in flight, so you can take complex records from upstream applications and transform them efficiently into the structure needed for analytics. This can reduce the computing requirements, the complexity of data pipelines, and the number of systems that must be modified: a win-win-win for getting features shipped affordably. When you are ready to launch new data features, they should not each require six-figure engineering investments. To reduce the time-to-market tax, seek out a unified analytics platform where adding new data products is as simple as writing a query.

Hidden Cost #2: The Data Movement Tax  

AdTech firms process petabytes of data daily. There is already a considerable cost to load, query, and analyze this large amount of data, but that is only the beginning of the costs.  

Moving data between various systems adds a tax on every analytics activity, leading to slower processes, higher costs, delayed reporting, and security risks. Each backup, replica, or alternative system further compounds the cost of data movement. Storage, I/O, and network costs pile up quickly when data is as large as it is in AdTech, and the human costs and opportunity costs follow closely behind.  

  • Customer-facing reporting is delayed by each processing stage to move and aggregate data, frustrating users.  
  • Analyst queries are slowed by crossing network boundaries between systems, increasing time to insight.
  • Data governance processes must be engaged each time data is shared with a new system, adding to the expense and expanding the security footprint. 
  • Data scientists must move data out of one system into another environment, slowing research efforts and creating copies. Even at full line rate, a data scientist would have to wait over 11 hours to move a 50 TB data set over a 10 Gbps network. To work around the slowness of repeated data transfers, they “make big data small,” sacrificing valuable insights to make the data workable.
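
The 11-hour transfer figure in the last bullet follows directly from the data size and link speed. The sketch below reproduces it under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and a hypothetical `efficiency` parameter for the share of the link actually usable. The function name `transfer_hours` is illustrative only.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved
    (1.0 means the full line rate, which real networks rarely sustain).
    """
    bits = size_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


print(f"{transfer_hours(50, 10):.1f} hours")                   # prints "11.1 hours"
print(f"{transfer_hours(50, 10, efficiency=0.7):.1f} hours")   # prints "15.9 hours"
```

The first line is the best case. Any protocol overhead or contention on the link, modeled here with the assumed 70% efficiency, only lengthens the wait.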

To reduce the data movement tax, avoid building a web of point solutions that move data around. Instead, work with a data warehouse that can do more within the data warehouse while efficiently handling scale.  

For example, advanced systems can train machine learning models on computing resources that are adjacent to the data and execute those models during loading and transformation to score new samples. This reduces data movement, reduces total system costs through greater efficiencies, and can accelerate the feature development cycle.  

For AdTech, we have engineered a solution that delivers high-concurrency, real-time analytics for campaign reporting while also offering a complete SQL engine for the massive-scale OLAP queries needed in forecasting, bid optimization, and yield optimization. With its optimized engine built for NVMe SSDs, Ocient’s data transformation solution handles massive transformations intra-database and can replace entire Spark clusters for bidder log processing. Ocient also leverages its Zero Copy Reliability™ to achieve data reliability through erasure coding instead of replicas that require further movement of data. This can represent up to an 80% savings in storage size, a cost that no business wants to scale unnecessarily. Seek out a unified data platform like Ocient to eliminate the data movement tax.
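
To see why erasure coding needs less storage than full replicas, it helps to compare raw-storage overhead. The sketch below is a generic illustration, not a description of Ocient's implementation: the 3-copy baseline and the 10 data + 2 parity shard layout are assumed example values, and all function names are hypothetical. The article does not state which baseline or coding parameters sit behind its "up to 80%" figure, so this example does not try to reproduce it.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with full replicas."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data with a data+parity erasure code."""
    return (data_shards + parity_shards) / data_shards


def storage_savings(baseline: float, alternative: float) -> float:
    """Fraction of raw storage saved by the alternative relative to the baseline."""
    return 1 - alternative / baseline


# Assumed example: 3 full copies versus 10 data shards plus 2 parity shards.
replicas = replication_overhead(3)      # 3.0x raw storage
erasure = erasure_overhead(10, 2)       # 1.2x raw storage

print(f"{storage_savings(replicas, erasure):.0%} less raw storage")  # prints "60% less raw storage"
```

The saving grows with the number of replicas in the baseline and shrinks as more parity shards are added, so the actual figure depends on the durability target chosen.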

Hidden Cost #3: The Flexibility Tax  

Pay-as-you-go systems are less practical for the continuous, heavy-duty workloads in AdTech. If you have a continuous workload with petabytes of bidder log and impression data, you should leverage committed resources. Dedicated hardware can lead to substantial savings, and operating on-premises can take the savings even further.

The cloud tax described by Sarah Wang and Martin Casado of Andreessen Horowitz impacts the bottom line in a major way with hyperscale data. One example of this is always-on data ingestion pipelines. Cloud warehouse vendors have flexible options, but the overall cost to transform and ingest data can exact a sizable premium.  

Queries on pay-as-you-go cloud platforms can also quickly run up the bill, leading to budget surprises. We have observed that these surprises discourage future experimentation and new data product development on teams. The apparent benefits of flexibility can quickly turn into a headwind of uncertainty for data teams, stifling innovation. 

Committed infrastructure can ensure always-on workloads do not blow up the budget. For AdTech workloads analyzing bidder log data and impression data, loading and transformation should account for less than 20% of the overall cost of the analytics offering. Committed query infrastructure ensures you do not pay extra when your teams innovate and expand into new analytical areas. Your platform should support workload management concepts to balance workloads between user groups so that innovation can thrive while mission-critical workloads meet their service level objectives.

It is true that this requires up-front capacity planning, but with a vendor who can scale up as your needs increase, you will have the continuous processing your largest workloads require and the ability to expand when future opportunities arise, all at a reasonable cost. The best solutions designed for scale will seamlessly rebalance workloads as the system grows. Seek out cost-effective, hyper-performant data systems with always-on pricing options that will allow you to innovate and grow your data products without bottom-line surprises.

Conclusion 

As the AdTech industry continues to evolve and change, companies will have to adapt to survive. The winners will be those who create market-leading insights from their data and quickly convert those insights into new products that add value for advertisers and publishers. By partnering with analytical product vendors with dedicated deployment options, integrated intra-database capabilities, and scalable efficiency that reduces hidden costs, AdTech firms will be poised to rapidly create products that define the new frontier in advertising at ever-increasing scale. 

Ocient partners with AdTech firms, using its hyperscale data warehouse to create innovative solutions. Request a demo today to learn how Ocient pilots next-generation data analytics solutions so that you can do more with your data.