By Jonny McCormick, Solutions Architect at Ocient
A good way to get a sense of current trends on a subject is to look at search engine results.
Whether it’s Google, Bing or another site, you’ll often see themes that stand out. Search for “big data cloud costs” right now and you’ll find the usual array of cloud vendor price lists, as you might expect. However, amongst these – and usually quite high up – certain words stand out: “spiralling”, “hidden costs”, “cost implications”, “optimize”. There is an ever-increasing number of articles dedicated either to describing huge challenges with cloud costs or to explaining how to get these costs under control. While not a new phenomenon, for big data workflows the cost challenge takes on a whole new scale.
Why are cloud big data costs spiralling?
One problem is that big data workflows are complex. Managing data sources, data pipelines, transformations, aggregations, correlations, storage, analysis, and security means many systems and services must come together to address the needs of a big data project. Data ops teams need access to these resources to realize business-critical insights quickly, and the cloud is often the most common way to achieve this. Of course, each operation, API call, GB transferred, VPN, firewall, CPU cycle, and GB stored has a cost, and without proper management, access to these resources is akin to an all-you-can-eat buffet – only with infinitely large plates and 24-hour opening times. This leads to monthly bills that are completely unpredictable and, if not managed correctly, unviable.
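To make the buffet concrete, the many independent meters above can be sketched as a simple additive cost model. All unit prices and volumes below are hypothetical placeholders, not real vendor pricing – the point is simply how many separate line items combine into one monthly bill:

```python
# Illustrative monthly cloud cost estimator for a big data workflow.
# Every unit price and usage figure here is a hypothetical placeholder,
# not real vendor pricing -- the point is how independent meters add up.

unit_prices = {
    "api_calls_per_million": 5.00,      # $ per million API calls
    "egress_per_gb": 0.09,              # $ per GB transferred out
    "storage_per_gb_month": 0.023,      # $ per GB stored per month
    "compute_per_vcpu_hour": 0.04,      # $ per vCPU-hour
}

usage = {
    "api_calls_per_million": 250,       # 250M API calls
    "egress_per_gb": 10_000,            # 10 TB transferred out
    "storage_per_gb_month": 500_000,    # 500 TB stored
    "compute_per_vcpu_hour": 64 * 730,  # 64 vCPUs active all month
}

# Each resource is metered separately; the bill is the sum of them all.
monthly_bill = sum(unit_prices[k] * usage[k] for k in unit_prices)
print(f"Estimated monthly bill: ${monthly_bill:,.2f}")
```

With these made-up figures the bill lands around $15,500 – and any one of the four meters drifting from its assumed volume moves the total, which is exactly why unmanaged access produces unpredictable bills.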
Success with predictable costs
Another problem with managing cloud costs arises when assumptions meet reality. At the beginning of a project that will run primarily in the cloud, you need to make assumptions to build any kind of cost model. Some assumptions carry little risk – for example, you likely have accurate information on data set size or number of users. Others must be made on very little data at the outset – the number of concurrent queries, query complexity, and the amount of time the solution is active are common examples. Where there are gaps, it is not uncommon for assumptions to be shaped to fit an expected budget. As a result, it’s only when you’re deploying a solution that you start to understand actual costs. These gaps are one reason Ocient takes an incremental approach to testing and validating workloads: the profile of customer data and requirements is synthesized and validated in-house before being put into production. This is how we deliver success with predictable costs.
Of all the metrics mentioned, in my experience the estimated time a system will be active is one of the least understood at the beginning of a project. Every business is unique, and so is its data analysis problem – amount of data, speed of data creation, business questions, time to insight. This means that establishing the resources needed, given a set of queries and a level of concurrency, takes real effort to predict with any accuracy. Often this information is key to a cloud cost model: cloud savings over on-premises deployments are directly related to the amount of time you’re not using a given system or service. To make savings in the cloud, you must focus on how efficiently you’re using the service or, put another way, how effectively you’re not using it.
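The relationship between utilization and savings can be illustrated with a rough break-even calculation. The hourly rate and fixed monthly price below are hypothetical, chosen only to show the shape of the comparison between paying per active hour and paying a flat, always-on price:

```python
# Sketch: compare on-demand cloud cost to a fixed monthly price as a
# function of utilization. All prices are hypothetical illustrations.

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_on_demand_cost(hourly_rate: float, utilization: float) -> float:
    """On-demand cost: you pay only for the fraction of hours active."""
    return hourly_rate * HOURS_PER_MONTH * utilization

def break_even_utilization(hourly_rate: float, fixed_monthly_cost: float) -> float:
    """Utilization above which a flat always-on price beats on-demand."""
    return fixed_monthly_cost / (hourly_rate * HOURS_PER_MONTH)

# Hypothetical numbers: $20/hour on demand vs $8,000/month fixed.
u = break_even_utilization(20.0, 8_000)
print(f"Break-even utilization: {u:.0%}")
```

Below the break-even point, on-demand pricing wins and savings really do come from the hours you are *not* using the service; above it, an always-on commitment is cheaper – which is why the expected active time is such a decisive, and so often under-examined, input to the cost model.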
This is something we see time and time again at Ocient. Fixed, sporadic workloads brought to the cloud can bring huge benefits in both cost and flexibility, especially when the data sets are small and loaded infrequently. However, when the requirements move towards on-demand and ad hoc use – with complex ETL pipelines and streaming requirements – price/performance is massively impacted and much harder to control. In these situations a hybrid approach – or, in extreme cases, an on-premises approach – is a much better fit for the needs of the business. A solution that recognizes always-on requirements and is designed for world-class performance and efficiency can redefine what is possible with today’s big data challenges.
At the same time, the goal shouldn’t be to use a system as little as possible. At Ocient we believe that any data platform should be used at its limits – and those limits should be vast – but the performance can never come at an unexpected cost. Our technology is designed to make this possible, with features such as:
- Compute Adjacent Storage Architecture (CASA) – Unleashing the power of today’s multicore CPUs and million-IOPS NVMe drives.
- Total Pipeline Ownership – Ocient software controls the data pipeline from source to analysis.
- Interactive Insights at Massive Scale – Our platform is designed for hyper scale – managing trillions of records and providing interactive response times.
- One Truth AI – Advanced capabilities such as in-database machine learning models ensure Ocient can own data sourcing, transformation, cleansing, training, validation, and testing without data movement.
- Robust Workload Management – Multiple ways to prioritise and manage teams and their workflows to allow consolidation and huge efficiencies.
- Complex Data Types – Increased fidelity in managing complex data sources such as multi-dimensional arrays, tuples and geospatial data types.
- Secondary Indexing – Fast changing data landscapes need a solution that can flex to accommodate more data faster.
- Always-On Pricing – To gain the most value from data sets, there should be no penalty for analytics systems being active, and the cost should be predictable.
To understand which approach is right for a project there are two key questions that need to be asked right at the start:
- What is the desired outcome?
- What value does this outcome have for the business?
If you are not clear on the analysis you’re looking to provide, or do not have a good understanding of the value that information will bring, it is impossible to set a budget within which to build a solution. This is often the case: businesses either spend far more than the information generated is worth, or deliver a solution at the right cost that doesn’t meet their needs. A project where outcome and value are clear has a much greater chance of success, and you gain the ability to invest in a solution with confidence. It opens up a range of infrastructure approaches not available before: a defined footprint, cloud commitments, and reserved instances become possible and provide immediate savings over ad hoc spend.
This is at the heart of what we’re doing at Ocient. With defined projects and success criteria, we can build a system to meet even the most complex requirements. We look at all options – cloud, on-premises, and hybrid – and we do this in partnership with our customers, proving capabilities and demonstrating success before commitments are made. This shortens the development cycle and accelerates a business’s ability to launch. We have truly talented people working on some of the largest and most complex workflows daily, and a world-class delivery team who are with customers every step of the way to ensure business goals are aligned on both delivery and cost. If any of the challenges I describe are facing you today, we would love to learn more about them and show you the ways in which Ocient can help.