Published June 9, 2022

Keeping a Hyperscale Data Repository Compliant: Getting Ahead of Law Enforcement Requirements for Call Detail Records

By Tim Vokes, Solutions Architect at Ocient

The mobile industry in the US has consolidated from four major players to three. Many smaller brands also lease capabilities from the “Big 3.” This has had a major impact on carriers because they now have significantly more data to manage. They retain data to monitor the health of their networks, predict and troubleshoot issues, and maintain network security. They also must meet legal requirements to provide law enforcement agencies with insight into Internet Call Records (ICRs) and Call Detail Records (CDRs).

In the US, it’s the legal responsibility of the carriers themselves to retain, at a minimum, two years of this data for lawful requests from police, judicial, or other entities. While carriers focus on complying with regulations like the Communications Assistance for Law Enforcement Act (CALEA), the consolidation of the industry to three major carriers has resulted in a massive increase in the size and complexity of these repositories.

The rise and fall of homegrown file-based solutions and Hadoop systems 

Many carriers have created repositories based upon homegrown file-based solutions or used Hadoop systems to meet their legal requirements. File-based solutions carry considerable server costs for processing and provide minimal metadata about the files themselves. They also perform poorly and offer limited built-in query capability. Sadly, neither the cost nor the analytical effectiveness of Hadoop-based solutions has played out as advertised. With rapid data growth, the cost of these systems has become prohibitive, and carriers are looking for a new solution.

The evaluation of object store-based solutions in the cloud

One trend emerging within the search for Hadoop replacements is the evaluation of object store-based solutions in the cloud with some sort of in-memory component to provide analytical capabilities. One could argue, however, that the Hadoop solution had already evolved into a file repository, so the merits of this move are inconsequential. It doesn’t address the three requirements driving the change: scale, cost management, and analytical capability.

Finding a solution optimized for a modern, compliant repository

To better understand the complex data analysis challenges associated with maintaining a compliant repository at hyperscale, let’s go back to the core requirements.

  1. Full resolution analysis of all records for a specific lookback window. Often the requests of a legal entity require all records associated with a given cell phone number for a day, a week, 60 days, or more. This may be in support of a search for a missing person. It could also be in support of a legal case in the courts. The goal is to create a digital forensic trail that supports ongoing investigations and determines what a person was doing and where they were located at a specific point in time. If your system can’t store the full resolution and the geographic data, you can miss the details required to resolve a case. In cases involving public safety, this can be extremely damaging.
  2. Geospatial analysis on records generated within a specific time frame. There are also geospatial aspects to these requests. For example, a legal entity may require records within 1, 5, or 20 miles of a given cell tower. The ability to draw a boundary around a given location is key. Furthermore, a law enforcement entity may need to understand how that cell phone moved and what activity occurred from location to location over time. The issue with many current solutions is that they can only filter on one dimension, not several, which prevents analytics from returning sufficiently granular results. Their ability to flex and scale is also limited, which can drive costs up through additional environments and duplication of data.
  3. Continuous data loading at hyperscale with data available in seconds. Another characteristic of a modern, compliant solution is the hyperscale loading requirement. These repositories ingest billions to trillions of records, amounting to multiple petabytes of data, all day, every day. Any sufficient repository solution must load, on average, 700-800 thousand rows of data per second. This constant ingest also occurs while the system runs queries that require interactive response times. The combination of extreme ingest rates and service level agreements surrounding the legal requests can force additional infrastructure, sacrificed Service Level Objectives (SLOs), and stale data.
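
To make these requirements concrete, here is a minimal Python sketch of a request that filters on several dimensions at once: phone number, lookback window, and distance from a cell tower. It is an illustration only, not how Ocient or any carrier implements CDR search. The record fields, function names, and the in-memory list are all hypothetical, and a real repository would push this filtering into the database rather than scan rows in application code. A small helper also turns a per-second ingest rate into rows per day.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical, heavily simplified call detail record."""
    phone_number: str
    started_at: datetime
    lat: float
    lon: float


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def search_records(records, phone_number, window_end, lookback,
                   tower_lat, tower_lon, radius_miles):
    """Return one number's records inside the lookback window and
    within radius_miles of the tower, ordered by time."""
    window_start = window_end - lookback
    hits = [
        r for r in records
        if r.phone_number == phone_number
        and window_start <= r.started_at <= window_end
        and haversine_miles(r.lat, r.lon, tower_lat, tower_lon) <= radius_miles
    ]
    return sorted(hits, key=lambda r: r.started_at)


def rows_per_day(rows_per_second):
    """Daily row count implied by a sustained per-second ingest rate."""
    return rows_per_second * 86_400  # seconds in a day
```

At the midpoint of the range above, 750 thousand rows per second, `rows_per_day(750_000)` works out to 64.8 billion rows per day, which is why scanning flat files for each request stops being practical at this scale.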

A new approach to hyperscale data analysis on call detail records 

Carriers are looking for solutions that are new, different, and cost-effective. This is where a solution provider like Ocient comes in. Ocient’s Hyperscale Data Warehouse addresses the massive ingest requirements while delivering robust analytical capabilities and full ANSI SQL support. Ocient also has industry-leading geospatial capabilities and robust Workload Management (WLM) to enable demanding co-located workloads. Finally, Ocient’s hyperscale data analytics solutions are affordable at this scale, even when compared to object stores.

In conclusion, the consolidation of carriers and their legal obligations have led to daunting data management challenges around cost, scale, and performance. Carriers are becoming increasingly aware that the old way of doing things through Hadoop is not going to work. Ocient is uniquely positioned to provide a solution for repositories that customers can leverage now and at their future scale.