24 November 2025

Data Processing and Storage – The Backbone of your Logging Approach

Blogs, Data Engineering

In our previous blog post on the data lifecycle and data value chain we looked at data generation, collection, and ingestion as the first phases of your data's journey. In this blog we go further, looking at the processing and storage phases.

Data processing is inevitable: any collected or routed piece of data will have to be stored somewhere. Whether that destination is an analytic tool, a bucket, or a data lake, processing will take place, intentional or not! Any time a piece of data crosses a boundary into a tool, service, or dedicated store it is processed, even if that processing simply preserves it in full fidelity. Sometimes this approach makes sense: audits and compliance often require your data to be retained in full fidelity. But for analytic tooling, where you pay for both format and storage, it is rarely the most cost-effective approach.

Most commonly we see problems arise in the data processing phase in environments with agent and tool sprawl, where several agents collect the same piece of data for their own tooling. The effect is two-pronged: it undermines the reliability of data insights, and it complicates the translation of names and metrics between tools.


Apto’s most recent example comes from a piece of data investigation work we completed with one of the UK’s largest airports. We worked with the airport to map their data landscape across their current tooling (Splunk, Tableau, Databricks, and Google Cloud Platform storage) and generate insights into cost, workload, sprawl, silos, and the business value each flow provided. We found several things, but the most pertinent was a key operational reliance on data flowing through Splunk and then feeding other tools: data was transformed once before Splunk, again on leaving Splunk, and once more on entering each downstream tool.

This architecture raises issues with availability, reliability, and vendor lock-in, but its true impact showed at the business level: there was no common understanding of how key metrics like "passenger" were calculated or named. Are you a passenger once parked? When you scan your boarding pass? Or only when you are seated on a departing plane? The operational impact of this discrepancy turned out to be significant, as teams calculated the passenger metric to accommodate their own KPIs and SLAs rather than against a shared factual definition. Without a common language and a centralised model for data processing, it was impossible for teams and business units to speak the same language.


To resolve this, we engaged with the airport to introduce centralised collection, using pipelining technology to ensure the right data, in the right format, reached the right teams at the right time. The passenger metric was defined once, then delivered to each team via their chosen tooling.
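As an illustrative sketch only (the names, event types, and definition below are hypothetical, not the airport's actual implementation), the idea of defining a metric once and fanning it out to each team's tooling looks something like this:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A raw event from an upstream source, e.g. a boarding-pass scan."""
    event_type: str
    payload: dict


def is_passenger(event: Event) -> bool:
    """The single, shared definition of 'passenger': counted at the
    boarding-pass scan, not at parking or at seating (a hypothetical choice)."""
    return event.event_type == "boarding_pass_scan"


def route(event: Event, destinations: Dict[str, Callable[[Event], None]]) -> None:
    """Apply the shared definition once, then fan out the same canonical
    record to every team's destination tool."""
    if is_passenger(event):
        for send in destinations.values():
            send(event)
```

Each team still consumes the metric through its own tool, but the calculation lives in exactly one place, so operations and finance can never drift apart on what a "passenger" means.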


The other piece of this puzzle is storage, as unprocessed data is often cumbersome and hard for humans to parse. In analytic tooling this often means inflated storage costs from 'heavy' formats like CSV or XML, which store the data alongside its formatting characters (commas, angle brackets, and the like). It also contributes to wider problems around retention policies, which end up governed by compliance rather than by use or value. By defining a common model for data format and a standard policy for destination routing, sending full-fidelity data to long-term storage becomes a viable strategy: your analytic tooling can be optimised for real-time insight and response rather than doubling as a data archive for audits.

In our next data value chain blog we will explore this in more detail, detailing how data analysis and data actions are key stages in the lifecycle of data within organisations.
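That split between an audit-grade archive and a trimmed analytic record can be sketched simply (a hypothetical illustration with an assumed field schema, not any specific product's API): the full-fidelity record goes to cheap long-term storage, while only the fields the analytic tool actually needs travel onward.

```python
import json

# Fields the analytic tool needs for real-time insight (hypothetical schema).
ANALYTIC_FIELDS = {"timestamp", "event_type", "terminal"}


def split_record(raw: dict) -> tuple:
    """Return (archive_blob, analytic_record).

    The archive blob keeps every field in full fidelity for audits and
    compliance; the analytic record is trimmed to the agreed common model,
    so you only pay analytic-tier storage for fields you actually query.
    """
    archive_blob = json.dumps(raw)  # full fidelity, destined for cold storage
    analytic_record = {k: v for k, v in raw.items() if k in ANALYTIC_FIELDS}
    return archive_blob, analytic_record
```

The design point is that retention policy follows destination: the archive can keep its compliance-driven retention, while the analytic store is free to retain data only as long as it is useful.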
