Data source cost models. How much does your visibility cost?

Cost Justification. The Tough Questions.
As a cybersecurity leader, have you been asked the following questions:
- Why are we spending this amount on a technology?
- What value does this provide to the business?
- Are other business units using this technology?
Were these questions hard to answer? If so, building a data source cost model on top of an existing security information and event management (SIEM) deployment may be a good first step.
Data Source Cost Models. Let’s Get Started!
A cybersecurity leader is responsible for justifying the cost of a capability as well as the cost of its supporting technologies. Organizations that run a SIEM can analyze the data sources it ingests to build cost models that show how much each data source costs to maintain and which business units and services it supports.
Many SIEM platforms are available. This example uses Splunk to illustrate how to build a cost model. Splunk is a popular SIEM provider and offers a free trial of its Cloud instance. The core concepts of a data source cost model can be applied to other platforms, but the exact steps are beyond the scope of this post.
Calculate SIEM Total Cost Of Ownership.
A cybersecurity team may consume SIEM capabilities as an outsourced managed service or as an in-house operational function. Each approach has its own considerations, but the goal is the same: tie a data volume (GB) to a cost.
This calculation may be easier in an outsourced model, where a fixed subscription cost maps to a daily data volume (GB). For simplicity, this example assumes a fictional managed service vendor that charges a fixed $1,000 per month for 10 GB of data per day.
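The fictional subscription above can be reduced to a unit cost, which the rest of the model builds on. The following is a minimal sketch; the function and constant names are illustrative assumptions, not part of any vendor tooling.

```python
# Fictional subscription from the example: $1,000 per month for 10 GB per day.
MONTHLY_PRICE_USD = 1000.0
DAILY_ALLOWANCE_GB = 10.0


def monthly_cost_per_daily_gb(monthly_price: float, daily_allowance_gb: float) -> float:
    """Monthly cost of each 1 GB/day of subscribed ingest."""
    if daily_allowance_gb <= 0:
        raise ValueError("daily allowance must be positive")
    return monthly_price / daily_allowance_gb


unit_cost = monthly_cost_per_daily_gb(MONTHLY_PRICE_USD, DAILY_ALLOWANCE_GB)
print(f"${unit_cost:.2f} per month for each GB/day")
```

Under these numbers, every 1 GB of average daily ingest costs $100 per month. An in-house deployment would need its own total cost figure (licensing, infrastructure, staff) in place of the subscription price.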
Index Alignment To Business Unit And Service.
Splunk processes all incoming data into an index. For this example, index is synonymous with data source. As a first iteration of the cost model, every index needs to be evaluated to identify the business unit and service it supports. Splunk indexes are listed under Settings > Indexes.

Splunk uses certain default indexes for its own operation; their cost can be attributed to the security operations team as the cost of maintaining the technology. Once these default indexes are set aside, a business unit and supported service can be identified for each remaining index.

Once this exercise has been completed, SIEM operations teams can build a process that requires a business unit and service to be recorded before a new index is onboarded.
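One way to keep this alignment reviewable is a small lookup that flags any index without an owner. The sketch below is hypothetical: the business units and services are invented for illustration, and treating every index whose name starts with an underscore as Splunk-internal is a simplifying assumption that should be checked against the actual index list.

```python
# Hypothetical ownership records; the business units and services are made up.
INDEX_OWNERS = {
    "app1": ("Sales", "Order portal"),
    "app2": ("Sales", "Billing"),
    "sysmon": ("IT", "Endpoint monitoring"),
    "winfw": ("IT", "Endpoint monitoring"),
}

# Internal indexes are charged to the team that maintains the SIEM.
SIEM_OPERATIONS = ("Security Operations", "SIEM platform")


def is_internal_index(name: str) -> bool:
    """Assumption: indexes whose names start with "_" are Splunk-internal."""
    return name.startswith("_")


def assign_owner(index_name: str):
    """Return (business unit, service), or None when the index is unmapped."""
    if is_internal_index(index_name):
        return SIEM_OPERATIONS
    return INDEX_OWNERS.get(index_name)


indexes = ["_internal", "_audit", "app1", "sysmon", "main"]
unmapped = [name for name in indexes if assign_owner(name) is None]
print(unmapped)  # prints ['main']
```

An unmapped index such as "main" here is a prompt to find an owner before its cost can be allocated, and the same check could gate the onboarding of new indexes.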
Evaluate the Data Source Usage.
The daily data volume of each data source can be calculated from the metrics.log events in Splunk’s “_internal” index, filtered to the indexes of interest. The following Splunk query returns the daily volume (GB) per index:
index=_internal earliest=-1mon@mon latest=now source=*metrics.log splunk_server="*" group="per_index_thruput"
(series="app1" OR series="app2" OR series="main" OR series="sysmon" OR series="winfw")
| eval GB=kb/1024/1024
| timechart span=1d sum(GB) as DailySumGB by series limit=0
Once refined, this query can be set to run on a monthly schedule. The results can be exported to CSV, and the average daily volume (GB) over the month can be calculated for each data source.
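The averaging step can be done in a spreadsheet or scripted. The sketch below assumes the exported CSV has a `_time` column followed by one column per index, as the timechart above would produce; the sample rows are illustrative, not real measurements, and empty cells are treated as days with no ingest.

```python
import csv
import io
from statistics import mean

# Illustrative export; real values come from the Splunk CSV download.
SAMPLE_CSV = """_time,app1,app2
2024-01-01,1.0,0.5
2024-01-02,3.0,1.0
2024-01-03,2.0,
"""


def average_daily_gb(csv_file) -> dict:
    """Average the daily GB values of each index column in a timechart export."""
    daily = {}
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            # Skip malformed extras and Splunk's underscore columns (e.g. _time).
            if column is None or column.startswith("_"):
                continue
            # An empty cell is counted as a day with no ingest for that index.
            daily.setdefault(column, []).append(float(value) if value else 0.0)
    return {series: mean(values) for series, values in daily.items()}


averages = average_daily_gb(io.StringIO(SAMPLE_CSV))
print(averages)  # prints {'app1': 2.0, 'app2': 0.5}
```

For a real export, replace the `io.StringIO` sample with `open("export.csv", newline="")`, where the file name is whatever the export was saved as.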

The data source averages can then be set against the subscription's data allowance (GB) and price to calculate a monthly cost per data source.
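As a minimal sketch of that calculation, the snippet below applies the fictional subscription ($1,000 per month for 10 GB per day) to a set of average daily volumes. The per-index averages are illustrative placeholders, and the helper name is an assumption.

```python
# Fictional subscription: $1,000 per month for 10 GB per day.
UNIT_COST_USD = 1000.0 / 10.0  # monthly cost of each 1 GB/day

# Illustrative average daily volumes (GB) per index; not real measurements.
AVERAGE_DAILY_GB = {"app1": 2.0, "app2": 0.5, "sysmon": 4.0}


def monthly_costs(average_daily_gb: dict, unit_cost: float) -> dict:
    """Monthly cost per data source: average GB/day times cost per GB/day."""
    return {
        series: round(gb * unit_cost, 2)
        for series, gb in average_daily_gb.items()
    }


costs = monthly_costs(AVERAGE_DAILY_GB, UNIT_COST_USD)
for series, cost in costs.items():
    print(f"{series}: ${cost:.2f} per month")

# Share of the 10 GB/day allowance consumed by the listed data sources.
used_gb = sum(AVERAGE_DAILY_GB.values())
print(f"{used_gb:.1f} of 10.0 GB/day used")
```

Any unused allowance is still paid for, so a team may choose to spread it across data sources or charge it to security operations; that choice is a policy decision outside this sketch.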

Using the defined business unit, supported service, and data source totals, a pivot table can be created to show how data source costs are allocated.
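A spreadsheet pivot table works well here; the same grouping can also be sketched in a few lines of code. All rows below are hypothetical, reusing made-up business units, services, and monthly costs purely to show the shape of the result.

```python
from collections import defaultdict

# (index, business unit, service, monthly cost in USD); all values illustrative.
ROWS = [
    ("app1", "Sales", "Order portal", 200.0),
    ("app2", "Sales", "Billing", 50.0),
    ("sysmon", "IT", "Endpoint monitoring", 400.0),
    ("winfw", "IT", "Endpoint monitoring", 100.0),
]


def pivot_costs(rows) -> dict:
    """Sum monthly cost by business unit, then by service."""
    pivot = defaultdict(lambda: defaultdict(float))
    for _index, unit, service, cost in rows:
        pivot[unit][service] += cost
    return {unit: dict(services) for unit, services in pivot.items()}


pivot = pivot_costs(ROWS)
for unit, services in pivot.items():
    print(f"{unit}: ${sum(services.values()):.2f}")
    for service, cost in services.items():
        print(f"  {service}: ${cost:.2f}")
```

Grouping by business unit first answers "who generates this cost", while the service level shows which capability the spend supports.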

While visibility into organizational systems and applications is critical to many IT and security capabilities, the data source costs generated by a particular business unit or service may not be well known. Building this initial cost model helps organizations gain better insight into these gaps.
Closing Remarks.
The data source cost model above is a first step a cybersecurity leader can take to better justify security spend. Subsequent posts will add detection alert costs to further refine the overall cost model.
If you made it this far, thanks for reading! Any feedback is always appreciated.