When Splunk puts events into a summary index, it manipulates the fields in the event to accommodate functionality that it provides elsewhere.
Well, recently I was building a custom command and wanted to manipulate the values of the events that are sent to it. That command, however, would be operating on events extracted from a summary index.
So what does a summary-indexed event look like in Python? It looks like this:
'_raw': '11/17/2010 10:00:00, search_name="TC_DNS_RR - Offsite IP and port contacts", search_now=1290016800.000, info_min_time=1290009600.000, info_max_time=1290013200.000, info_search_time=1290016880.101, destip="184.108.40.206", psrsvd_ct_destport=46, psrsvd_gc=46, psrsvd_v=1, psrsvd_vm_destport="1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;"',
'eventtype': 'auditd cpu df hardware interfaces iostat lastlog lsof netstat openPorts package protocol ps top usersWithLoginPrivs vmstat who',
'search_name': 'TC_DNS_RR - Offsite IP and port contacts',
'source': 'TC_DNS_RR - Offsite IP and port contacts',
Now, of particular interest to me is rewriting the event so that my multi-value field, dest_port, only shows the values that my custom command actually matched.
For instance, the original event may have had the following destination ports
But if only port 80 and 8080 were matched in my custom command, then I wanted the custom command to emit an event that only had
The event that is shipped to your custom command is a simple Python dictionary, as you can see above. I believe that when Splunk performs a stats command on these summarized events, it looks at the psrsvd_* fields. So I was left wondering which fields I should be manipulating before I pass the event back to Splunk.
Well, there is the psrsvd_vm_destport field and a psrsvd_ct_destport field. I believe the _ct_ field tells Splunk how many values are in that multi-value field. The _vm_ field then, I think, lists the values, semicolon delimited. There's one "but" though: there are not 46 items in that _vm_ field.
After considering it for a little while, what I think the value of the _vm_ field is telling you is that yes, it is semicolon delimited, but it's a semicolon-delimited list of semicolon-delimited pairs.
Split up, the first semi-colon delimited list would be.
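Here is a minimal sketch of that split in Python, using the psrsvd_vm_destport value from the event above. The helper name split_vm_field is my own invention, not anything Splunk provides, and it assumes that the values themselves never contain a semicolon.

```python
def split_vm_field(vm_value):
    """Split a psrsvd_vm_* string into (value, count) pairs.

    Hypothetical helper: assumes the field alternates value;count;
    and that no value contains a semicolon itself.
    """
    # Drop the trailing semicolon, then split on the remaining ones.
    tokens = vm_value.strip(";").split(";")
    # Tokens alternate: value, count, value, count, ...
    return [(tokens[i], int(tokens[i + 1]))
            for i in range(0, len(tokens) - 1, 2)]


vm = "1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;"
pairs = split_vm_field(vm)

# Print each pair the way it sits in the field, one pair per line.
for port, count in pairs:
    print(port + ";" + str(count))
```

Run against the sample value, this prints eight pairs, starting with 1121;2 and ending with 1128;2.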
Now you should see where I'm coming from. The above list contains semi-colon delimited strings where the first index value is the port number and the second index value is the number of times that value appears in, for lack of a better term, the uncompressed list of multiple values.
So the above list is saying
- port 1121 appears in the list 2 times
- port 1122 appears in the list 7 times
This appears to make sense, since if you add the numbers next to each port, you get 46, which is the _ct_ value that Splunk reported.
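That arithmetic is easy to check in Python. This is only a sanity check of my reading of the two fields, using the values copied from the event above. The variable names are mine.

```python
# Values copied from the sample summary-indexed event.
vm = "1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;"
ct = 46  # psrsvd_ct_destport

tokens = vm.strip(";").split(";")
# Every second token (odd positions) is assumed to be a count.
counts = [int(c) for c in tokens[1::2]]
total = sum(counts)

print(total == ct)
```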
So, and I haven't done this so I can't confirm, what I think I need to do is build a new list of just the pairs I want to keep, overwrite the value of the _vm_ field with it, and update the _ct_ field to match. Splunk should then still be able to do its thing.
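As a rough, untested-against-Splunk sketch of that idea: the function below rewrites the _vm_ and _ct_ fields of an event dictionary so that only the wanted ports remain. The function name filter_summary_ports is hypothetical, and I am assuming the event dictionary carries psrsvd_vm_destport and psrsvd_ct_destport as string-valued keys. It deliberately leaves _raw, psrsvd_gc and psrsvd_v alone, because I don't know what Splunk does with them.

```python
def filter_summary_ports(event, keep, field="destport"):
    """Return a copy of event with only the ports in keep left in the
    psrsvd_vm_<field> / psrsvd_ct_<field> pair.

    Hypothetical helper; assumes both keys exist as strings and that
    the _vm_ value alternates value;count; as guessed in the post.
    """
    vm_key = "psrsvd_vm_" + field
    ct_key = "psrsvd_ct_" + field

    tokens = event.get(vm_key, "").strip(";").split(";")
    kept = []
    total = 0
    for i in range(0, len(tokens) - 1, 2):
        value, count = tokens[i], int(tokens[i + 1])
        if value in keep:
            kept.extend([value, str(count)])
            total += count

    updated = dict(event)  # leave the caller's event untouched
    # Rebuild the field in the same shape, trailing semicolon included.
    updated[vm_key] = "".join(part + ";" for part in kept)
    updated[ct_key] = str(total)
    return updated


event = {
    "psrsvd_vm_destport": "1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;",
    "psrsvd_ct_destport": "46",
}
result = filter_summary_ports(event, {"1121", "1128"})
print(result["psrsvd_vm_destport"], result["psrsvd_ct_destport"])
```

Whether Splunk's stats command is happy with an event rewritten this way is exactly the part I haven't confirmed.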