When Splunk puts events into a summary index, it manipulates the fields in the event to accommodate functionality that it provides elsewhere.

Well, recently I was in the process of building a custom command, and wanted to manipulate the values of the events that are sent to that command. That command, however, would be operating on events extracted from a summary index.

So what does a summary indexed event look like in Python? It looks like this:

'destip': '',
'values(destport)': '',
'_cd': '0:64621922',
'_h': '1',
'_indextime': '1290017252',
'_meta': '',
'_raw': '11/17/2010 10:00:00, search_name="TC_DNS_RR - Offsite IP and port contacts", search_now=1290016800.000, info_min_time=1290009600.000, info_max_time=1290013200.000, info_search_time=1290016880.101, destip="", psrsvd_ct_destport=46, psrsvd_gc=46, psrsvd_v=1, psrsvd_vm_destport="1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;"',
'_s': '1',
'_serial': '7',
'_si': 'simmer.site.orgsi_tc_dnsrr',
'_sourcetype': 'stash',
'_st': '1',
'_time': '1290009600',
'date_hour': '10',
'date_mday': '17',
'date_minute': '0',
'date_month': 'november',
'date_second': '0',
'date_wday': 'wednesday',
'date_year': '2010',
'date_zone': 'local',
'eventtype': 'auditd cpu df hardware interfaces iostat lastlog lsof netstat openPorts package protocol ps top usersWithLoginPrivs vmstat who',
'host': 'simmer.site.org',
'index': 'si_tc_dnsrr',
'info_max_time': '1290013200.000',
'info_min_time': '1290009600.000',
'info_search_time': '1290016880.101',
'linecount': '1',
'psrsvd_ct_destport': '46',
'psrsvd_gc': '46',
'psrsvd_v': '1',
'psrsvd_vm_destport': '1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;',
'punct': '//_::,_="_-_____",_=.,_=.,_=.,_=.,_="...",_=,_=,_=',
'search_name': 'TC_DNS_RR - Offsite IP and port contacts',
'search_now': '1290016800.000',
'source': 'TC_DNS_RR - Offsite IP and port contacts',
'sourcetype': 'stash',
'splunk_server': 'simmer.site.org',
'tag::eventtype': 'check\ncpu\ndf\nfile\nhost\niostat\nlsof\nmemory\nnetstat\nos\nprocess\nps\nreport\nresource\nsuccess\ntop\nvmstat',
'timeendpos': '19',
'timestartpos': '11'
Now, of particular interest to me is rewriting the event so that my multi-value field, destport, only shows the values that my custom command matched.

For instance, the original event may have had the following destination ports

  • 80
  • 443
  • 8080

But if only ports 80 and 8080 were matched in my custom command, then I wanted the custom command to emit an event that only had

  • 80
  • 8080

in it.

The event that is shipped to your custom command is a simple Python dictionary, as you can see above. I believe that when Splunk performs a stats command on these summarized events, it looks at the psrsvd_* fields. So I was left wondering which fields I should be manipulating before I pass the event back to Splunk.
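
Since the event is just a dictionary, picking out the fields in question is straightforward. Here is a minimal sketch; the `event` below is a trimmed copy of the dump above, and `reserved_fields` is a helper name I made up for illustration, not anything Splunk provides.

```python
# Trimmed copy of the event dictionary shown earlier in this post.
event = {
    'destip': '',
    'psrsvd_ct_destport': '46',
    'psrsvd_gc': '46',
    'psrsvd_v': '1',
    'psrsvd_vm_destport': '1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;',
    'sourcetype': 'stash',
}


def reserved_fields(event):
    """Return only the psrsvd_* entries of an event dictionary (hypothetical helper)."""
    return {key: value for key, value in event.items() if key.startswith('psrsvd_')}


for name, value in sorted(reserved_fields(event).items()):
    print(name, '=', value)
```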

Well, there is a psrsvd_vm_destport field and a psrsvd_ct_destport field. I believe the _ct_ field tells Splunk how many values are in that multi-value field. The _vm_ field then, I think, lists the values, semi-colon delimited. There's one "but" though: there are not 46 items in that _vm_ list.

After considering it for a little while, I think the value of the _vm_ field is indeed semi-colon delimited, but it's a semi-colon delimited list of semi-colon delimited pairs.

For example, take the psrsvd_vm_destport value from the event above

1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;

Split up into pairs, the list would be

  • 1121;2
  • 1122;7
  • 1123;7
  • 1124;7
  • 1125;7
  • 1126;7
  • 1127;7
  • 1128;2

Now you should see where I'm coming from. The above list contains semi-colon delimited strings where the first value is the port number and the second value is the number of times that port appears in, for lack of a better term, the uncompressed list of multiple values.

So the above list is saying

  • port 1121 appears in the list 2 times
  • port 1122 appears in the list 7 times
  • etc.

This appears to make sense since if you add the numbers next to each port, you get 46, which is the _ct_ value that Splunk reported.
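
That reading is easy to check in Python. The sketch below assumes my interpretation of the format (a flat value;count;value;count; string with a trailing semi-colon, and no semi-colons inside the values themselves); `parse_vm` is a name I made up for illustration.

```python
def parse_vm(vm):
    """Split 'value;count;value;count;' into a list of (value, count) tuples.

    Assumes the layout guessed at in this post; values containing
    semi-colons are not handled.
    """
    body = vm.rstrip(';')
    if not body:
        return []
    tokens = body.split(';')
    if len(tokens) % 2:
        raise ValueError('expected value;count pairs, got %r' % vm)
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


vm = '1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;'
pairs = parse_vm(vm)
total = sum(count for _, count in pairs)

print(pairs[:2])  # [('1121', 2), ('1122', 7)]
print(total)      # 46, the same as psrsvd_ct_destport
```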

So, and I haven't done this so I can't confirm, what I think I need to do is build a new list, overwrite the value of the _vm_ field with it, and update the _ct_ field to match. Splunk should then still be able to do its thing.
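
A rough sketch of that idea follows. It is untested against Splunk and rests entirely on my guess about what the _vm_ and _ct_ fields mean. The function name `filter_vm` and the `keep` argument are mine, and I leave psrsvd_gc alone because I don't know what it represents.

```python
def filter_vm(event, field, keep):
    """Return a copy of event whose psrsvd_vm_<field> holds only values in keep.

    Assumes the value;count;value;count; layout guessed at above and
    recomputes psrsvd_ct_<field> as the sum of the surviving counts.
    """
    vm_key = 'psrsvd_vm_' + field
    ct_key = 'psrsvd_ct_' + field
    tokens = event.get(vm_key, '').rstrip(';').split(';')
    # Even positions are the values, odd positions are their counts.
    pairs = zip(tokens[0::2], tokens[1::2])
    kept = [(value, int(count)) for value, count in pairs if value in keep]

    updated = dict(event)  # leave the caller's dictionary untouched
    updated[vm_key] = ''.join('%s;%d;' % (value, count) for value, count in kept)
    updated[ct_key] = str(sum(count for _, count in kept))
    return updated


event = {
    'psrsvd_ct_destport': '46',
    'psrsvd_vm_destport': '1121;2;1122;7;1123;7;1124;7;1125;7;1126;7;1127;7;1128;2;',
}
result = filter_vm(event, 'destport', {'1121', '1128'})
print(result['psrsvd_vm_destport'])  # 1121;2;1128;2;
print(result['psrsvd_ct_destport'])  # 4
```

Returning a copy rather than editing the event in place keeps the original around for comparison, which is handy while the format is still guesswork.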