You could go on and on and on collecting data about any particular URL. The amount of contextual data relevant to these things is near limitless, especially when they are involved in malware distribution or other hosting nastiness.

Savory records roughly 100 different pieces of information for each URL. More are added whenever new, reputable data sources are found on the net. Savory's design makes it easy to extend the list of information kept about any particular URL, so as new sources of context are uncovered, they can be added to new or existing URLs.

So what does savory currently know?

I picked out a URL that was recently reported as defaced. Here's what savory shows you when you view the general information about the URL.

<img class="size-medium wp-image-687 aligncenter" title="URL Information" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-13-300x164.png" alt="" width="300" height="164" />

Remember that almost all of this information is retrieved automatically. There is very little user interaction with savory aside from just seeing "what's there".

The first tab here shows the URL in question and the value of the &lt;title&gt; tag at the time it was submitted (or, if no title was found, a copy of the URL).

<p style="text-align: left;">In addition to that there is a list of notes and tags. Notes can be added by automated or human processes to log actions or information about the URL. Tags allow search results to be narrowed later, or can be used to build various metrics or statistics.</p>

<p style="text-align: left;">From here we can move on to the "meat and potatoes" of the URI context: the document.</p>

<p style="text-align: left;">The document contains the majority of the URL information and it can be big, so I'll show it in pieces. First, the "document details". This information is specific to the document itself, not to the URL with which the document is associated.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-23.png"><img class="size-medium wp-image-688 aligncenter" title="Document detail" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-23-300x111.png" alt="" width="300" height="111" /></a></p>

<p style="text-align: left;">Each document has a unique ID: the Document ID. This is a UUID that can be used to pull the full document details directly out of savory. This ID is used in the context of the backend document database. Savory also uses a relational database, which includes a "savory ID". The two IDs are different, but both are stored in both databases so that tools that use one database can also access the other if they need to.</p>

<p style="text-align: left;">Each time savory updates the details of a URL's document, it creates a revision of the document. Using the drop-down menu, admins can view the context of the URL as it has changed over time and fields have been added, removed, or changed. It should be noted that the database can be, and is, compressed periodically. 
The end result of this is that old revisions are lost. This is not a particularly big concern for us, though, because at this time we care less about document history than about what is current.</p>

<p style="text-align: left;">Moving on down the page we get to the URI context.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-31.png"><img class="size-medium wp-image-689 aligncenter" title="Context 1" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-31-300x237.png" alt="" width="300" height="237" /></a></p>

<p style="text-align: left;">So here is what savory knows. This document has 72 fields of context (we'll see more in a second).</p>

<p style="text-align: left;">You can see some fields that may be particularly useful to you as a human, but we also include information that is more useful to robots, scripts, or other automated tools.</p>

<p style="text-align: left;">Moving on, here is more context.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-41.png"><img class="size-medium wp-image-690 aligncenter" title="Context 2" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-41-300x57.png" alt="" width="300" height="57" /></a></p>

<p style="text-align: left;">Among other things, savory runs a GeoIP lookup on the IP address behind the URL in question, so you can plot the URL's location on a map.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-51.png"><img class="size-medium wp-image-691 aligncenter" title="Context 3" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-51-300x202.png" alt="" width="300" height="202" /></a></p>

<p style="text-align: left;">Another automated process that savory runs is page sourcing. Often we're interested to know what sort of badware is actually hosted at the URL in question. 
Sometimes it's a script, other times it's an exe or a malformed webpage; you get the idea.</p>

<p style="text-align: left;">We use resources that we get from page sourcing to reinforce our training and awareness programs at work. We've found admins to be much more interested in <strong>seeing</strong> the actual crap that is used to infect their machines than in just being told about it.</p>

<p style="text-align: left;">Part of the page sourcing is hashing of the content, shown in the next couple of pics.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-6.png"><img class="size-medium wp-image-692 aligncenter" title="Context" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-6-300x222.png" alt="" width="300" height="222" /></a></p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-7.png"><img class="aligncenter size-medium wp-image-693" title="Context" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-7-300x227.png" alt="" width="300" height="227" /></a></p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-8.png"><img class="size-medium wp-image-694 aligncenter" title="Context" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-8-300x91.png" alt="" width="300" height="91" /></a></p>

<p style="text-align: left;">Alright, I can hear you asking yourself, "Isn't that a little excessive? Do you really need that many hashes?"</p>

<p style="text-align: left;">You'd be surprised how often we come across a new tool that finds some new weird way of doing things. Most of the time you'll see MD5 or some form of SHA used by these tools, but then you'll find a tool that doesn't. 
We include all the hashes we can for the URL so that we don't have to worry about the next new-fangled hash that people start using.</p>

<p style="text-align: left;">Finally, a little bit more information after the hashes.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-9.png"><img class="size-medium wp-image-695 aligncenter" title="Screenshot-9" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-9-300x48.png" alt="" width="300" height="48" /></a></p>

<p style="text-align: left;">You get the idea.</p>

<p style="text-align: left;">savory includes one more stub of information on the document page. I will grant you that it is probably useless information. However, it makes for stimulating discussions among upper management during their multi-hour meetings. For that reason we refer to the following data as "printing money": completely irrelevant to the Techs, cared about only by the Suits.</p>

<p style="text-align: center;"><a href="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-10.png"><img class="size-medium wp-image-696 aligncenter" title="Screenshot-10" src="http://caphrim.net/tim/wp-content/uploads/2010/02/Screenshot-10-300x83.png" alt="" width="300" height="83" /></a></p>

<p style="text-align: left;">Management likes pictures of things, especially ones that look alien, like the QR code above. When you take a bunch of these and put them on a single page, people begin to ask questions and your applications get more air-time among the population. QR codes are almost entirely useless in the context of this application. 
But we did it to scratch an itch (OK, I was just wasting time and having fun with technology, I admit it).</p>

<p style="text-align: left;">Still, it does make for an interesting conversation starter.</p>

<ul>
<li><a href="http://groups.csail.mit.edu/uid/sikuli/">project sikuli</a></li>
<li><a href="http://www.iconfinder.net/">iconfinder</a></li>
<li><a href="http://www.lenmus.org/sw/page.php?pid=paginas&name=screenshots&lang=en&ignore=yes">lenmus</a></li>
<li><a href="http://theserverpages.com/php/manual/en/zend.php">hacking the core of php</a></li>
<li><a href="http://www.phpcompiler.org/">phc</a></li>
<li><a href="http://www.roadsend.com/home/index.php">roadsend php</a></li>
<li><a href="http://httpd.apache.org/docs/2.0/mod/mod_mime_magic.html">apache mod mime magic</a></li>
<li><a href="http://tilgovi.github.com/couchdb-lounge/">lounge</a></li>
</ul>
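<p style="text-align: left;">As a footnote to the hashing discussion earlier in this post, here is a minimal Python sketch of the "include every hash we can" idea. It is not savory's code: the function name <code>hash_content</code> and the use of Python's <code>hashlib</code> are assumptions made purely for illustration, and the set of algorithms is simply whatever <code>hashlib</code> guarantees on the interpreter running it.</p>

```python
import hashlib


def hash_content(content: bytes) -> dict[str, str]:
    """Hypothetical helper: hex digests of `content` for every algorithm
    that hashlib guarantees to be available on this interpreter."""
    digests = {}
    for name in sorted(hashlib.algorithms_guaranteed):
        h = hashlib.new(name)
        h.update(content)
        if name.startswith("shake_"):
            # SHAKE is variable-length, so a digest size must be chosen.
            # 32 bytes is an arbitrary choice for this sketch.
            digests[name] = h.hexdigest(32)
        else:
            digests[name] = h.hexdigest()
    return digests


if __name__ == "__main__":
    # Stand-in bytes; in practice this would be the sourced page content.
    for algorithm, digest in hash_content(b"<html></html>").items():
        print(f"{algorithm}: {digest}")
```

<p style="text-align: left;">Storing the result as a name-to-digest mapping means a tool that only speaks MD5 and a tool that only speaks some newer hash can both look up the same content, which is the point of keeping so many of them.</p>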