ELK + Palo Alto Networks Part 2 (URL and Custom Logs)

Part 1: https://anderikistan.com/2016/03/26/elk-palo-alto-networks/

To recap part 1 we did the following:

  1. Set up syslog-ng to read in logs from a Palo Alto Networks firewall
  2. Set up syslog server profiles on our firewall and forwarded the traffic logs
  3. Installed ELK stack and gained the ability to visualize our logs

In our ELK instance we have traffic logs and are able to get a lot of great information including the very cool GeoHash stuff. Now we want to take a better look at web traffic and help identify risky users or see what kind of stuff people are doing in our environment. For this tutorial we are going to do the following:

  1. Set up custom threat logs on our Palo Alto Networks firewall
  2. Ensure syslog-ng is working properly with our new syslog feed
  3. Update the Logstash configuration file
  4. Tweak our Elasticsearch index mapping
  5. Build out some visualizations

This tutorial assumes you have completed the previous tutorial, as we will reference files and techniques discussed there. Let’s get started.

Custom Threat Logs with Palo Alto Networks

Before we do anything, let’s create a new syslog object on our firewall. In case you forgot, go to Device > Server Profiles > Syslog. Click Add in the bottom left and use the following options:

  • Name: URLSyslog
  • Syslog Server: The same as your traffic syslog
  • Transport: UDP
  • Port: 514
  • Format: BSD
  • Facility: LOG_LOCAL1

So hopefully you picked up on the threat part of Custom Threat Logs. Palo Alto Networks doesn’t ship URLs in syslogs by default so we have to build our own custom threat log type in order to get the fields we want. To create the custom syslog do the following:

In the popup box, click on the box called “Custom Log Format”.

Here you will be presented with five different Log Type options. We will select Threat and be prompted with another popup box.



The beauty of these custom logs is that we can refine them down to only what we are able to ingest and index in our ELK deployment. Here we are going to grab the source IP, destination IP, App-ID, category, and the URL itself. PAN firewalls carry the URL in the $misc field.
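The exact format string will depend on your PAN-OS version, but based on the fields above, a custom Threat log format would look something like the line below. Treat the field order as an assumption and verify it against the custom log format variable list on your own firewall:

```
$src $dst $app $category "$misc"
```

The quotes around $misc are deliberate: URLs can contain characters that would otherwise confuse space-delimited parsing further down the pipeline.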


Now add your newly created URL syslog object to the Log Forwarding object created in the previous tutorial:


Don’t forget to commit the firewall configuration. With that done, let’s move over to our ELK server and get these logs coming in.

Tinker with syslog-ng

To handle multiple syslog streams, let’s use syslog facilities. Facilities allow syslog-ng to differentiate between the types of logs it is dealing with. We actually did all this work in the previous tutorial, but a quick review will help. Let’s look at the config file.

sudo vi /etc/syslog-ng/syslog-ng.conf

Below is the “meat” of the config:

source s_netsyslog {
        udp(ip( port(514) flags(no-hostname));
        tcp(ip( port(514) flags(no-hostname));
};

destination d_netsyslog { file("/var/log/network.log" owner("root") group("root") perm(0644)); };

destination d_urlsyslog { file("/var/log/urllogs.log" owner("root") group("root") perm(0644)); };

log { source(s_netsyslog); filter(f_traffic); destination(d_netsyslog); };

log { source(s_netsyslog); filter(f_threat); destination(d_urlsyslog); };

filter f_traffic { facility(local0); };
filter f_threat { facility(local1); };

We have one “source” which is a listener on port 514, the default port for syslog. Next we define a couple of options for destinations. Look near the bottom and you can see we define some filters based on facility. That’s why proper facility configuration on the firewall is so important. Lastly in the middle you can see the log function which defines the source, filter, and destination. With all this set up and a restart of the syslog-ng service:

sudo service syslog-ng restart

We should now start seeing traffic:

sudo tail -f /var/log/urllogs.log

Your logs should look something like this:

2016-04-24T21:59:27-05:00 WOPR ssl business-and-economy "e.crashlytics.com/"
2016-04-24T21:59:28-05:00 WOPR ms-onedrive-base online-personal-storage "bn1304.storage.live.com/"
2016-04-24T21:59:28-05:00 WOPR ms-onedrive-base online-personal-storage "bn1304.storage.live.com/"

Congrats. Your server is now reading in our custom URL syslog feed.
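As an aside, the facility routing above works because every syslog message begins with a PRI value computed as facility × 8 + severity, where local0 is facility 16 and local1 is 17. A quick Python sketch (purely illustrative, not part of the pipeline):

```python
# Decode a syslog PRI value (RFC 3164) into (facility, severity).
# PRI = facility * 8 + severity; local0 = 16, local1 = 17, info = 6.
def decode_pri(pri):
    return pri // 8, pri % 8

# Traffic logs sent as local0.info arrive with PRI <134>;
# our URL logs sent as local1.info arrive with PRI <142>.
print(decode_pri(134))  # (16, 6) -> local0.info
print(decode_pri(142))  # (17, 6) -> local1.info
```

This is exactly what the f_traffic and f_threat filters key on when routing messages to their destination files.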

Update Logstash Configuration

So we have configured Logstash once… how about we just repeat our steps? That could work, except there’s one big difference between our custom syslog and the default traffic syslog. Can you spot it?


2016-04-24T21:59:27-05:00 WOPR ssl business-and-economy "e.crashlytics.com/"
2016-04-24T21:59:28-05:00 WOPR ms-onedrive-base online-personal-storage "bn1304.storage.live.com/"
2016-04-24T21:59:28-05:00 WOPR ms-onedrive-base online-personal-storage "bn1304.storage.live.com/"


2016-04-24T15:09:09-05:00 WOPR 1,2016/04/24 15:09:08,001606001622,TRAFFIC,drop,1,2016/04/24 15:09:08,,,,,Deny Outbound,,,not-applicable,vsys1,Wired,Wireless,ethernet1/4,,Fowarder,2016/04/24 15:09:08,0,1,56843,53,0,0,0x0,udp,deny,68,68,0,1,2016/04/24 15:09:08,0,any,0,6725833,0x0,,,0,1,0,policy-deny,0,0,0,0,,WOPR,from-policy
2016-04-24T15:09:12-05:00 WOPR 1,2016/04/24 15:09:11,001606001622,TRAFFIC,drop,1,2016/04/24 15:09:11,,,,,Deny Outbound,,,not-applicable,vsys1,Wired,Wireless,ethernet1/4,,Fowarder,2016/04/24 15:09:11,0,1,56843,53,0,0,0x0,udp,deny,68,68,0,1,2016/04/24 15:09:12,0,any,0,6725834,0x0,,,0,1,0,policy-deny,0,0,0,0,,WOPR,from-policy

If you said the difference is commas, you would be correct. The problem is that with no commas, the approach we used last time:

csv {
      source => "raw_message"
      columns => [ "PaloAltoDomain","ReceiveTime","SerialNum","Type","Threat-ContentType","ConfigVersion","GenerateTime","SourceAddress","DestinationAddress","NATSourceIP","NATDestinationIP","Rule","SourceUser","DestinationUser","Application","VirtualSystem","SourceZone","DestinationZone","InboundInterface","OutboundInterface","LogAction","TimeLogged","SessionID","RepeatCount","SourcePort","DestinationPort","NATSourcePort","NATDestinationPort","Flags","IPProtocol","Action","Bytes","BytesSent","BytesReceived","Packets","StartTime","ElapsedTimeInSec","Category","Padding","seqno","actionflags","SourceCountry","DestinationCountry","cpadding","pkts_sent","pkts_received" ]
}

… it won’t work. We will have to do something else, but first let’s cover how the config file will work. Logstash is very cool in that it allows multiple config files, which certainly helps with readability. The catch is that Logstash, on startup, merges all of your configuration files into a single configuration. So for the most predictable results I have found it easiest to build one large config file.
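To make the comma difference concrete, here is a quick Python check against abbreviated versions of the two sample lines shown earlier:

```python
# Abbreviated versions of the two log formats shown above.
traffic = "1,2016/04/24 15:09:08,001606001622,TRAFFIC,drop,1,2016/04/24 15:09:08"
url = '2016-04-24T21:59:27-05:00 WOPR ssl business-and-economy "e.crashlytics.com/"'

# The traffic log splits cleanly into columns on commas...
print(len(traffic.split(",")))  # 7
print(traffic.split(",")[3])    # TRAFFIC

# ...but the URL log has no commas at all, so a csv filter
# would hand us back one big unparsed field.
print(len(url.split(",")))      # 1
```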

As we start adding more inputs we run into our first problem: making sure we can differentiate between logs. In this scenario we will assign “types” to our inputs. An example is below:

input {
  file {
        path => ["/var/log/network.log"]
        sincedb_path => "/var/log/logstash/sincedb"
        start_position => "beginning"
        type => "traffic"
  }

  file {
        path => ["/var/log/urllogs.log"]
        sincedb_path => "/var/log/logstash/urlsincedb"
        start_position => "beginning"
        type => "url"
  }
}
Here we have created two separate file methods and assigned each a “type”, traffic and url, which can be used throughout our config file so we can perform certain actions on certain syslog data. We will want to treat traffic logs differently than url logs and this allows us to do that. Additionally we defined a new sincedb instance.

As we go down our config file we will repeatedly use this “if” conditional:

if [type] == "url" {...}

We have “typed” our input and can refer to said type throughout our configuration. Let’s look at how the output section looks with both traffic and url.

output {
  if [type] == "traffic" {
    elasticsearch {
      index => "pan-traffic"
      hosts => ["localhost:9200"]
      template => "/opt/logstash/elasticsearch-template.json"
      template_overwrite => true
    }
  }

  if [type] == "url" {
    elasticsearch {
      index => "pan-url"
      hosts => ["localhost:9200"]
    }
  }
} #end output block

So within the one output block we have handled both the traffic and url syslog streams we are expecting. We have gone over input and output, but what about filter {}?

filter {
  if [type] == "url" {
    grok {
      #parses the named fields out of the custom URL syslog line
      #patterns_dir => "/opt/logstash/patterns"
      match => { "message" => '%{TIMESTAMP_ISO8601} %{IPV4:firewallIP} %{HOSTNAME:firewall} %{IPV4:sourceIP} %{IPV4:destinationIP} %{NOTSPACE:application} %{NOTSPACE:category} "%{URIHOST:URIHost}%{URIPATH:URIPath}"'  }
    }
  }
}

What is all this match business? Remember how our custom syslog does not have commas in it? This prevents us from using the very simple csv trick. We have to ramp up our effort a notch in order to get this working right. Enter the grok.

What’s the Deal with Groks?

Grok is a way to take unstructured data and parse it into something indexable and thus searchable. There’s a lot to this concept and grok is an incredibly powerful tool! For the sake of this tutorial we are going to keep it pretty high level. The goal is to teach Logstash how to parse out the good stuff by helping it identify patterns or have some sense of what to look for. There are probably dozens of ways to go about building your grok filter, but the way that worked for me was to use the grok constructor: http://grokconstructor.appspot.com/do/construction.

To use the constructor start out with a line from your syslog file that you intend to match. For this example I chose the following:

2016-04-24T21:59:27-05:00 WOPR ssl business-and-economy "e.crashlytics.com/"

Copy and paste the log line into the text box and hit Go!


Once you hit Go! you will be presented with a bunch of different grok options and this is where the fun starts. For the first match we know we are looking at the syslog date time so the match will be something like:


Logstash is going to handle all the date/time stuff for us since the timestamp is determined to be compliant with ISO8601. Once we select TIMESTAMP_ISO8601 we will scroll back to the top of the page and hit Continue. When the page refreshes you will see the constructed regex for the grok match filter and below how much of the syslog line we have matched and what is remaining to be matched.


At this point it’s a game of iteration.


We will actually define the “spaces” in the syslog as well. This is helpful for the grok constructor tool but as you will see in a moment we will be ditching that space business and keeping the good stuff. When all the matching is done in the tool you should see something like this:


We are almost there; a slight manual transformation remains. First, we need to cull the spaces from the match filter, and second, we need to give these matches names. We assign names by adding a colon and then a variable name, so %{IPV4} becomes %{IPV4:sourceIP}.



%{TIMESTAMP_ISO8601} %{IPV4:firewallIP} %{HOSTNAME:firewall} %{IPV4:sourceIP} %{IPV4:destinationIP} %{NOTSPACE:application} %{NOTSPACE:category} "%{URIHOST:URIHost}%{URIPATH:URIPath}"
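Grok patterns ultimately compile down to regular expressions, so we can sanity-check the finished pattern with plain Python before handing it to Logstash. The line below is hypothetical: the IPs are made up, since the log excerpts in this post have them redacted.

```python
import re

# Rough regex equivalent of the grok pattern above (simplified:
# NOTSPACE becomes \S+, and the named groups mirror the grok field names).
pattern = re.compile(
    r'^(?P<timestamp>\S+) '
    r'(?P<firewallIP>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<firewall>\S+) '
    r'(?P<sourceIP>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<destinationIP>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<application>\S+) '
    r'(?P<category>\S+) '
    r'"(?P<URIHost>[^/"]+)(?P<URIPath>[^"]*)"$'
)

# Hypothetical full log line with made-up IP addresses.
line = ('2016-04-24T21:59:27-05:00 10.0.0.1 WOPR 192.168.1.50 54.192.1.10 '
        'ssl business-and-economy "e.crashlytics.com/"')

m = pattern.match(line)
print(m.group("application"))  # ssl
print(m.group("URIHost"))      # e.crashlytics.com
```

If the match comes back None on one of your own lines, the format string on the firewall and the grok pattern have drifted apart.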

In total what we add to the filter is:

  if [type] == "url" {
    grok {
      #parses the named fields out of the custom URL syslog line
      #patterns_dir => "/opt/logstash/patterns"
      match => { "message" => '%{TIMESTAMP_ISO8601} %{IPV4:firewallIP} %{HOSTNAME:firewall} %{IPV4:sourceIP} %{IPV4:destinationIP} %{NOTSPACE:application} %{NOTSPACE:category} "%{URIHOST:URIHost}%{URIPATH:URIPath}"'  }
    }

    date {
      timezone => "America/Chicago"
      match => [ "GenerateTime", "YYYY/MM/dd HH:mm:ss" ]
    }

    mutate {
        remove_field => ["message"]
    }
  }
A complete copy of the pan-traffic.conf file that you will need can be found at: https://onedrive.live.com/redir?resid=6595236A4C9AD71B!32560&authkey=!AHYSb4DufO-2aYk&ithint=folder%2cconf

Update Elasticsearch Index Mapping

Same exercise as before… we need to add "index":"not_analyzed" to the category and application fields. The reason we add this to the mapping is so Elasticsearch won’t split strings on the “-”. Feel free to try it without updating the mapping and you will see what I’m talking about. To update the mapping, perform the following steps:

sudo service logstash stop
curl -X DELETE 'http://localhost:9200/pan-url'

You should get some sort of "acknowledged":true response or a 404 error. Either typically means you’re good to go. With these two items done we will push the new mapping in:

curl -g -X PUT 'http://localhost:9200/pan-url' -d '{"mappings":{"url":{"properties":{"@timestamp":{"type":"date","format":"strict_date_optional_time||epoch_millis"},"@version":{"type":"string"},"URIHost":{"type":"string"},"URIPath":{"type":"string"},"application":{"type":"string","index":"not_analyzed"},"category":{"type":"string","index":"not_analyzed"},"destinationIP":{"type":"string"},"firewall":{"type":"string"},"firewallIP":{"type":"string"},"host":{"type":"string"},"path":{"type":"string"},"port":{"type":"string"},"sourceIP":{"type":"string"},"tags":{"type":"string"},"type":{"type":"string"}}}}}'
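A malformed body will make the PUT fail with a parse error, so it can be worth validating the JSON locally first. A small sketch against an abbreviated copy of the mapping body above:

```python
import json

# Abbreviated copy of the mapping body from the curl command above;
# validating it locally catches JSON typos before Elasticsearch does.
mapping = json.loads('''
{"mappings":{"url":{"properties":{
  "application":{"type":"string","index":"not_analyzed"},
  "category":{"type":"string","index":"not_analyzed"},
  "URIHost":{"type":"string"}
}}}}
''')

props = mapping["mappings"]["url"]["properties"]
# The two fields we care about must be not_analyzed so values like
# "business-and-economy" aren't tokenized on the hyphens.
print(props["category"]["index"])     # not_analyzed
print(props["application"]["index"])  # not_analyzed
```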

Assuming no errors, start Logstash back up.

sudo service logstash start

Wrap Up

With the traffic logs and url logs it’s possible to make some really cool dashboards. I hope these have helped. If you have any questions shoot me an email at anderikistan@gmail.com.
You can find me on LinkedIn at: https://www.linkedin.com/in/iankanderson
or on Twitter @Anderikistan
