Splunk SPL Cheat Sheet




The power of Splunk search: at the core of the Splunk platform is SPL™, Splunk’s Search Processing Language. SPL is a language with immense capability that’s easy to learn, and it gives you the power to ask any question of any machine data. The notes below are a cheat sheet to help you remember the essentials.


You've been there before, having spent hours getting your Splunk set up just right. Then you hear in #splunk an idea that would have made your job much easier, had you only known it then. All of these tidbits of knowledge could make a pretty nice .conf presentation one day, so share them here.

Note: The contents of this page have been contributed by Splunk luminaries and customers who inhabit the #splunk IRC channel on EFnet. Feel free to join us there for further discussion. :)

Testing and troubleshooting

  • Configure the Monitoring Console (the artist formerly known as 'DMC', now just 'MC'): https://docs.splunk.com/Documentation/Splunk/7.1.0/DMC/DMCoverview
  • Use btool to troubleshoot config - Splunk has a large configuration surface, with many opportunities to set the same setting in different locations. The splunk command-line utility 'btool' is incredibly helpful: it gives you a merged view of all the configuration files (a quick example follows this list). http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Usebtooltotroubleshootconfigurations
  • Make an index for test data that you can delete anytime you need to. Any new datatypes you haven't imported previously should be 'smoke-tested' here. Date extractions and event breaking are important to get right and a new data type that isn't handled properly can cause big issues if not caught till later.
  • Set up more than one index. Think: will one set of users need access to this data but not another? Will I want to retain this data for longer or shorter than other data? Could a runaway process rapidly exceed my tolerance for storage?


  • Make a test environment. Never test index-time changes in the production environment. A VM is fine. DO IT. NOW!
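For instance, a quick btool session (the sourcetype and app names below are made up) to see the merged configuration and where each setting comes from:

    # Merged props.conf as splunkd will see it; --debug prefixes each line with the file it came from
    $SPLUNK_HOME/bin/splunk btool props list --debug

    # Narrow the output to a single (hypothetical) sourcetype stanza
    $SPLUNK_HOME/bin/splunk btool props list my_sourcetype --debug

    # Look at one app's inputs in isolation
    $SPLUNK_HOME/bin/splunk btool inputs list --debug --app=myinputs

    # Sanity-check the .conf files for typos and unknown settings
    $SPLUNK_HOME/bin/splunk btool check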

Data input and indexing

  • In line with the above, for each input *ALWAYS* declare sourcetype and index. Splunk may be very good at detecting timestamps of various formats, but don't leave your sourcetype naming to chance.
    • For monitored directories of mixed sourcetypes, still declare the sourcetypes explicitly, just by file pattern in props.conf
  • For any applications that write their logs with a datestamp in the file name (IIS, Tomcat's localhost_access_log, etc.), create a transforms entry to strip the datestamp from the source name. This makes life easier on users (no need to wildcard source names) and avoids creating an endless number of unique sources. Splunk likes to search source=/my/path/to/localhost_access_log far more than source=/my/path/to/localhost_access*
    • However, don't go overboard in folding away information for source. If you have to troubleshoot the data path, you'll want to be able to have some idea which actual files the data comes from. Consider copying the original source to an indexed field.
  • Syslog - If you plan to receive syslog messages via TCP or UDP, resist the urge to have Splunk listen for them directly. You'll invariably need to restart Splunk for various config changes, while a separate rsyslog or syslog-ng daemon will simply hum along receiving data while you're applying Splunk changes.
  • Scripted inputs - If you have a scripted input that grabs a lot of data (think DBX, or OPSEC LEA), be careful about where you run it. If that input runs on an indexer, all of the events it generates will reside on only that indexer, which means you won't be taking advantage of distributed Splunk search.
  • Events are parsed on the first Splunk instance that can parse them, usually the indexers or the heavy forwarders, so that is where your index-time props/transforms need to go. Exception: since Splunk 6.x, structured data (CSV, IIS, JSON) is parsed on the forwarder (see INDEXED_EXTRACTIONS).
  • If you are using a volume configuration for hot and cold, make sure that those volumes point to locations outside of $SPLUNK_DB or you will most likely end up with warnings about overlapping configurations.
  • When you are onboarding new data, make sure that you (or the TA you are using) are configured to properly extract timestamps (TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD, TIME_FORMAT), and that you are using a LINE_BREAKER and disabling line merging (SHOULD_LINEMERGE = false). These settings have significant performance implications (see https://conf.splunk.com/files/2016/slides/observations-and-recommendations-on-splunk-performance.pdf and https://conf.splunk.com/files/2016/slides/worst-practices-and-how-to-fix-them.pdf). A minimal props.conf sketch follows this list.
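To make that concrete, here is a minimal sketch of the relevant configuration on the parsing tier. The paths, index, sourcetype names, timestamp format, and regexes are all hypothetical; the point is the shape of the config, including the source-rewrite trick from the earlier bullet:

    # inputs.conf (on the forwarder) - always declare index and sourcetype
    [monitor:///var/log/myapp/access*.log]
    index      = myapp
    sourcetype = my_app_access

    # props.conf (on the indexers / heavy forwarders)
    # Assumes events start with a timestamp like 2024-05-01 12:34:56,789
    [my_app_access]
    TIME_PREFIX             = ^
    TIME_FORMAT             = %Y-%m-%d %H:%M:%S,%3N
    MAX_TIMESTAMP_LOOKAHEAD = 23
    LINE_BREAKER            = ([\r\n]+)
    SHOULD_LINEMERGE        = false
    TRANSFORMS-strip_date   = strip_source_datestamp

    # For a monitored directory of mixed files, declare sourcetypes by file pattern
    [source::/var/log/myapp/error*.log]
    sourcetype = my_app_error

    # transforms.conf - rewrite the source at index time to drop a trailing .YYYY-MM-DD
    [strip_source_datestamp]
    SOURCE_KEY = MetaData:Source
    DEST_KEY   = MetaData:Source
    REGEX      = ^(source::.+)\.\d{4}-\d{2}-\d{2}$
    FORMAT     = $1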

Architecture/ Deployment

  • Dedicated Deployment Server and license master - If your install grows beyond just a single Splunk instance (talking indexers & search heads here, not forwarders), set up a separate server to be a license manager & deployment server. A VM is fine for these purposes - in fact, being on a VM can be a benefit. You can even use the tarball install to put these multiple instances on the same box/VM: say one in /opt/splunk, one in /opt/deploymentserver, and one in /opt/licenseserver. Run them on different ports, of course.
  • Indexers WILL need more than 1024 file descriptors, which is the default limit on most Linux distros. Remember, each connected forwarder needs a descriptor, and each index can need eight or more. You can run out quickly, so go ahead and set the limit higher - 10,240 minimum, and perhaps as high as 65536. You can use the SOS (Splunk On Splunk) 'Splunk File Descriptor Usage' view to track descriptor usage over time and prevent an outage caused by running out.
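A quick way to check and raise the limit (the values below are illustrative; adjust for your environment):

    # See what the running splunkd process actually has
    cat /proc/$(pgrep -o splunkd)/limits | grep 'open files'

    # Raise it persistently for the splunk user in /etc/security/limits.conf
    splunk  soft  nofile  65536
    splunk  hard  nofile  65536

Note that if splunkd is started by systemd, limits.conf does not apply to it; set LimitNOFILE= in the unit file instead.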


  • When upgrading your deployment servers to 6.0, be aware that the serverclass.conf setting machineTypes was deprecated in 5.0, and is removed in 6.0. You should now be using machineTypesFilter.
  • If running Splunk indexers or search heads on RHEL6 or its derivatives CentOS6 / OEL6 - turn off Transparent Huge Pages (THP). This has a profound impact on system load. There's a decent Oracle Blog about it at https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge. (Thanks Baconesq!) Update 2014-01-21 -- Splunk 5.0.7 includes a check for THPs and should warn you about them (Splunk BugID SPL-75912)
    • In Oracle Linux version 6.5 Transparent HugePages have been removed.
  • Use DNS CNAMEs for things like your deployment server and your license master. That way, if you need to move it later it's not entirely painful.
  • Use apps for configuration, especially of forwarders. Even if you aren't starting out with deployment server, putting all of your configuration in /opt/splunkforwarder/etc/system/local will make it much harder when you do start to use it. Put that config in /opt/splunkforwarder/etc/apps/myinputs instead, and then you're in a better position later if you do decide to push that app.
  • Along with the above, put your deploymentclient.conf information into its own app. When installing a forwarder, drop your 'deploymentclient' app into $SPLUNK_HOME/etc/apps to bootstrap your deployment server connection. Then, push a new copy of that app out via deployment server. Now, if you need to split your deployment server clients in half you can do it using deployment server.
  • For forwarders, put outputs.conf and any corresponding certs in an app. This makes life easier when the time comes to renew the certs, or when you add extra indexers. In fact, when installing a forwarder, the only config it needs is the instruction on where to find the rest of the configs. See above, and the sketch after this list.
  • Remember that REST API that you had all the ideas for hooking up to other systems? Why not use it for troubleshooting? Create a troubleshooter user on your forwarders, and you can grab information about them. If you want to 'outsource' support of forwarders to other personnel within your organization, they can use this method to start diagnosing issues. Here's an example of what the TailingProcessor sees for files: curl https://myhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus -k -u 'troubleshooter:mypassword'
  • When installing lots of splunk instances, you can use user-seed.conf to set the default password for the admin account to whatever you need it to be. The file needs to exist prior to starting Splunk for the first time, and will be deleted by Splunk after it is used. In fact, it would be easy enough to also have user-seed.conf include your troubleshooter user from up above as well.
  • Filesystem / Disk layouts - Use Logical Volume Manager (LVM) on linux. At a minimum, make two LVs - one for /opt/splunk and one for /opt/splunk/var/lib/splunk. With this approach, you can use LVM snapshots for backups while Splunk remains running. Also, snapshots before software upgrades can make rollbacks (if necessary) easier.
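Pulling the last few bullets together, here is a sketch of a freshly installed forwarder that carries only two small apps plus a seed file. Every app name, host, port, and password below is hypothetical:

    # $SPLUNK_HOME/etc/apps/org_deploymentclient/local/deploymentclient.conf
    [deployment-client]

    # targetUri is a DNS CNAME, so the deployment server can move later
    [target-broker:deploymentServer]
    targetUri = deploy.example.com:8089

    # $SPLUNK_HOME/etc/apps/org_outputs/local/outputs.conf
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx1.example.com:9997, idx2.example.com:9997

    # $SPLUNK_HOME/etc/system/local/user-seed.conf
    # Read once at first startup to create the admin account, then deleted by Splunk
    [user_info]
    USERNAME = admin
    PASSWORD = change-me-before-rollout

From there, deployment server can push updated copies of both apps (including refreshed certs in the outputs app) without anyone touching etc/system/local on the forwarder.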

Searches / UI

  • Subsearches and lookup tables are great! Be wary of the result limits of both, however, especially when dashboarding.
  • Find a cool visual (chart/dashboard/form) from one of the apps and want to learn how to do them? Poke around the XML to see how the data is parsed/filtered/transformed. Gives you great ideas to customize for your own purposes.
  • When developing searches and dashboards, if you change around your XML or certain .conf files directly, you don't always have to restart Splunk for it to take effect. Often enough you can just hit the /debug/refresh URL and that'll reload any changed XML and some .conf files.
  • If you have a database background - ensure that you read up on how stats works. Really. Otherwise, your searches will contain more joins than they need (see the example after this list). Also, read Splunk for SQL users.
  • Splunk's DB Connect (aka DBX) App (http://apps.splunk.com/app/958/) is wonderful for interacting with databases. You'll have much better (faster) luck running scheduled DB queries and writing the results to a lookup table using '| outputlookup' that Splunk references rather than doing 'live' lookups from all your search heads to your databases. Splunk handles large lookup tables elegantly, don't worry about it. (h/t to Baconesq and Duckfez)
  • Be as specific as possible when creating searches. Specify the index, sourcetype, etc., if you know them.
  • Since it is so easy to build and extend a dashboard in Splunk, it is very tempting to put LOTS of searches/panels on a single dashboard. But note that most users won't scroll down to see the cool stuff at the bottom, so you are wasting server capacity running searches that users will likely never see. Consider breaking large dashboards into smaller ones that each focus on a different part, use, or layer of the data, and linking them together with creative use of click-throughs or simple HTML links.
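As a small illustration of the stats-over-join point (the index, sourcetypes, and field names are invented), here is a SQL-style habit and a single-pass rewrite. The join version is slower and subject to subsearch limits:

    index=web sourcetype=access_combined
    | join type=inner session_id [ search index=web sourcetype=app_errors | stats count as errors by session_id ]
    | table session_id, status, errors

Roughly the same question answered with one stats pass over both sourcetypes:

    index=web (sourcetype=access_combined OR sourcetype=app_errors)
    | stats values(status) as status, count(eval(sourcetype="app_errors")) as errors by session_id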

Other References

  • Aplura (a Splunk partner) has also made their best practices document, Aplura's Splunk Best Practices, available.
Retrieved from https://wiki.splunk.com/index.php?title=Things_I_wish_I_knew_then&oldid=58332



Splunk regex tutorial | field extraction using regex

Regular expressions are extremely useful for extracting information from text such as code, log files, spreadsheets, or even documents. A regular expression (regex) is a specialized language for defining pattern-matching rules: regexes match patterns of characters in text, and they have their own grammar and syntax. Splunk uses regex to identify interesting fields in logs, such as usernames, credit card numbers, and IP addresses. By default, Splunk automatically extracts interesting fields and displays them in the left column of the search results - the only condition is that the log must contain key/value pairs, meaning the field name appears alongside its value, like username=x or user:x. Extracted fields can later be used for sorting data, building specialized reports, and creating valuable dashboards. But if the logs do not contain the field name in a key/value pair - if, say, the username appears at some arbitrary place in the event - then Splunk will not detect it automatically. This is where regex comes to your help: you have to teach Splunk to extract the field using a regex.
So basically, regex is used to identify fields and list them in a usable form, which can later be used for reporting, sorting, and dashboards.
The next question popping into your mind is probably: do I need to learn regex to use Splunk? The answer is: it depends. For using and operating Splunk you do not need to learn regex in detail - basic knowledge will do. But if you want to become a skilled Splunk admin, then learning regex is necessary.
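For example, given these two made-up events, Splunk will extract the user field from the first one on its own because it is in key=value form, but not from the second:

    2024-05-01 10:15:32 action=login user=jsmith result=success
    2024-05-01 10:15:32 Login succeeded for jsmith from 10.1.2.3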
Why Regex?

Regex is helpful in transforming your horrible-looking machine logs into beautiful, human-understandable reports and dashboards that are easy to understand and use.

How to regex?

Splunk automatically identifies any fields that match its key/value pair intelligence; they appear to the left of the search results. This can often allow you to start putting together useful data visualizations right out of the box: fields like host and timestamp are extracted automatically, and those values can be used for reporting, statistical analysis, and dashboards. This automatic extraction works when the data is in a standard key/value format, i.e. key=value, like username=john. If your logs are not in key/value format, you have to teach Splunk to extract the fields you want, either with the built-in Interactive Field Extractor (IFX) or with a regex you write yourself.

We’re going to extract data that Splunk doesn’t recognize right away. There are a few ways to do this, including using Splunk’s Interactive Field Extractor (IFX), or you can write your own regex (which I prefer).
How to extract fields using regex?

  • Good regex sites to help with Splunk

  • https://regex101.com/ - Great for general regex stuff and capture groups.
  • http://www.regexe.com/ - Great for dealing with capture groups in the way that Splunk likes them for anonymising data.
  • http://regexr.com/ - Classic website for quick PoC regexes.

Below are a few commonly used notations and special characters you will need when extracting fields manually with regex:
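This is standard PCRE syntax (which is what Splunk uses), and the list is far from exhaustive:

    \d             any digit
    \w             any word character (letter, digit, or underscore)
    \s             any whitespace character
    .              any single character
    *              zero or more of the preceding token
    +              one or more of the preceding token
    ?              zero or one of the preceding token
    [abc]          character class: a, b, or c
    [^abc]         negated character class: anything except a, b, or c
    ^ and $        start-of-line and end-of-line anchors
    \b             word boundary
    (?P<name>...)  named capture group; in Splunk, the name becomes the field name
    (?:...)        non-capturing group
    |              alternation ("or")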
Regex usage example
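A minimal, hypothetical example: suppose the raw event is 'Login succeeded for jsmith from 10.1.2.3' and you want user and src_ip fields. An inline extraction with the rex command looks like this:

    index=myapp "Login succeeded"
    | rex field=_raw "Login succeeded for (?P<user>\w+) from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
    | stats count by user, src_ip

Once you are happy with the regex, the same pattern can be made permanent as a search-time extraction in props.conf (the sourcetype name is again made up):

    [my_app_sourcetype]
    EXTRACT-login = Login succeeded for (?P<user>\w+) from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})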