Reindex Logs That Have Already Been Indexed by Splunk
What do you do when you want to reindex logs that have already been indexed by Splunk? Maybe you sent the data to the wrong index, or you forgot to create the index before applying inputs.conf. Whatever the reason, this article explores how to reindex logs that Splunk has already indexed.
😀 Note: There are other ways to reindex data in Splunk. In this post, we are looking at the fishbucket method. Using a crcSalt is another viable option. Take a look at some of my other posts for details on that. (You can use the “Search” tab to find them more easily.)
Fix the Problem
There are many reasons why you might want to reindex logs, and most of them come down to an earlier mistake. Before we start to reindex, make sure you fix whatever caused the problem. So, go ahead and make that missing index (Do you have a Last Chance Index set up? 🤔), carefully run the delete command, or do whatever else you need to do before we reindex the data.
In this example, let’s say that I created a monitor input, [monitor:///var/log], but forgot to make the linux index. The data went into the main index instead. I want to reindex the data into the linux index.
I went ahead and deleted the data from the main index and created the linux index.
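For reference, the corrected setup might look roughly like the stanzas below. This is only a sketch: the post does not show the actual configuration, so the index = linux line and the three index paths are assumptions based on common Splunk defaults. The snippet writes them to a scratch directory (conf_dir is a hypothetical name) so you can review them before copying anything into a real inputs.conf on the forwarder or indexes.conf on the indexer.

```shell
#!/bin/sh
# Sketch only: write example stanzas to a scratch directory for review.
# The stanza contents are assumptions, not the author's actual config.
conf_dir="$(mktemp -d)"

# Forwarder side: send everything under /var/log to the linux index.
cat > "$conf_dir/inputs.conf" <<'EOF'
[monitor:///var/log]
index = linux
EOF

# Indexer side: define the linux index (paths follow the usual $SPLUNK_DB layout).
cat > "$conf_dir/indexes.conf" <<'EOF'
[linux]
homePath = $SPLUNK_DB/linux/db
coldPath = $SPLUNK_DB/linux/colddb
thawedPath = $SPLUNK_DB/linux/thaweddb
EOF

# Show what was written.
cat "$conf_dir/inputs.conf" "$conf_dir/indexes.conf"
```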
Gone Fishing
Splunk uses the fishbucket directory to keep track of what data has been indexed. This bucket lives directly on the forwarder. Instead of deleting the entire fishbucket directory, we are going to reset the records for specific files.
This process is pretty simple. In this example, I’m going to reset the fishbucket for the /var/log/syslog file.
I’ll start by SSHing into the forwarder and stopping the Splunk service.
/opt/splunk/bin/splunk stop
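Now, let’s clear the fishbucket for the /var/log/syslog file. Make sure that the paths to Splunk and to the fishbucket are correct for your environment (a universal forwarder, for example, usually installs to /opt/splunkforwarder rather than /opt/splunk).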
/opt/splunk/bin/splunk cmd btprobe -d /opt/splunk/var/lib/splunk/fishbucket/splunk_private_db --file /var/log/syslog --reset
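It can help to look at what the fishbucket holds for a file before you reset it. The sketch below assumes that btprobe prints the stored record when it is called without --reset, and that Splunk lives in /opt/splunk unless SPLUNK_HOME says otherwise. The helper name btprobe_cmd is hypothetical. It only prints the commands so you can review the paths before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print the btprobe command for a file without running it.
# SPLUNK_HOME defaulting to /opt/splunk is an assumption; adjust for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
FISHBUCKET="$SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db"

btprobe_cmd() {
    # $1 = monitored file, $2 = optional extra flag such as --reset
    file="$1"
    flag="${2:-}"
    printf '%s cmd btprobe -d %s --file %s%s\n' \
        "$SPLUNK_HOME/bin/splunk" "$FISHBUCKET" "$file" "${flag:+ $flag}"
}

# Query form (no --reset), followed by the reset form used in this post.
btprobe_cmd /var/log/syslog
btprobe_cmd /var/log/syslog --reset
```

Copy and run the printed lines while Splunk is stopped, once the paths look right for your environment.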
Now, start the Splunk service again.
/opt/splunk/bin/splunk start
Once the forwarder is back up, it will start reindexing the data. You should now see the data in the correct index.
Casting a Wider Net
What if you want to clear the fishbucket for multiple files? We can use some bash scripting to make this easier. First, let’s use the find command to get a list of files we want to reset. In this example, I want to reset the fishbucket for all files in the /var/log directory, so my find command looks like this:
find /var/log -type f
You can filter the list using grep or other options that find provides. Once you have the list of files you want to reset, we will loop through the list and reset the fishbucket for each file using the xargs command. The xargs command can be super powerful, so before we run the reset, let’s run an echo of the files and confirm that we are getting the correct list.
find /var/log -type f -print0 | xargs -0 -I % echo %
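If you only want a subset of files, the filtering mentioned above can be sketched with find’s own -name option. The helper name list_targets and the '*.log' pattern are hypothetical, chosen just for illustration. The demo runs against a scratch directory so it is safe to try anywhere.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that match a name pattern.
list_targets() {
    dir="$1"
    pattern="${2:-*}"   # default: match every file name
    find "$dir" -type f -name "$pattern" | sort
}

# Demo on a scratch directory standing in for /var/log.
demo_dir="$(mktemp -d)"
touch "$demo_dir/syslog" "$demo_dir/auth.log" "$demo_dir/kern.log"

# Preview only the files ending in .log.
list_targets "$demo_dir" '*.log'
```

Against real data you would call something like list_targets /var/log '*.log' and check the output before resetting anything.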
If you are happy with the list, you can now reset the fishbucket for each file.
📝 Note: Stop Splunk before you run the fishbucket reset.
find /var/log -type f -print0 | xargs -0 -I % /opt/splunk/bin/splunk cmd btprobe -d /opt/splunk/var/lib/splunk/fishbucket/splunk_private_db --file % --reset
You should see a stream of messages indicating that the fishbucket has been reset for each applicable file. Once you start Splunk again, the data will be reindexed.
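The whole stop, reset, start sequence can also be wrapped in one small script. What follows is a minimal sketch, not a tested production tool: the function names run and reset_dir and the DRY_RUN variable are hypothetical, and SPLUNK_HOME is assumed to be /opt/splunk unless set. By default it only prints the commands it would run. It reads one path per line, so file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: reset fishbucket records for every file under a directory.
# Assumptions: SPLUNK_HOME location; DRY_RUN=1 (the default) prints instead of executing.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

reset_dir() {
    target_dir="$1"
    fishbucket="$SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db"

    # Splunk must be stopped before the fishbucket is touched.
    run "$SPLUNK_HOME/bin/splunk" stop || return 1

    # IFS= and -r keep spaces and backslashes in file names intact.
    find "$target_dir" -type f | while IFS= read -r f; do
        # </dev/null stops the command from swallowing the file list on stdin.
        run "$SPLUNK_HOME/bin/splunk" cmd btprobe -d "$fishbucket" \
            --file "$f" --reset </dev/null
    done

    run "$SPLUNK_HOME/bin/splunk" start
}

# Demo on a scratch directory; nothing is executed while DRY_RUN=1.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.log" "$demo_dir/b c.log"
reset_dir "$demo_dir"
```

Once the printed commands look right, point reset_dir at /var/log and set DRY_RUN=0 to run them for real.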
Conclusion
While reindexing data can be challenging, it’s certainly achievable. The fishbucket method is a great way to reindex data that has already been indexed by Splunk. Just remember to stop Splunk before you reset the fishbucket and start it again once you are done. If you have a lot of files to reset, you can use the find and xargs commands to make the process easier.
Happy Fishing! 🎣