Reindex Logs That Have Already Been Indexed by Splunk

What do you do when you want to reindex logs that have already been indexed by Splunk? Maybe you sent the data to the wrong index, or you forgot to create the index before applying the inputs.conf. Whatever the reason, this article explores how to reindex logs that have already been indexed by Splunk.

😀 Note: There are other ways to reindex data in Splunk. This post covers the fishbucket method. Setting a crcSalt is another viable option; see my other posts for details (the “Search” tab makes them easier to find).

Fix the Problem

There are many reasons you might want to reindex logs, and most of them trace back to a mistake. Before we start to reindex, fix whatever caused the problem. So, go ahead and create that missing index (Do you have a Last Chance Index set up? 🤔), carefully run the delete command, or do whatever else you need to do before we reindex the data.

In this example, let’s say that I created a [monitor:///var/log] input, but forgot to create the linux index. The data went into the main index instead. I want to reindex the data into the linux index.

I went ahead and deleted the data from the main index and created the linux index.
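Before resetting anything, it can help to confirm which index the monitor stanza will actually use, so the reindexed data does not land in the wrong place a second time. Here is a minimal sketch that asks Splunk's btool for the merged inputs configuration. The helper name effective_index is my own invention for illustration, and the /opt/splunk default is just the path used in this post.

```shell
#!/bin/sh
# Sketch: show which index a monitor stanza will send data to.
# SPLUNK_HOME defaults to /opt/splunk (the path used in this post);
# adjust it for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# effective_index STANZA   (hypothetical helper name)
# Prints the "index = ..." line that btool reports for the given
# inputs.conf stanza after all config layers are merged.
effective_index() {
    "$SPLUNK_HOME/bin/splunk" btool inputs list "$1" |
        grep -E '^index[[:space:]]*='
}

# Example, run on the forwarder:
#   effective_index 'monitor:///var/log'
```

If the line that comes back does not name the index you expect, fix the inputs.conf before moving on.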

Gone Fishing

Splunk uses the fishbucket directory to keep track of which files it has already read and how far into each one it got. The fishbucket lives on the instance that monitors the files, which in this case is the forwarder. Instead of deleting the entire fishbucket directory, we are going to reset the entries for specific files.

This process is pretty simple. In this example, I’m going to reset the fishbucket for the /var/log/syslog file.

I’ll start by SSHing into the forwarder and stopping the Splunk service.

/opt/splunk/bin/splunk stop

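Now, let’s clear the fishbucket entry for the /var/log/syslog file. Make sure that the paths to Splunk and to the fishbucket are correct for your environment. On a universal forwarder, for example, the default install directory on Linux is /opt/splunkforwarder rather than /opt/splunk.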

/opt/splunk/bin/splunk cmd btprobe -d /opt/splunk/var/lib/splunk/fishbucket/splunk_private_db --file /var/log/syslog --reset

Now, start the Splunk service again.

/opt/splunk/bin/splunk start

Once the forwarder is back up, it will start reindexing the data. You should now see the data in the correct index.
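The stop, reset, and start steps above can be wrapped in one small function so that none of them gets skipped. This is only a sketch under a few assumptions: the function name reset_fishbucket_file is hypothetical, SPLUNK_HOME defaults to the /opt/splunk path used in this post, and the fishbucket sits in its usual place under that directory.

```shell
#!/bin/sh
# Sketch: stop Splunk, reset one file's fishbucket entry, start Splunk.
# SPLUNK_HOME defaults to /opt/splunk; adjust it for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# reset_fishbucket_file FILE   (hypothetical helper name)
reset_fishbucket_file() {
    file="$1"
    splunk="$SPLUNK_HOME/bin/splunk"
    db="$SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db"

    # Refuse to touch Splunk at all if the path is wrong.
    if [ ! -f "$file" ]; then
        echo "skipping: $file is not a regular file" >&2
        return 1
    fi

    status=0
    # Only run the reset if Splunk stopped cleanly.
    "$splunk" stop &&
        "$splunk" cmd btprobe -d "$db" --file "$file" --reset ||
        status=$?

    # Bring Splunk back up whether or not the reset worked.
    "$splunk" start
    return "$status"
}

# Example, run on the forwarder:
#   reset_fishbucket_file /var/log/syslog
```

The function starts Splunk again even when the reset fails, so a typo in the path does not leave the forwarder down.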

Casting a Wider Net

What if you want to clear the fishbucket for multiple files? We can use some bash scripting to make this easier. First, let’s use the find command to get a list of files we want to reset. In this example, I want to reset the fishbucket for all files in the /var/log directory, so my find command looks like this:

find /var/log -type f

You can filter the list using grep or other options that find provides. Once you have the list of files you want to reset, we will loop through the list and reset the fishbucket for each file using the xargs command. The xargs command can be super powerful, so before we run the reset, let’s echo the files and confirm that we are getting the correct list. Adding -print0 to find and -0 to xargs passes each file name through intact, even if it contains spaces or quotes.

find /var/log -type f -print0 | xargs -0 -I % echo %

If you are happy with the list, you can now reset the fishbucket for each file.

📝 Note: Stop Splunk before you run the fishbucket reset.

find /var/log -type f -print0 | xargs -0 -I % /opt/splunk/bin/splunk cmd btprobe -d /opt/splunk/var/lib/splunk/fishbucket/splunk_private_db --file % --reset

You should see a stream of messages indicating that the fishbucket has been reset for each applicable file. Once you start Splunk again, the data will be reindexed.
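If you would rather not pipe into xargs at all, find can run the reset itself with -exec. The sketch below pairs that with a dry-run switch, so the preview and the real run use the same file selection. The function name reset_fishbucket_dir and the --dry-run argument are my own inventions for illustration, and SPLUNK_HOME again defaults to the /opt/splunk path used in this post.

```shell
#!/bin/sh
# Sketch: reset the fishbucket entry for every regular file under a
# directory. Stop Splunk before running this without --dry-run.
# SPLUNK_HOME defaults to /opt/splunk; adjust it for your install.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# reset_fishbucket_dir DIR [--dry-run]   (hypothetical helper name)
reset_fishbucket_dir() {
    dir="$1"
    db="$SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db"

    if [ "${2:-}" = "--dry-run" ]; then
        # Only list what would be reset; Splunk is not called.
        find "$dir" -type f -exec echo "would reset:" {} \;
    else
        # find passes each path as a single argument, so file names
        # with spaces are handled correctly.
        find "$dir" -type f -exec "$SPLUNK_HOME/bin/splunk" cmd btprobe \
            -d "$db" --file {} --reset \;
    fi
}

# Example, run on the forwarder:
#   reset_fishbucket_dir /var/log --dry-run
#   reset_fishbucket_dir /var/log
```

Run it with --dry-run first, check the list, then stop Splunk and run it again without the switch.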

Conclusion

While reindexing data can be challenging, it’s certainly achievable. The fishbucket method is a great way to reindex data that has already been indexed by Splunk. Just remember to stop Splunk before you reset the fishbucket and start it again once you are done. If you have a lot of files to reset, you can use the find and xargs commands to make the process easier.

Happy Fishing! 🎣