Detecting Typosquatting with Splunk and the URL Toolbox App

Typosquatting is a common but often overlooked attack vector. An attacker registers a domain name that closely resembles a legitimate one, differing by only a character or two, and uses it to lure users to a malicious website or to steal sensitive information. In this post, we will explore how to detect typosquatting using Splunk and the URL Toolbox app.

Prerequisites

Before we get started, you will need the URL Toolbox app installed on your Splunk instance, which requires an account with permission to install apps. You can install the app from Splunkbase or directly from the Splunk web interface.

You can find the URL Toolbox app on Splunkbase here: https://splunkbase.splunk.com/app/2734/

Detecting Typosquatting

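In order to detect typosquatting, we need a way to compare domain names and identify those that are similar. One way to do this is by calculating the Levenshtein distance between domain names. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into the other, so a smaller distance means the two strings are more alike.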

For example, the Levenshtein distance between “example.com” and “exampIe.com” (a capital “I” in place of the lowercase “l”) is 1, because only one character needs to be changed to transform one into the other. Similarly, the Levenshtein distance between “example.com” and “examp1e.com” (the digit “1” in place of the “l”) is also 1.
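To make the distance calculation concrete, here is a minimal Python sketch of the standard dynamic-programming approach. It is only an illustration of the metric, not the code the URL Toolbox app runs, and the function name levenshtein is our own choice for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b. Start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string costs i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


# Compare a few lookalike domains against the legitimate one.
for candidate in ["exampIe.com", "examp1e.com", "exaample.com"]:
    print(candidate, levenshtein("example.com", candidate))
```

Each of the three candidates comes out at a distance of 1: the first two through a single substitution, the third through a single inserted character. The comparison here is case-sensitive, which is why the capital “I” counts as an edit.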

Using the URL Toolbox App

The URL Toolbox app for Splunk provides a macro called ut_levenshtein. This macro takes two inputs and calculates the Levenshtein distance between them. We can use this macro to compare domain names and identify typosquatting attempts.

Here is an example search that uses the ut_levenshtein macro to compare domain names:

index=your_index
| eval our_domain="example.com"
| `ut_levenshtein(our_domain, domain_name)`
| where ut_levenshtein<=2
| table _time, our_domain, domain_name, ut_levenshtein

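In this search, we first store our legitimate domain name in a field called our_domain. We then use the ut_levenshtein macro to calculate the Levenshtein distance between our domain and the domain_name field of each event; the result is written to a field that is also named ut_levenshtein. Finally, we keep only the events with a Levenshtein distance of 2 or less, as these are candidates for typosquatting. Replace your_index and domain_name with the index and field that hold domain names in your own data. Keep in mind that the legitimate domain itself has a distance of 0, so it will also pass this filter unless you exclude it.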

Here is another example you can use if you don’t have data to test with:

| makeresults 
| eval domain_name="exaample.com;examplle.com;exampple.com;examplee.com;examp1e.com", our_domain="example.com"
| makemv delim=";" domain_name
| mvexpand domain_name
| `ut_levenshtein(our_domain, domain_name)`
| where ut_levenshtein<=2
| table _time, our_domain, domain_name, ut_levenshtein

In this example, we use the makeresults command to generate some test data. We define a list of domain names with intentional typos, and use the ut_levenshtein macro to calculate the Levenshtein distance between each domain name and our legitimate domain name. We then filter the results to only show domain names with a Levenshtein distance of 2 or less. Here is the output of the search:

_time                our_domain   domain_name   ut_levenshtein
2024-02-21 13:05:30  example.com  exaample.com  1
2024-02-21 13:05:30  example.com  examplle.com  1
2024-02-21 13:05:30  example.com  exampple.com  1
2024-02-21 13:05:30  example.com  examplee.com  1
2024-02-21 13:05:30  example.com  examp1e.com   1

Conclusion

By using the URL Toolbox app for Splunk and the ut_levenshtein macro, we can flag domain names that sit within a few edits of our own. Monitoring the domain names in our data and calculating their Levenshtein distance from legitimate domain names gives us an early signal of possible typosquatting, so we can investigate and take appropriate action to protect our systems and data.