Detecting and Blocking Bad Bots in cPanel Without Third-Party Tools

Published on June 6, 2024. Last reviewed and updated on June 6, 2024.

Introduction

Bad bots can wreak havoc on your website in many ways, from scraping your content to launching brute-force attacks. Protecting your site from malicious bot traffic is essential for maintaining its performance, security, and integrity. While there are many third-party tools and services available for bot protection, this guide demonstrates how to detect and block bad bots on cPanel-managed servers using built-in tools like Apache’s .htaccess, ModSecurity, and server-level firewall configurations. This method gives you complete control over the process, without relying on external plugins.

1. Identifying Bad Bots in Apache Logs

The first step in blocking bad bots is identifying them in your server logs. cPanel writes a separate Apache access log for each domain, typically /etc/apache2/logs/domlogs/yourdomain.com, with HTTPS traffic in a companion file such as yourdomain.com-ssl_log in the same directory. You can analyze these logs for patterns that indicate bot activity.

What to Look For

Suspicious User Agents: Bots often send unusual user-agent strings, or none at all.
High-Frequency Requests: Bots typically request pages far faster than a human visitor could, often from a single IP.
Known Bad Bot Signatures: Many bad bots identify themselves with recognizable user-agent strings, such as those of scraping tools, which can be matched and blocked.
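The high-frequency pattern can be checked with a small shell function. This is a minimal sketch, assuming the Apache combined log format that cPanel domlogs normally use (client IP in the first field, a timestamp such as [06/Jun/2024:10:15:32 in the fourth); the function name find_burst_ips and the default threshold are illustrative, not part of cPanel.

```bash
# find_burst_ips LOG [THRESHOLD]
# Hypothetical helper: prints "IP MINUTE COUNT" for every client that made
# more than THRESHOLD requests within a single minute in LOG.
# Assumes combined log format: $1 = client IP, $4 = "[dd/Mon/yyyy:HH:MM:SS".
find_burst_ips() {
    local log="$1" threshold="${2:-60}"
    awk -v limit="$threshold" '
        {
            # Keep "dd/Mon/yyyy:HH:MM": skip the leading "[" and drop the seconds.
            minute = substr($4, 2, 17)
            hits[$1 " " minute]++
        }
        END {
            for (key in hits)
                if (hits[key] > limit)
                    print key, hits[key]
        }
    ' "$log" | sort -k3,3nr   # busiest client/minute pairs first
}
```

For example, find_burst_ips /etc/apache2/logs/domlogs/yourdomain.com 120 lists clients that made more than 120 requests in any one minute. Counting per minute rather than per day separates short bursts from visitors who simply browse a lot.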

Example Commands to Parse Logs

You can use standard command-line tools like grep and awk to find suspicious clients. Here’s an example that searches a domain’s access log for the string “bot” and counts the matching requests per IP:

grep -i "bot" /etc/apache2/logs/domlogs/yourdomain.com | awk '{print $1}' | sort | uniq -c | sort -nr | head

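This command lists the ten IPs with the most matching lines (the match is case-insensitive because of -i). Because grep searches the entire log line, most matches come from the user-agent field, and the results will include legitimate crawlers such as Googlebot and Bingbot, so review them before blocking anything. For larger logs, a dedicated analyzer such as GoAccess can produce these breakdowns more efficiently.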
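Because the command above groups by IP, it also helps to look at the user-agent strings themselves. The sketch below, a hypothetical helper named top_user_agents, assumes the combined log format, in which splitting a line on double quotes leaves the user agent in the sixth field; Apache logs a missing user agent as “-”.

```bash
# top_user_agents LOG [N]
# Hypothetical helper: prints the N most frequent User-Agent strings in LOG
# with their request counts (default N = 10).
# Assumes combined log format, so field 6 (split on '"') is the User-Agent.
top_user_agents() {
    local log="$1" n="${2:-10}"
    awk -F'"' '{ print $6 }' "$log" | sort | uniq -c | sort -nr | head -n "$n"
}
```

Run it as top_user_agents /etc/apache2/logs/domlogs/yourdomain.com 20 and look for scraping libraries, unfamiliar names, and a high count for “-”.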

2. Blocking Bad Bots with `.htaccess`

Once you’ve identified bad bots from your logs, you can block them at the Apache level using .htaccess files. .htaccess allows you to define rules that control how Apache handles incoming requests based on certain conditions, such as user agent or IP.

Basic Deny Rules

To block bad bots by user agent in .htaccess, use mod_rewrite’s RewriteCond and RewriteRule directives. For example:

RewriteEngine On
# Match the User-Agent header case-insensitively ([NC]) and refuse matches with 403 ([F])
RewriteCond %{HTTP_USER_AGENT} (crawler|scrapy|badbot) [NC]
RewriteRule .* - [F,L]

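This rule matches any user agent that contains “crawler”, “scrapy”, or “badbot” in any letter case and returns a 403 Forbidden status for those requests. Broad words like “crawler” can also appear in legitimate user agents, so check your logs for traffic you want to keep before deploying a pattern.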
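Before deploying a pattern, you can check which logged user agents it would catch. This sketch, using a hypothetical function preview_ua_block, assumes the combined log format and a pattern simple enough to mean the same in grep -E (POSIX extended regex) as in Apache’s PCRE, which holds for plain alternations like the one above.

```bash
# preview_ua_block LOG PATTERN
# Hypothetical helper: lists the User-Agent strings in LOG that PATTERN
# matches case-insensitively, with how many requests each one made.
# Assumes combined log format (User-Agent = 6th field when split on '"').
preview_ua_block() {
    local log="$1" pattern="$2"
    awk -F'"' '{ print $6 }' "$log" \
        | grep -E -i -- "$pattern" \
        | sort | uniq -c | sort -nr
}
```

For example, preview_ua_block /etc/apache2/logs/domlogs/yourdomain.com '(crawler|scrapy|badbot)' shows exactly which clients the .htaccess rule would refuse.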

Considerations

Performance Impact: When .htaccess files are enabled, Apache looks for and parses them in every directory along the requested path on every request, so long rule lists add overhead to each page load.
Not Suitable for High Traffic: While useful for smaller sites, large sites may require a more efficient bot-blocking method, such as ModSecurity or firewall rules.

3. Using ModSecurity for Bot Blocking

ModSecurity is a web application firewall (WAF) that runs as an Apache module and inspects requests before they reach your site. cPanel provides ModSecurity through EasyApache, and because its rules can match on any request header, it is well suited to filtering out unwanted bots.

Writing Custom ModSecurity Rules

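You can create custom ModSecurity rules to block specific patterns such as suspicious user agents, empty user agents, or high-frequency requests. Here’s an example that blocks requests whose User-Agent header is empty, plus a companion rule for requests that omit the header entirely:

SecRule REQUEST_HEADERS:User-Agent "@rx ^$" "id:10001,phase:1,deny,status:403,log,msg:'Empty UA blocked'"
SecRule &REQUEST_HEADERS:User-Agent "@eq 0" "id:10002,phase:1,deny,status:403,log,msg:'Missing UA blocked'"

The second rule is needed because a rule that targets REQUEST_HEADERS:User-Agent has nothing to inspect when the header is absent, so it never matches; the & prefix counts the headers instead. Every rule needs a unique id, and the range 1 to 99,999 is reserved for local rules like these.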

Where to Configure

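In cPanel, Security >> ModSecurity lets account owners turn ModSecurity on or off for each domain; adding or editing rules requires root access in WHM under Security Center >> ModSecurity Tools.
For custom rules on EasyApache 4 servers, add them to /etc/apache2/conf.d/modsec/modsec2.user.conf rather than to cPanel’s main modsec2.conf. Standard ModSecurity builds do not accept SecRule directives in .htaccess files, so rules belong in the server configuration.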
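When you add rules from a script, it helps to avoid appending the same rule twice. The function below is a hypothetical sketch: it extracts the rule’s id: value and appends the rule to a file only if that id is not already present. It assumes one rule per line, as in the examples above.

```bash
# add_modsec_rule FILE RULE
# Hypothetical helper: appends RULE (a one-line SecRule) to FILE unless a
# rule with the same id:NNNN is already there, so re-runs are harmless.
add_modsec_rule() {
    local file="$1" rule="$2" id
    # Extract the first "id:<digits>" from the rule's action list.
    id="$(printf '%s\n' "$rule" | grep -oE 'id:[0-9]+' | head -n 1)"
    if [ -z "$id" ]; then
        echo "rule has no id: action, not adding it" >&2
        return 1
    fi
    # Require a non-digit (or end of line) after the id so id:1000
    # does not count as a match for id:10001.
    if [ -f "$file" ] && grep -qE -- "${id}([^0-9]|$)" "$file"; then
        echo "$id already present in $file" >&2
        return 0
    fi
    printf '%s\n' "$rule" >> "$file"
}
```

On an EasyApache 4 server you might call it as root with /etc/apache2/conf.d/modsec/modsec2.user.conf as the file, check the configuration with apachectl configtest, and then restart Apache with /scripts/restartsrv_httpd.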

4. Blocking at the Firewall Level (CSF / iptables)

You can also block bad bots at the server firewall level, which stops their traffic before it reaches Apache. Many cPanel servers run ConfigServer Security & Firewall (CSF), a separately installed firewall that manages iptables rules and can be controlled from WHM or the command line. Alternatively, you can use iptables directly for more fine-grained control.

Blocking via CSF

To block a known bot IP via CSF, you can use the following command:

csf -d 192.0.2.123 # Example bad bot IP

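This command adds the IP to /etc/csf/csf.deny and blocks it immediately; csf -d also accepts an optional comment after the IP, which it records in that file. For bulk blocks you can edit /etc/csf/csf.deny directly and then reload the rules with csf -r. CSF caps the number of permanent entries in csf.deny (the DENY_IP_LIMIT setting in /etc/csf/csf.conf), so very long lists may push out older entries.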
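For bulk blocks, a loop around csf -d keeps each entry commented and filters out malformed lines. This sketch assumes a plain text file with one IPv4 address per line; the function name csf_deny_from_file and the default comment are hypothetical, and the address check is a simple format test rather than full validation.

```bash
# csf_deny_from_file FILE [COMMENT]
# Hypothetical helper: passes each IPv4 address in FILE to "csf -d".
# Text after "#" is ignored, blank lines are skipped, and anything that is
# not a dotted quad is reported on stderr. Octets above 255 are not rejected.
csf_deny_from_file() {
    local file="$1" comment="${2:-Bad bot (bulk block)}" ip
    while IFS= read -r ip || [ -n "$ip" ]; do
        ip="${ip%%#*}"                                  # strip comments
        ip="$(printf '%s' "$ip" | tr -d '[:space:]')"   # strip whitespace
        [ -z "$ip" ] && continue
        if printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
            csf -d "$ip" "$comment"
        else
            echo "skipping invalid entry: $ip" >&2
        fi
    done < "$file"
}
```

For example, csf_deny_from_file /root/bad-bot-ips.txt "Scraper seen in domlogs" blocks every address in that (hypothetical) file with a searchable comment.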

Blocking via iptables

For advanced users, you can block IPs manually using iptables:

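iptables -I INPUT -s 192.0.2.123 -j DROP

This rule drops all incoming traffic from the specified IP. Using -I inserts it at the top of the INPUT chain; -A would append it after any existing ACCEPT rules (for example, one allowing port 80), which would match first and let the traffic through. Rules added this way are not saved across reboots, and if CSF is installed, restarting it rebuilds the rule set and discards them.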
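If you add iptables rules from scripts, check for an existing rule first so repeated runs don’t stack duplicates. This sketch uses iptables -C, which exits with a non-zero status when the rule does not exist, and inserts the rule at the top of the INPUT chain with -I; the function name block_ip_iptables is hypothetical.

```bash
# block_ip_iptables IP
# Hypothetical helper: inserts "DROP all traffic from IP" at the top of the
# INPUT chain, but only when an identical rule is not already present.
block_ip_iptables() {
    local ip="$1"
    # -C (check) succeeds only if the exact rule already exists.
    if iptables -C INPUT -s "$ip" -j DROP 2>/dev/null; then
        echo "$ip is already blocked"
    else
        iptables -I INPUT -s "$ip" -j DROP
    fi
}
```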

5. Creating a Maintenance Workflow

Once you’ve implemented the necessary bot-blocking rules, it’s important to set up a regular maintenance workflow to keep your server secure. Automated scripts can help monitor logs and block new bad bot IPs as they appear.

Automating Log Analysis

Use cron jobs to regularly parse your logs and block bad bot IPs:

0 0 * * * /path/to/parse_logs.sh

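This crontab entry runs a script once a day at midnight; add it to root’s crontab (crontab -e as root) so the script can update the firewall. parse_logs.sh is a placeholder for a script you write yourself: it can search the logs for suspicious patterns and add new bad bot IPs to your firewall deny list.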
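Here is a minimal sketch of what such a script might contain. It assumes the combined log format, CSF for blocking, and a hypothetical allowlist file at /root/bot-allowlist.txt for addresses that must never be blocked (your office, uptime monitors); the threshold of 1,000 requests is an arbitrary starting point. It only prints candidates unless BLOCK=1 is set, so you can review its output before letting it change the firewall.

```bash
#!/usr/bin/env bash
# parse_logs.sh (hypothetical sketch): flag clients whose request count across
# the given Apache domlogs exceeds THRESHOLD, then print or block them.
# Assumes combined log format (client IP in field 1) and CSF when BLOCK=1.
set -euo pipefail

# Cron's default PATH may not include the sbin directories where csf lives.
PATH="/usr/local/sbin:/usr/sbin:/sbin:$PATH"

THRESHOLD="${THRESHOLD:-1000}"                    # requests across all given logs
ALLOWLIST="${ALLOWLIST:-/root/bot-allowlist.txt}" # one IP per line, never blocked
BLOCK="${BLOCK:-0}"                               # 0 = dry run, 1 = call csf -d

# Print "COUNT IP" for every IP above the threshold, busiest first.
noisy_ips() {
    awk -v limit="$THRESHOLD" '
        { hits[$1]++ }
        END { for (ip in hits) if (hits[ip] > limit) print hits[ip], ip }
    ' "$@" | sort -nr
}

# Succeed if the IP appears as a complete line in the allowlist file.
is_allowlisted() {
    [ -f "$ALLOWLIST" ] && grep -Fxq -- "$1" "$ALLOWLIST"
}

main() {
    local count ip
    noisy_ips "$@" | while read -r count ip; do
        if is_allowlisted "$ip"; then
            continue
        fi
        if [ "$BLOCK" = "1" ]; then
            csf -d "$ip" "parse_logs.sh: $count requests"
        else
            echo "$ip $count"
        fi
    done
}

if [ "$#" -gt 0 ]; then
    main "$@"
else
    echo "usage: $0 LOGFILE [LOGFILE...]" >&2
fi
```

A crontab entry could then pass the logs explicitly, for example 0 0 * * * BLOCK=1 /root/parse_logs.sh /etc/apache2/logs/domlogs/yourdomain.com, once a few dry runs have shown that the threshold only catches clients you actually want to block.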

Updating and Maintaining Bot Lists

Periodically review and update your list of bad bots. Bots frequently change their user-agent strings, so check your logs for new signatures, and confirm that existing patterns are not catching legitimate visitors or search engine crawlers.
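Keeping bot names in a plain list and generating the .htaccess block from it makes updates less error-prone. This sketch assumes a file with one name per line (no spaces inside a name, with # starting a comment); ua_list_to_rewrite is a hypothetical helper that escapes regex metacharacters so each name is matched literally, and prints a block in the same shape as the rule in section 2.

```bash
# ua_list_to_rewrite FILE
# Hypothetical helper: builds a mod_rewrite user-agent block from a list of
# bot names. Comments and surrounding whitespace are removed, empty lines are
# dropped, and regex metacharacters are backslash-escaped.
ua_list_to_rewrite() {
    local file="$1" names
    names="$(sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e 's/^[[:space:]]*//' "$file" \
        | grep -v '^$' \
        | sed -e 's/[][\.*^$+?(){}|]/\\&/g' \
        | paste -sd '|' -)"
    if [ -z "$names" ]; then
        echo "no bot names found in $file" >&2
        return 1
    fi
    printf 'RewriteEngine On\n'
    printf 'RewriteCond %%{HTTP_USER_AGENT} (%s) [NC]\n' "$names"
    printf 'RewriteRule .* - [F,L]\n'
}
```

You would review the output and then paste it into .htaccess, or redirect it into a file that your configuration includes.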

6. Testing and Verification

After implementing your bot-blocking rules, it’s important to test whether they’re working as expected.

Simulating Bot Traffic

You can simulate bot traffic using tools like curl or by changing the user agent in your browser’s developer tools to a known bad bot user agent. For example:

curl -s -o /dev/null -w "%{http_code}\n" -A "BadBot" https://yourdomain.com/

This prints only the HTTP status code of the response. If your rules are working correctly, it prints 403.
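To repeat these checks after every rule change, you can wrap curl in a small function. This sketch, with the hypothetical name expect_status, relies on curl’s -w '%{http_code}' output. According to curl’s documentation, -A "" removes the User-Agent header entirely while -A " " sends a blank one, so you can exercise both the missing-header and empty-header cases.

```bash
# expect_status URL USER_AGENT EXPECTED_CODE
# Hypothetical helper: sends one request with the given User-Agent and
# compares the HTTP status code to EXPECTED_CODE (non-zero exit on mismatch).
expect_status() {
    local url="$1" ua="$2" expected="$3" code
    # -s hides progress, -o /dev/null discards the body,
    # -w '%{http_code}' prints only the status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url")"
    if [ "$code" = "$expected" ]; then
        echo "PASS: UA '$ua' -> $code"
    else
        echo "FAIL: UA '$ua' -> $code (expected $expected)"
        return 1
    fi
}

# Example checks (replace yourdomain.com with your own site):
# expect_status https://yourdomain.com/ "BadBot" 403
# expect_status https://yourdomain.com/ "" 403          # no User-Agent header
# expect_status https://yourdomain.com/ " " 403         # blank User-Agent header
# expect_status https://yourdomain.com/ "Mozilla/5.0" 200
```

The last check guards against over-blocking; if your site redirects the home page, expect 301 or 302 there instead of 200.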

Verifying via Logs

Check your Apache access logs for 403 responses, which indicate that a request was blocked. For ModSecurity, the audit log (typically /etc/apache2/logs/modsec_audit.log on cPanel servers) records which rules were triggered.
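To see which user agents are being refused, you can summarize 403 responses from a domlog. This sketch, with the hypothetical name blocked_by_ua, assumes the combined log format, where the status code is the first number after the quoted request line.

```bash
# blocked_by_ua LOG
# Hypothetical helper: counts 403 responses per User-Agent in LOG.
# Split on '"': field 3 is " STATUS BYTES ", field 6 is the User-Agent.
blocked_by_ua() {
    awk -F'"' '{ split($3, f, " "); if (f[1] == "403") print $6 }' "$1" \
        | sort | uniq -c | sort -nr
}
```

If a name you expected to block is missing from the output, or a browser user agent shows up, revisit the corresponding rule.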

7. Conclusion

In this guide, we’ve covered how to detect and block bad bots in cPanel using Apache’s .htaccess, ModSecurity, and server-level firewalls. By leveraging these built-in tools, you can protect your website from bot attacks without relying on third-party tools. This approach provides more control and can be tailored to your site’s specific needs.

However, for large sites or those with significant bot traffic, using third-party tools like ServerGuardian may still be necessary for comprehensive protection. Keep your bot-blocking rules up to date, monitor your server logs regularly, and you’ll be well-equipped to safeguard your site against malicious bots.