Last Reviewed and Updated on June 6, 2024
Introduction
Bad bots can wreak havoc on your website in many ways, from scraping your content to launching brute-force attacks. Protecting your site from malicious bot traffic is essential for maintaining its performance, security, and integrity. While there are many third-party tools and services available for bot protection, this guide demonstrates how to detect and block bad bots on cPanel-managed servers using built-in tools like Apache’s .htaccess, ModSecurity, and server-level firewall configurations. This method gives you complete control over the process, without relying on external plugins.
1. Identifying Bad Bots in Apache Logs
The first step in blocking bad bots is identifying them in your server logs. cPanel stores Apache logs in a specific directory for each domain, typically under /etc/apache2/logs/domlogs/. You can analyze these logs for suspicious patterns that might indicate bot activity.
What to Look For
- Suspicious User Agents: Bots often use specific user agents or none at all.
- High-Frequency Requests: Bots often send repeated requests to a website faster than a human user could.
- Known Bad Bot Signatures: Many bots use common signatures that can be blocked.
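As a rough sketch of how the high-frequency signal could be checked, the function below counts requests per IP per minute and prints the pairs that exceed a limit. The function name busy_ips and the default limit of 60 are assumptions made for illustration, and the field positions assume Apache's standard combined or common log format.

```shell
#!/bin/sh
# Print "<count> <ip> <minute>" for every IP that exceeded a per-minute
# request limit. Assumes field 1 is the client IP and field 4 looks like
# "[dd/Mon/yyyy:HH:MM:SS" (Apache combined/common log format).
busy_ips() {
    log_file=$1
    limit=${2:-60}   # hypothetical default; tune it for your traffic
    awk -v limit="$limit" '
        {
            minute = substr($4, 2, 17)   # keep dd/Mon/yyyy:HH:MM
            count[$1 " " minute]++
        }
        END {
            for (key in count)
                if (count[key] > limit)
                    print count[key], key
        }
    ' "$log_file" | sort -nr
}
```

For example, busy_ips /etc/apache2/logs/domlogs/yourdomain.com 120 would list every IP that sent more than 120 requests within a single minute, busiest first.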
Example Commands to Parse Logs
You can use simple command-line tools like grep and awk to extract suspicious user agents. Here’s an example to search for “bot” in your Apache logs:
grep -i "bot" /etc/apache2/logs/domlogs/yourdomain.com | awk '{print $1}' | sort | uniq -c | sort -nr | head
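This command lists the IP addresses that appear most often on log lines containing “bot” anywhere in the line, whether in the user agent or in the requested URL. Expect legitimate crawlers such as Googlebot and requests for robots.txt in the results, so review them before blocking anything. You can also use tools like GoAccess to parse logs more efficiently.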
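Searching for “bot” misses clients that do not identify themselves that way. A complementary sketch is to tally the user-agent strings themselves; the function name top_user_agents is hypothetical, and the sixth quote-delimited field is the user agent only in Apache's combined log format.

```shell
#!/bin/sh
# Print the most frequent user-agent strings in a combined-format log.
# Splitting each line on double quotes makes field 6 the user agent;
# Apache writes "-" there when the header is absent.
top_user_agents() {
    log_file=$1
    how_many=${2:-10}
    awk -F'"' '{ print $6 }' "$log_file" |
        sort | uniq -c | sort -nr | head -n "$how_many"
}
```

Running top_user_agents /etc/apache2/logs/domlogs/yourdomain.com 20 prints the 20 most common user agents with their request counts.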
2. Blocking Bad Bots with .htaccess
Once you’ve identified bad bots from your logs, you can block them at the Apache level using .htaccess files. .htaccess allows you to define rules that control how Apache handles incoming requests based on certain conditions, such as user agent or IP.
Basic Deny Rules
To block bad bots using .htaccess, add conditions with the RewriteCond and RewriteRule directives. For example:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (crawler|scrapy|badbot) [NC]
RewriteRule ^ - [F]
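This rule matches user agents that contain “crawler”, “scrapy”, or “badbot” anywhere in the string, ignoring case (the [NC] flag), and returns a 403 Forbidden status for those requests. Broad terms such as “crawler” can also match legitimate search engine crawlers, so check your logs before adding them.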
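If the list of bot names grows, it may be easier to keep it in a text file and generate the rule from it. The sketch below assumes one plain alphanumeric name per line (regex metacharacters are not escaped); the function name build_ua_block is hypothetical.

```shell
#!/bin/sh
# Read bot names from stdin (one per line; blank lines and lines starting
# with "#" are skipped) and print an .htaccess block that returns 403 for
# user agents containing any of them. Names are inserted as-is, so they
# must not contain regex metacharacters.
build_ua_block() {
    pattern=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | paste -sd'|' -)
    [ -n "$pattern" ] || return 1   # nothing to block, emit no rules
    printf 'RewriteEngine On\n'
    printf 'RewriteCond %%{HTTP_USER_AGENT} (%s) [NC]\n' "$pattern"
    printf 'RewriteRule ^ - [F]\n'   # [F] already implies [L]
}
```

For instance, build_ua_block < badbots.txt prints a block ready to paste into .htaccess, where badbots.txt is a file you maintain yourself.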
Considerations
- Performance Impact: .htaccess rules are read on every request, so extensive use of these rules could degrade server performance.
- Not Suitable for High Traffic: While useful for smaller sites, large sites may require a more efficient bot-blocking method, such as ModSecurity or firewall rules.
3. Using ModSecurity for Bot Blocking
ModSecurity is a web application firewall (WAF) that works alongside Apache to block malicious traffic. cPanel typically includes ModSecurity, and it’s highly effective at filtering out unwanted bots.
Writing Custom ModSecurity Rules
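You can create custom ModSecurity rules to block specific patterns such as suspicious user agents, empty user agents, or high-frequency requests. Here’s an example that blocks requests whose User-Agent header is present but empty (a request that omits the header entirely is not matched by this rule; that case needs a count check such as &REQUEST_HEADERS:User-Agent "@eq 0"):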
SecRule REQUEST_HEADERS:User-Agent "@rx ^$" "id:123456,phase:1,deny,status:403,msg:'Empty UA blocked'"
Where to Configure
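- In cPanel, Security >> ModSecurity lets you turn ModSecurity on or off per domain. The rule sets themselves are managed by the server administrator, typically in WHM under Security Center >> ModSecurity Tools.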
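- For custom rules, you’ll need to edit the server’s ModSecurity configuration. On EasyApache 4 systems the file intended for your own rules is typically /etc/apache2/conf.d/modsec/modsec2.user.conf, though the exact path varies by installation. Placing SecRule directives in a site’s .htaccess works only if ModSecurity was built with .htaccess support, which is not the default.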
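ModSecurity expects every rule to carry a unique id, and a duplicate can stop Apache from loading the configuration. Before restarting Apache with new custom rules, a rough check like the one below can flag repeated ids. It is a text-level sketch that looks for id:NNN actions on non-comment lines rather than a real parser, and the function name duplicate_rule_ids is hypothetical.

```shell
#!/bin/sh
# Print every rule id (as "id:NNN") that appears more than once across
# the given rule files. Comment lines are ignored; both id:123 and
# id:'123' spellings are recognised.
duplicate_rule_ids() {
    cat -- "$@" |
        grep -v '^[[:space:]]*#' |
        grep -Eo "id:'?[0-9]+" |
        tr -d "'" | sort | uniq -d
}
```

No output means no duplicates were found in the files you passed in.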
4. Blocking at the Firewall Level (CSF / iptables)
You can also block bad bots at the server firewall level. Many cPanel servers use ConfigServer Security & Firewall (CSF), which allows you to manage access to your server through a simple interface. Alternatively, you can use iptables for more fine-tuned control.
Blocking via CSF
To block a known bot IP via CSF, you can use the following command:
csf -d 192.0.2.123 # Example bad bot IP
This command adds the IP to your firewall’s deny list. You can also edit the csf.deny file (normally /etc/csf/csf.deny) directly for bulk IP blocks, then reload the firewall with csf -r so the changes take effect.
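For bulk additions, a small helper can append a list of addresses while skipping malformed lines and entries that are already present. This is a minimal sketch: the function name add_to_deny_list is hypothetical, it handles IPv4 only, it checks the shape of the address but not the octet ranges, and it assumes the deny file already exists and uses the "IP # comment" line layout.

```shell
#!/bin/sh
# Read IPv4 addresses from stdin and append any that are not yet listed
# to the given deny file as "IP # comment".
add_to_deny_list() {
    deny_file=$1
    comment=${2:-bad bot}
    while IFS= read -r ip; do
        # Skip anything that is not shaped like an IPv4 address.
        printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || continue
        # Skip addresses that already appear as the first field of a line.
        if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$deny_file"; then
            continue
        fi
        printf '%s # %s\n' "$ip" "$comment" >> "$deny_file"
    done
}
```

A typical call would be add_to_deny_list /etc/csf/csf.deny "scraper" < new_ips.txt followed by csf -r. Trying it on a copy of the file first is a sensible precaution.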
Blocking via iptables
For advanced users, you can block IPs manually using iptables:
iptables -A INPUT -s 192.0.2.123 -j DROP
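This rule drops all incoming traffic from the specified IP. Because -A appends to the end of the INPUT chain, an earlier ACCEPT rule that matches the same traffic takes precedence; use -I INPUT to insert the rule at the top instead. The rule also lasts only until the firewall is reloaded or the server reboots unless you save it.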
5. Creating a Maintenance Workflow
Once you’ve implemented the necessary bot-blocking rules, it’s important to set up a regular maintenance workflow to keep your server secure. Automated scripts can help monitor logs and block new bad bot IPs as they appear.
Automating Log Analysis
Use cron jobs to regularly parse your logs and block bad bot IPs. The entry below runs the script every day at midnight:
0 0 * * * /path/to/parse_logs.sh
The script (parse_logs.sh) can search logs for suspicious patterns and add new bad bot IPs to your firewall deny list.
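The guide does not show the contents of parse_logs.sh, so the following is only one possible sketch. It assumes the combined log format, a lowercase user-agent regex, and a hit threshold chosen by you; the function names, the DRY_RUN switch, and the defaults are all hypothetical. By default it only prints the csf commands it would run.

```shell
#!/bin/sh
# parse_logs.sh (sketch): find IPs whose user agent matches a pattern at
# least MIN_HITS times, then deny them with CSF.

# find_bad_ips LOG PATTERN [MIN_HITS]
# PATTERN is a lowercase regex tested against the lowercased user agent.
find_bad_ips() {
    awk -F'"' -v pat="$2" -v min="${3:-20}" '
        tolower($6) ~ pat { split($1, f, " "); hits[f[1]]++ }
        END { for (ip in hits) if (hits[ip] >= min) print ip }
    ' "$1" | sort
}

# Read IPs from stdin. Unless DRY_RUN=0 is set, only print the command.
block_ips() {
    while IFS= read -r ip; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: csf -d $ip"
        else
            csf -d "$ip" "flagged by parse_logs.sh"
        fi
    done
}

# Run only when a log path is given: parse_logs.sh LOG [PATTERN] [MIN_HITS]
if [ "$#" -ge 1 ]; then
    find_bad_ips "$1" "${2:-scrapy|badbot}" "${3:-20}" | block_ips
fi
```

Review the dry-run output for a few days before setting DRY_RUN=0, since a pattern that is too broad would lock out legitimate visitors. Cron often runs with a minimal PATH, so the csf call may need its full path.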
Updating and Maintaining Bot Lists
Periodically, you should update your list of bad bots. Keep an eye on common bot signatures and user agents to ensure that your rules remain effective.
6. Testing and Verification
After implementing your bot-blocking rules, it’s important to test whether they’re working as expected.
Simulating Bot Traffic
You can simulate bot traffic using tools like curl or by changing the user agent in your browser’s developer tools to a known bad bot user agent. For example:
curl -A "BadBot" http://yourdomain.com
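This request should return a 403 Forbidden response if your rules are working correctly. Adding -I to the command prints only the response headers, including the status line.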
Verifying via Logs
Check your Apache logs for 403 errors, indicating that the bot was blocked. You can also check the modsec_audit.log file for ModSecurity rules that were triggered.
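To see at a glance which clients are being refused, the 403 responses can be tallied per IP. This sketch assumes the combined log format, where the status code is the first value after the quoted request line; the function name forbidden_by_ip is hypothetical.

```shell
#!/bin/sh
# Count 403 responses per client IP in a combined-format log.
# Splitting on double quotes puts "IP - - [date]" in field 1 and
# " status bytes " in field 3, which stays reliable even when the
# request line itself contains spaces.
forbidden_by_ip() {
    awk -F'"' '
        { split($1, head, " "); split($3, tail, " ") }
        tail[1] == 403 { print head[1] }
    ' "$1" | sort | uniq -c | sort -nr
}
```

After running the curl test above, the address you tested from should appear in the output of forbidden_by_ip /etc/apache2/logs/domlogs/yourdomain.com.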
7. Conclusion
In this guide, we’ve covered how to detect and block bad bots in cPanel using Apache’s .htaccess, ModSecurity, and server-level firewalls. By leveraging these built-in tools, you can protect your website from bot attacks without relying on third-party tools. This approach provides more control and can be tailored to your site’s specific needs.
However, for large sites or those with significant bot traffic, using third-party tools like ServerGuardian may still be necessary for comprehensive protection. Keep your bot-blocking rules up to date, monitor your server logs regularly, and you’ll be well-equipped to safeguard your site against malicious bots.