Chinese version: Analysis and Postmortem of the 2023/1/21 Blog Attack and Outage – Frank’s Weblog
On 1/21/2023, my blog was attacked and went down for 4 hours. This article covers how the incident unfolded, the root cause analysis, and the resulting improvements.
On that day, I woke up at noon and saw the alert email from UptimeRobot. A network or server glitch can sometimes trigger an alert too, but it had been 2 hours since the alert fired, so that was clearly not the case. Most of the time I could not connect to the website at all; when I could, I got a 504.
I SSHed into the server and restarted all the Docker containers, but the problem persisted. top showed that all the load averages were 6.xx and that most of the CPU usage came from php-fpm. I checked the graphs in nginx amplify and found that nginx had received a large number of requests over the past few hours.
I had planned to go grocery shopping for the Lunar New Year dinner with my girlfriend, so I didn’t want to spend too much time on this. I simply turned on the Cloudflare reverse proxy (the orange cloud icon) and “Under Attack” mode, then left home.
After a while I received the alert-cleared email from UptimeRobot, and the website was back online.
Over the last few years I’ve implemented a set of monitoring and security measures for my site, along with automated scripts to mitigate common issues.
- UptimeRobot for monitoring downtime. I receive an alert if the website cannot be reached or returns an HTTP status that indicates a malfunction (5xx).
- nginx amplify for monitoring nginx and OS metrics. I receive an alert if a metric (e.g. disk usage, requests per second) goes over its threshold.
- If requests per second go over the threshold, a script automatically turns on the Cloudflare proxy and raises the security level.
- A WordPress security plugin automatically blocks malicious requests.
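To make the third measure more concrete, here is a minimal sketch of what such an automated mitigation could look like. It is not the actual script used on this site: the function names, the threshold, and the choice of the "high" security level are illustrative assumptions, and the two requests follow the Cloudflare v4 API as I understand it (updating a DNS record with proxied set to true, and updating the zone's security_level setting).

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4"


def plan_mitigation(rps, threshold, zone_id, record_id, level="high"):
    """Return the Cloudflare API calls to make, or [] if traffic looks normal.

    zone_id, record_id and level are placeholders supplied by the caller;
    nothing here is taken from the real setup.
    """
    if rps <= threshold:
        return []
    return [
        # Turn on the reverse proxy (the orange cloud) for the DNS record.
        ("PATCH", f"{API}/zones/{zone_id}/dns_records/{record_id}",
         {"proxied": True}),
        # Raise the zone's security level.
        ("PATCH", f"{API}/zones/{zone_id}/settings/security_level",
         {"value": level}),
    ]


def send_calls(calls, token):
    """Send the planned calls using a Cloudflare API token (assumed valid)."""
    for method, url, body in calls:
        req = urllib.request.Request(
            url,
            data=json.dumps(body).encode(),
            method=method,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Keeping the decision (plan_mitigation) separate from the network calls (send_calls) means the threshold logic can be checked without touching the live zone.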
Thanks to these measures, my site has maintained an uptime of nearly 100%. For a blog that only gets a two-digit number of visitors a day, 4 hours of downtime is nothing to worry about. But out of professional habit, I kept wondering what was behind the incident, and especially why these measures failed to prevent it.