Yesterday when I was trying to create a quality system for the upcoming release of our tool, Innbound, I accidentally messed up the data in production system!
I didn’t know what to do and started learning about the MySQL recovery options.
I want to note that it happened at 17:02, since I am gonna refer that time in the blog post.
Here is the deal:
- MySQL logs every update/insert/delete/create/drop etc. modification statements into files called “binary logs”
- We had the back up from the morning
So, what I needed to do was
- Stop the application which accesses the MySQL database immediately
- Note the time of the accident (17:02)
- Load the back up from the morning (11:00)
- Get the modification statements for 11:00 – 17:02 from binary logs
- Apply the modification scripts
Binary logs are stored in the folder defined in the my.cnf file:
And here they are:
Only the files marked with
<< was relevant for me since I had the back up from 11:00.
To learn exact back up moment, we first checked the file produced by
mysqlbinlog a date range around the back up moment:
With the output of this command, you can determine when the back up is executed. There will be some gap of a couple of seconds at least. From that file, I learnt the back up happened at 11:02:35.
I executed following to get the scripts executed between 11:02:30 and 17:02:
Please note I didn’t specify exactly 11:02:35 as the
start-datetime, since there can be a lot of statements in a second.
So, I had to modify the file manually afterwards to remove everything that is before the back up moment.
Then the remaining is easy.
Just loaded the dump from morning into database:
and then executed the modification statements:
You can use the following bash script to get the row counts of tables and then compare the corrupt database with the database after recovery applied:
NOTE: My recovery was more painful than this. I also wanted to do the operations in my local environment and because of max-packet-size and encoding problems in my laptop, I wasn’t successful. Because of those problems, doing these stuff took me 6 hours. Next time something like this happens, I would definitely do it on a machine which is identical to the production server.