Weirdest reason in a technical request email ever: Porn is not allowed in Moldova

Some time ago I was working on a porn search engine project (now with more than 10,000 visits a day and growing) and I found myself writing the weirdest reason ever to ask for a domain change. The site was supposed to be "the porn doctor": you would write your problem in the search box, and the doctor would give you some remedies. Everything was going well and traffic was going up, until we received a notification from the managers of the .md TLD: Moldova does not allow porn websites.

We needed to move the project to a different domain: There was no choice.

I opened the email client and sent my hosting company the weirdest request reason ever.

I will need to change the domain name from to because porn is not allowed in Moldova.


Nicolás Andrade


"Porn is not allowed in Moldova". Have you ever written a weirder reason than that?

The sleep command is not useless: good uses of the sleep command

All developers have thought at least once (and many still think it from time to time) that something built by somebody else is completely useless. Review your last three or four days and think whether you said to yourself "what is this useful for?" while looking at a function or a class. It happened, right? Well, many of us have done that with some really useful pieces of code. If you say you've never thought that the sleep command is useless, you haven't spent enough time in front of a command line. I thought it myself sometimes: "why would somebody want to make a computer do nothing for a period of time?" So here I am to list some useful uses of the sleep command in Linux. This can be done in many other operating systems of course, but not with the same syntax; you'll need to figure out on your own how to apply these ideas on Windows, for example.

Here we go with the most recent example I have. I was downloading a bunch of torrents when I decided to go to my girlfriend's place. I did not want to leave Transmission running the whole weekend, and I did not want to tell my girlfriend "Sorry, I need to go home to close the torrent client to avoid an enormous internet bill". So I simply executed this in the command line:

sleep 7200; killall transmission-gtk

The SEO words for this would be "how to close an application with a delay", "how to delay closing an application", or "schedule an application to be closed". Another good use, as explained in the "drop table" post, is to make intensive loops a bit lighter. For example, this could be a good approach to execute a big list of SQL files:

for FILE in *.sql; do
mysql -uuser -ppassword database < "$FILE";
sleep 5;
done

The script will wait five seconds between executions. However, the sleep command can be even more useful. What if the "5 seconds" could be changed by another process? We'll run two scripts simultaneously. The first one will be:

while true; do echo 5 > sleep.txt; sleep 30; echo 1 > sleep.txt; sleep 10; done

And the second one will be:

for FILE in *.sql; do
mysql -uuser -ppassword database < "$FILE";
sleep `cat sleep.txt`;
done

With this approach you control how long to wait within the loop, and you can change the setting without killing and re-executing the script. Nice, huh?

Data for everyone

Rule No. 2 – The guaranteed access rule

Every data item must be logically accessible by executing a lookup that combines the table name, its primary key, and the column name.

This means that given a table name, a primary key value, and the name of the required column, one and only one value must be found. For this reason, defining primary keys for all tables is practically mandatory.
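As a sketch of the rule (table and column names are made up for illustration):

```sql
-- Table name + primary key value + column name locate exactly one value:
SELECT email FROM users WHERE user_id = 42;
```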

Rule No. 6 – The view updating rule

All views that are theoretically updatable must be updatable by the system itself.

Most RDBMSs allow updating simple views, but reject attempts to update complex views.
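For instance (names are illustrative):

```sql
-- Simple single-table view: the engine can map the UPDATE back to the base table
CREATE VIEW active_users AS
    SELECT user_id, email FROM users WHERE active = 1;

UPDATE active_users SET email = 'new@example.com' WHERE user_id = 42;

-- A view built on joins or aggregates is "complex": most engines will reject
-- an UPDATE against it, even when one is theoretically resolvable.
```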


Codd's rules

New 'tax': Internet Explorer 7 customers will be charged extra. An Australian retailer has decided to "tax" customers using Internet Explorer 7 or lower.

Ruslan Kogan, the retailer's chief executive officer, explained that they needed to find a way to cover the extra cost of making their pages look proper in a very old browser.

Every month the tax will increase by 0.1%, because IE7 will be one month older.

More information at

“Anyone who is involved with the internet and web technology would know the amount of time that is wasted to support all these antiquated browsers,” Kogan said. “You have to make all these work-arounds all the time to make sure the site works properly on it.”


Mod_Redis for Apache

Thanks to the Redis Google Group, I stumbled upon this link while browsing the web:

It's an Apache module providing an interface to Redis servers. This means you can access different URLs to set or get values from Redis, avoiding all the overhead that PHP, the PHP Redis extension, and probably your framework add.

By using different HTTP methods (POST, PUT, DELETE, GET), you can set keys and retrieve keys in a very customizable way.

By default the module returns an XML representation of the Redis response. For example, configuring a RedisAlias to support the PING command:

RedisAlias ^ping$ "PING"

the response would be:

<?xml version="1.0" encoding="UTF-8"?>
<response><status>PONG</status></response>

This behavior can be altered by appending .json to the URL; the same request would then return:

{ "status" : "PONG" } 

Alternatively, appending .jsonp and adding a 'callback' query parameter responds with:

callback({ "status" : "PONG" });

The mod_redis module can be useful to load different values in asynchronous requests after the page has loaded. For example, you can refresh the view count on the player page every 30 seconds with an Ajax call that will not hit your application stack, or refresh the number of comments a video has.

There's also an Nginx module providing the same features at

How does Pornhub use Redis?

Since Eric Pickup's talk at Confoo 2012, a great and informative session, I've seen lots of comments and posts in many blogs and forums about Redis and different sites, many of them owned by Manwin. Redis is a very interesting technology that opens doors to different and new features and capabilities and, even more interesting, to new ways of thinking about web-based applications. As a Pornhub developer (that's what I am) I will clarify how Redis is used on our end, and why.

The first feature we used Redis for is counting video plays. Every time a video is played, the hit is recorded in a big hash by increasing the value by one. Every few hundred plays we store the actual value in MySQL, which is used later for searches and statistical information. We also keep track of monthly and daily views, as well as ratings and votes, so visitors can sort videos by most viewed today or most viewed this month.

We also use Redis for part of the social-network features, those that keep relations between users, such as "friends" or "blocked people". Let's say you block a user: we'll store that relation in both Redis and MySQL. When reading the relations, we'll try Redis first, and if the information is not there we'll fall back to MySQL.

The implementation I've proposed here is a bit tricky and takes the best of Redis and the best of Memcache in a nice in-house implementation.

We have one big sorted set named blocks:lastUpdate, having the user ID as score, and a timestamp as value. And for each user we have a sorted set named “user:[userId]:blocks”, containing the blocked user ID as value and the timestamp when it was blocked as score.

When we need to know who is blocked by a certain user, we first read the "last update" value from the blocks:lastUpdate set and add a Time To Live (TTL) value defined in our configuration file. If that value (lastUpdate + TTL) is lower than now, we consider user:[userId]:blocks expired, so we clear it and reload it from MySQL. If the key is not considered expired we simply use it, thus avoiding an access to MySQL.

Why not use Redis' EXPIRE to expire the key instead?

This is a good question, and it caused some discussion within the team. The main reason is that once you set an expiration time on a key, you cannot increase or decrease it for all keys at once as your needs change. In case of data loss we can recover the Redis information from MySQL, or the MySQL information from Redis. For example, right now the TTL in our configuration file is one month (!). This means we refresh the information stored in Redis once per month per user. However, if for any reason I need to drastically reduce access to the DB, I can change the TTL setting to 3 or 4 months; from that moment on I'm 100% sure that no access will be made to the DB, because all keys will still be valid.

In case of a MySQL crash and data loss, we can change the setting to 1 year and write a small script that reads from Redis and repopulates the affected MySQL table.

When investigating this I found an excellent Redis command, OBJECT. It returns information about the key you pass as a parameter, and one of its subcommands is IDLETIME, which returns how long the key has been idle, that is, since the last read or write. This could be useful in many cases, but not in ours, since we would need the time the key was created. Having that information at the Redis level would allow us to get rid of the blocks:lastUpdate set.

Many other features are using Redis, and many more will come in the next months, so stay tuned to get info about how Pornhub evolves with this interesting technology.

Recover millions of rows in production MySQL server from backup

That's what we did during the last week. The first try was done by the sysadmins: they tried to insert everything in one move. Imagine an INSERT statement with 20 million values; of course it will fail!

The second try took 2 days, using split files, but the sysadmin decided to restart from scratch because of a missing index. I would have kept that unindexed table at least and tried an alternative way to deliver results to the users as soon as possible. But sysadmins are that, sys admins, not business admins, so the guy took the decision to simply drop the table, again. For the third massive insert I had the experience from the other two, so I decided to define some rules for the sysadmin in charge of the import, in order to do things differently.

The first thing was already known by both Troy and me: the 20 million rows should be split across many files, not kept in a single one as they first tried. So we agreed on splitting the SQL into 19,024 files of 1000 rows each.

Here I’ll document different aspects you could consider if you find yourself in the same situation.


Preparing the table to receive the inserts

I've read many times that it is convenient to insert the data before the indexes are created, to speed up the process, and then create the indexes on the table. This is true for inactive servers (staging servers or not-in-production servers) but it's not good at all for live servers. Creating an index implies copying the whole table to a temporary one, creating the index itself, and then moving all the information back to the indexed table. Of course the table is locked during the whole process, so all your clients will be stuck: no inserts, no updates, no selects. The load on the server will go through the roof, so probably not only that table will fail, but the others too.

Even if the server is not active, it's better to have at least the PRIMARY KEY / UNIQUE index created. The insert process will be slower, true. But you'll be safe if for any reason you need to re-run a file that hung in the middle. If you don't have the UNIQUE index created, you'll end up with duplicated rows that will be more complicated to remove afterwards.

Having the UNIQUE index created and using the IGNORE option in the INSERT statement is the best way to be able to re-run a file if you need to.
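A minimal sketch of the idea (table and columns are made up):

```sql
-- With a UNIQUE/PRIMARY key on video_id, re-running a half-imported file is
-- harmless: duplicate rows turn into warnings and only the new ones are written.
INSERT IGNORE INTO videos (video_id, title) VALUES (1, 'first');
INSERT IGNORE INTO videos (video_id, title) VALUES (2, 'second');
```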


Avoid extended-inserts.

While extended inserts are good in terms of speed and dump file size, you cannot resume the process from the middle if it breaks. A quick example:

INSERT INTO table (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);

INSERT INTO table (a,b,c) VALUES (1,2,3);
INSERT INTO table (a,b,c) VALUES (4,5,6);
INSERT INTO table (a,b,c) VALUES (7,8,9);

The first INSERT will be much faster than the three separate ones, because MySQL updates the indexes only once, at the end of the statement. But both blocks do the same exact operation (assuming no duplicated keys).

The block with the three statements will be slower because of the three writes and three index updates; but in terms of safety it is much better, since if one statement fails due to an existing key, the others will still be valid.

Another option is to use INSERT IGNORE to ignore errors while inserting. If you try to insert a value that already exists in the index and you are using IGNORE, the error becomes a warning and the statement is executed anyway. This means the whole insert will succeed, except for the particular rows that were duplicated.

The other advantage of using one INSERT per row is that the dump can then be divided with split, a Linux command. To get one insert per line out of mysqldump:

mysqldump --skip-extended-insert [options] database tablename > dump.sql

At this point we had a nice dump.sql file with 20 million INSERTs, one per line, so we could move to the next step.


Split the big file into many

By running the split command on a file with one insert per line, you get multiple files that are still valid SQL. The advantage is that you can run a few of them while monitoring the server status (and all the slaves) and see how they are affected. You can also stop the process for any reason and then continue from the break point.
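That step can be sketched like this; the toy dump below just stands in for the real 20-million-row file:

```shell
# Toy stand-in for the real dump: 2,500 one-line INSERTs (the real file came
# from mysqldump with --skip-extended-insert)
seq 1 2500 | sed 's/.*/INSERT INTO table VALUES (&);/' > dump.sql

# Split into files of 1,000 statements each: chunk_aa, chunk_ab, chunk_ac.
# Because every line is a complete statement, each chunk is still valid SQL.
split -l 1000 dump.sql chunk_
ls chunk_* | wc -l
```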

A good balance between 100% individual inserts and a single insert with all the values would be splitting into many files and building one extended insert per file. An example of a 1-insert-per-row pair of files could be:

-- file 1
INSERT INTO table VALUES (1,2,3);
INSERT INTO table VALUES (4,5,6);
INSERT INTO table VALUES (7,8,9);

-- file 2
INSERT INTO table VALUES (10,11,12);
INSERT INTO table VALUES (13,14,15);
INSERT INTO table VALUES (16,17,18);

A faster and still safe approach could be:

-- file 1
INSERT IGNORE INTO table VALUES (1,2,3), (4,5,6), (7,8,9);

-- file 2
INSERT IGNORE INTO table VALUES (10,11,12), (13,14,15), (16,17,18);


The chance of restarting from the last breakpoint is still there, while the speed increases significantly: MySQL will update the indexes only once every 3 rows, while in the previous example it updates them once per row.

While doing the operation we decided to use single inserts so as not to lock the table for long periods (inserting many rows at once locks the table for longer), so clients could keep reading even while we were running the import.


Inserting one file after the other

This is a simple step: loop over the files in the directory and execute them one by one. The script will:

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to let the server stabilize and serve SELECTs to the clients.
5-Loop again, until no files remain.
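The steps above can be sketched as follows. MYSQL_CMD defaults to cat here only so the sketch runs as-is; in production it would be something like "mysql -uuser -ppassword database", and the toy files stand in for the split output:

```shell
#!/bin/bash
# Sketch of the import loop; command, pause and file names are assumptions.
mkdir -p import_demo && cd import_demo || exit 1
printf 'INSERT INTO table VALUES (1);\n' > aa.sql   # toy files standing in
printf 'INSERT INTO table VALUES (2);\n' > ab.sql   # for the real chunks

MYSQL_CMD=${MYSQL_CMD:-cat}   # placeholder: point at your real mysql client
PAUSE=${PAUSE:-0}             # use 1-2 seconds on a live server

for FILE in *.sql; do
    [ -e "$FILE" ] || continue             # glob stayed literal: nothing to do
    case "$FILE" in done_*) continue ;; esac
    $MYSQL_CMD < "$FILE" || exit 1         # stop at the first failing file
    mv "$FILE" "done_$FILE"                # mark the file as imported
    sleep "$PAUSE"                         # let the server serve its SELECTs
done
ls
```

Keeping the client command in a variable also makes the loop easy to dry-run before pointing it at the live server.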

The speed of the whole process depends on many variables; one of the most important is the pause between files. How long it should be depends mostly on:
1-As it's a live environment, you should still be serving SELECTs from that table. While you're inserting, the table is not locked, but its indexes are.
2-If you're working in a replication environment, you should monitor the slaves' SBM (seconds behind master). A big SBM means the slave server is not fast enough to receive all the updates from the master. In that case you should try some of the things explained later in this post.
3-The load of the server, beyond MySQL connections and processes: you don't want to melt your live server.


Let’s discuss the sleep

According to the plan we were going to insert 19K files into the live server, with 1K rows each. That's an enormous amount of information that must not only be inserted into the master but also transferred to the slaves, while they also receive the INSERTs, UPDATEs and SELECTs from the site itself and its normal activity. This means that stress on the servers will increase. How much? We didn't know yet; let's try.

Pick the first file and throw it at MySQL. Measure the time the server takes to process it. Let's say the server processed the file in one second. In a normal situation the slaves will take the same amount of time to process such an insert, so a safe value for the sleep between inserts could also be one second. But as we're working in a live environment, where many other things must be propagated to the slaves, we should leave a small window of time to let the propagation finish. I would use 1.2 seconds in this case.

So create a file named sleep.txt and write 1.2 in it. Then update the script we talked about before to read sleep.txt and sleep the amount of time specified there. This lets us change the sleep value during the insertion. Let's say you see MySQL processes start to pile up, or the files being processed more slowly after some minutes: you can edit sleep.txt and set the value to 2, 3, 10, whatever you need. The server will be very thankful. Our updated script will now:

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to let the server stabilize and serve SELECTs to the clients. The value comes from sleep.txt, this way:
SLEEP=`cat /tmp/sleep.txt`; sleep $SLEEP;
5-Loop again, until no files remain.

Rename or remove the already parsed file, and call the insertion script.


Monitoring: what to check?

Once the process starts, open whatever monitoring tool you have to watch server activity. I would have liked a command-line utility to check these values, but I didn't have one. I used Sysmon, which shows graphs updated every few minutes, so the information I got was a bit outdated. But keeping in mind that the process took 2 days, a delay of some minutes is not an impediment.

As we said, open Sysmon and start to monitor the following things:

  • MySQL Processes: how many MySQL processes are being executed on each server. You'll notice an increase when you start doing the inserts, but try to keep the line as horizontal as possible. You should know how the server behaves and how much margin you have under normal load. In my case I was able to run at 1.5 times the normal load and stay safe.
  • Seconds Behind Master: if you see the SBM in the slaves start to increase beyond normal, increase the delay between operations. We'll see later how to play with the sleep.txt file to get the best balance between speed and stability. When the sleep period increases you should see the SBM start to decrease. If it does not, set a higher sleep time until you see the SBM drop. In my case I had 2 very slow servers; one of them had a huge SBM, and we decided to exclude it from the application.
  • Statements per second: shows 3 lines representing INSERTs, UPDATEs and SELECTs. You'll see the INSERTs line go higher, while the UPDATEs and SELECTs keep the same shape as before.

While watching the graphs, try to spot issues before they become problems, and identify them fast.


Identifying possible problems while stressing the server

Open the mysql client on the master server and on 2 or 3 of the slaves. Keep running SHOW PROCESSLIST; on all of them to identify possible issues. During the process I found that the master server was receiving SELECT queries from the front end, even though it was not supposed to. That made me realize that, because of issues with the slaves, a previous developer had changed the code to run SELECTs against the master. As soon as I saw that, I changed the code to avoid those selects and updated the live environment right away. The operation was basically:

  • Run processlist on the master and identify the selects.
  • Match the selects with your source code, in your development environment, making sure that the current version really is querying the master server and not the slaves.
  • Double check that the query is the one you found.
  • Download the affected file from the live environment to your local machine, and recheck that the connection is being made to the master.
  • Fix the code to connect to the slaves instead.
  • Run PHP Lint on the affected file.
  • Upload the file to only one of the live servers, and open the error log right after the upload.
  • Depending on the number of front-end webservers you have, you may or may not see the impact of a single-server change. In my case I have seventeen webservers, so the change was not noticeable.
  • If you don't get any errors in the error log, propagate the file to the remaining servers, and keep watching the logs.
  • See how processlist returns fewer and fewer SELECTs on the master.

It was amazing how the SELECTs line on the master decreased while, at the same exact time, the INSERTs line increased. The INSERTs were being slowed down by the SELECTs running on the master; that was clear. So, to do this kind of operation, you should know the code as if it was 100% written by you. You should know everything about your code if you plan to do something this stressful.

The graphic below shows how the INSERTs started to increase once the server stopped receiving SELECTs, around 18h30.

INSERTs per second increase since the master does not receive SELECTs

If for any reason (usually SBM problems, and usually in specific situations) you need to SELECT from the master, make sure you actually need to. How? Try to read from the slaves first. If the slave has the information, use it. If the slave does not have the row, then you can query the master. This simple protection decreases the load on the master at a very low price: querying twice instead of once in the worst case. Most of the time the slave will have the required information.

This can work as a patch, but your application is meant to read only from slaves, and never, never from the master.

Another issue we found was related to 2 slow slave servers; one of them wasn't able to receive the big amount of INSERTs and serve SELECTs at the same time, and decided to buffer the INSERTs. Again, good knowledge of the code will help you in this situation. My project loads the server addresses from a PHP array. The DB layer shuffles that array and tries to connect sequentially to all servers in a loop, until a connection is established. When a server is too loaded it rejects the connection with "Too many connections", so the next server is tried. If no server can handle the connection, an exception is thrown and logged.

Summing up, we have an array of servers, and we randomize it before connecting. If we have 5 servers, each server has a 20% chance of being chosen:

$slaveServers = array(
    '', '', '', '', ''  // the five slave addresses
);
shuffle($slaveServers);

foreach ($slaveServers as $slave) {
    $conn = tryToConnect($slave);
    if ($conn) { return $conn; }
}
die('Can not connect to any slave server');

In this example all servers have a 20% chance of being selected. Let's suppose the first one is the slow server; we would like to send fewer clients to that server than to the others. To do so we can repeat the other servers in the array, increasing their probability of being selected.

$slaveServers = array(
    '',          // the slow server, only once
    '', '',      // the others twice, increasing their chances to be selected
    '', '',
    '', '',
    '', '',
);
Our array now has 9 entries; every slave except the first has a 22% chance of being chosen, and the first one 11%. The graphic below shows the impact this update had.

As you can see, the 3rd server stopped receiving SELECTs between 10h and 12h, and the other servers took that load. During that period I put server 6 twice in the configuration (servers 7 and 8 too, but those are not in the image), so it received double the load of the other servers.

After 13h I put servers 3 and 4 back to serving SELECTs, but only once in the array, while the others were there twice or three times. The impact of these changes is much more evident in the SBM graphics (seconds behind master; that is, the time the slaves need to become a 100% exact replica).

See how between 9h30 and 11h the server caught up with the master, just because it stopped receiving SELECTs. The same operation was done on server 4 between 12h30 and 13h: taking it out of the application and leaving it working on replication only.


Seconds behind master in the 3rd server goes up to one hour and a half

Overnight (unmonitored) operation

As said before, this process took 2 days to complete, and of course we weren't there checking the graphs and the activity the whole time. During the day I was in the office, so while doing other tasks I checked the status very often; from home I wasn't able to watch that closely. Before leaving the office I checked the graphics to try to identify possible issues, and I reviewed the behavior of the servers during the previous nights to find a pattern of how the public behaves during those hours and how the servers are affected. I realized that the best way to complete the process was alternating the sleep time every few minutes: run with a minimal sleep time of 1 second for 200 or 300 inserts, then give the server a 30-second pause to become stable again. After the pause, the sleep goes back to 1 second.

To do that we used again the sleep command and the sleep.txt file. The script does, more or less, this:

  1. Loop while true
  2. Write “1” in sleep.txt
  3. Sleep during 300 seconds
  4. Write “30” in sleep.txt
  5. Sleep 30 seconds
  6. Repeat loop.

As the inserting script reads the value from sleep.txt, we can switch it according to our needs in real time, without stopping the process; this lets us change the sleep dynamically. If we'd had access to real statistics from MySQL on the command line, I could have written a script to increase or decrease the sleep time according to the server load. But that was outside our possibilities.

The sleeper script, which sets different values in sleep.txt depending on the time, was left running the whole night. Its effect during the night hours is easy to spot in the graphics above: in the first graphic, between 20h and the end; in the others, between midnight and 4h. All the little peaks are very short sleeps.

If you, like me, ever asked "what is the sleep command useful for?", there you have a good answer. In this situation the sleep command was the key to the whole thing. Without sleep this would have been much more complicated.

Receiving status of the task 24 hours a day

When the process started I asked the sysadmins to write a simple small script that would:

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. Send me an email saying “$DONE of $TOTAL Completed” in the body, “Import process” as subject.

Just call that script every 5 minutes, and there you have your statuses. This is the script we had; but after some time I realized that a much better approach would be this:

Step 0, create an empty file named total.txt

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. echo “$DONE of $TOTAL Completed” > last.txt (Write the phrase in a file)
  4. cat last.txt total.txt | head > total.tmp && mv total.tmp total.txt (Concatenate the last phrase with the previous ones, newest on top, and keep the first 10 lines. The temp file matters: redirecting straight to total.txt would truncate it before cat gets to read it.)
  5. Mail the content of total.txt to my email account (or make it accessible through NFS, http, etc)
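A runnable sketch of that reporter; the directory, toy files and the mail command are assumptions (uncomment the mailx line, or serve total.txt over NFS/http):

```shell
#!/bin/bash
# Progress report sketch with a toy state: 2 of 3 files already imported.
mkdir -p status_demo && cd status_demo || exit 1
touch done_a.sql done_b.sql c.sql

TOTAL=$(ls *.sql | wc -l)           # done_*.sql still matches *.sql
DONE=$(ls done_*.sql 2>/dev/null | wc -l)
echo "$DONE of $TOTAL Completed" > last.txt

# Concatenate through a temp file: "cat last.txt total.txt | head > total.txt"
# would truncate total.txt before cat reads it.
touch total.txt
cat last.txt total.txt | head > total.tmp && mv total.tmp total.txt
cat total.txt
# mailx -s "Import process" you@example.com < total.txt
```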

Now use cron to execute that script every 5 minutes. You'll get emails in this fashion:

910 of 20000 Completed
830 of 20000 Completed
770 of 20000 Completed
710 of 20000 Completed
660 of 20000 Completed
600 of 20000 Completed
580 of 20000 Completed
520 of 20000 Completed
460 of 20000 Completed
390 of 20000 Completed

This is easily parseable, so it's easy to extract statistics and see how the process reacts to the different changes in the scenario described earlier in this post. You can clearly see the rate you're working at by comparing consecutive reports.
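For example, since the report is newest-first, the difference between the two newest lines gives the throughput of the last interval (the toy report below reuses the numbers above):

```shell
# A toy report (newest line first, one line per 5-minute cron run)
printf '910 of 20000 Completed\n830 of 20000 Completed\n' > report.txt

# Difference between the two newest lines = files imported in the last interval
awk 'NR==1{a=$1} NR==2{print a-$1 " files in the last 5 minutes"; exit}' report.txt
```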


At the end of the recovery

To wrap up, I would like to mention that the work the sysadmins did during this week was amazing. I learned a lot while discussing the planning with them and considering all the different options to get the task done. After such a difficult experience I can say I'm happy with the result. Now that it's over, I feel it has been a good experience.

Anyway, there's still a lot of work to do to optimize the site as I intend. The goal is to migrate the whole table to a new one with the proper indexes. Yes, we'll migrate the 20 million rows to another table, in the live environment; but that's a good story for another post.

Thanks for reading!

Nick “Drop table” Andrade

Friday morning: I get into the office and am received with some news. Apparently a great part of the ads had disappeared completely.

After some greps, MemCache key checks and exactly one query, I found we had lost an important user, along with its comments, blog posts, videos, list of friends, favourites... No need to explain what the relation between "ads" and "user" is, but that's how it is.

So I recovered the user in one move, getting the ads back in place.

Now it was time to see how that happened, and here's where the fun comes. SSH to the server, grep the crons, found the entry. Somebody, somehow, reported that user as abusive 5 times. It's an internal user: it can never be abusive. That situation should never happen, but it happened because somebody wanted it to. Fix the cron, add some lines to log actions at some key points, and start recovering from backup.

Looking at SHOW TABLES; I wrote some queries into a file to run on the backup server in order to get the user's rows. Got a few of them, sent them to IT, and received the reply in txt (not SQL) format. A regexp to build a valid insert (every row wrapped in "( ),", fields separated by commas), manually adding INSERT INTO at the top, and mysql < recover.sql for each and every file. 6 or 7 of them.

At the end of the process I received the list of friends. This one in SQL format. Cool, no need to transform it! It was a 500MB file.

I checked the content a bit, and the rows seemed to have the information I expected. SCP to the server, SSH afterwards, and mysql < friends.sql.

Friday 18.30: everything was restored. We had lost the ads for just 7 hours or so; not a big problem.

I could leave for Québec City without the laptop; just the photo camera and a friend.



Saturday night I check the email, just in case. 20 unread. "Users lost their friends lists".

Bearing in mind the big insert done on Friday, I supposed the read servers had fallen too far behind the masters. Or the table had been damaged somehow and could not be read. Using my friend's laptop, I requested a check on the indexes from IT by email. Everything looked good according to them. Bob asked how many rows we had in the backup: we had ten times more in the backup than in the live table.

Somehow we had lost 90% of the table. The same table where I had made an insert the previous day.

I spent the whole weekend thinking about it, sending emails to IT and Bob, wondering how that could happen. A cron? No, that table has no cron. An attack through www on a buggy URL? Could be...

Monday morning, I arrive at the office early. Got breakfast, sat at my desk, started grepping access logs, counting calls to possible URLs. Everything was normal. No attacks up to the 5th server. I checked up to the last one just to be sure, but was already thinking of something else. The whole team was working on the same issue, searching for deletes on that table throughout the code. But nothing.


I remember…

I had run a SQL file on Friday against that table. A big SQL file. Generated by mysqldump. The only one I didn't format.

Could be that
the file
dropped and recreated the table
to populate it
with the rows I’d requested?


20 million rows deleted in one shot.

I've learned many things from this little story. The first one: before any stressful situation like the one I had on Friday, try to calm down. Sounds stupid, but go and take a walk. I was not stressed (I wouldn't use "stressed" to describe my situation) but I was not really calm either. I was alert, paying a lot of attention to what I was doing. Probably I was over-concentrated on the data and not on the SQL itself. I was thinking more about how such an insert could affect the sync of the servers, because we had had that same issue many times in the previous 2 months or so. And when I ran head on the friends.sql file I saw the comments at the top, the SET NAMES utf8, and didn't notice the DROP TABLE IF EXISTS. That's what happened.

The other thing, really important: always grep -Ei "DROP|TRUNCATE" the SQL file to make sure it does not contain that kind of prank.
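That check can be wrapped in a tiny pre-flight script; the file name is illustrative, and the toy dump reproduces what mysqldump emits by default:

```shell
# Toy dump illustrating the check; mysqldump adds DROP TABLE IF EXISTS by default
printf 'DROP TABLE IF EXISTS friends;\nINSERT INTO friends VALUES (1,2);\n' > friends.sql

if grep -qEi 'DROP|TRUNCATE' friends.sql; then
    echo "friends.sql contains DROP/TRUNCATE statements, review it first"
else
    echo "friends.sql looks safe"
fi
```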

The other thing, learned directly from my Linux experience: create a separate user for administration. Developers and the code should have access only to INSERT/ALTER/CREATE/DELETE/UPDATE, not to DROP or TRUNCATE. Those rights should belong to the administration user, so to do any of those actions you need to actually log out and log back in. This will prevent many accidents.
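In MySQL that separation could look like this (account names are illustrative; note that MySQL has no separate TRUNCATE privilege, TRUNCATE TABLE requires DROP, so withholding DROP covers both):

```sql
-- Day-to-day application account: can read and write, cannot destroy
GRANT SELECT, INSERT, UPDATE, DELETE, ALTER, CREATE ON mydb.* TO 'app'@'localhost';

-- Administration account, used only when you deliberately log in as it
GRANT ALL PRIVILEGES ON mydb.* TO 'dba'@'localhost';
```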

That was my experience unintentionally deleting an enormous table from the LIVE environment.

Realistic website testing

To make our QA process more realistic and accurate, we decided to give the QA department a set of computers, apart from those they already have. The new set is mostly composed of Intel Centrino machines with Windows XP, and no fewer than 5 or 6 different types of spyware installed.

The idea is to reproduce the average visitor's experience, which is very far from the experience they get using their latest-generation computers.

Sounds stupid, but I think it was a very clever move.