BitTorrent Sync, a geeky (and much better) Dropbox alternative

This tool is a bit old, but I think it's still worth mentioning:

http://www.extremetech.com/computing/161584-bittorrent-sync-released-the-secure-cloud-avoiding-sync-tool-youve-been-waiting-for

It's about synchronizing files across many computers using BitTorrent. You can think of it as a Dropbox alternative, with the following differences:

1-No limits: the only limit is your hard drive space. As of now, I have more than 40GB synchronized across all my computers (music, personal documents and part of the /etc/ directories).

2-It does not need the cloud. If you have computers on a LAN and no access to the internet, they will sync internally at amazing speed.

3-It is BitTorrent based, so as soon as a computer has a part of a file, it becomes a source for that part for the rest of the network.

4-Again, it does not need the cloud. The files are never stored on a remote third-party service such as Dropbox, Ubuntu One, Google Drive and so on.

5-It keeps up to 30 revisions of your files.

It is not open source, but the application is completely free to download and use. The Free Software Foundation has flagged this project as a priority to get it open sourced. And last month the developers made the API public, allowing developers to build their own GUIs to manage the service; it is available at http://www.bittorrent.com/sync/developers

Happy syncing!

CTRL-R and Bash shortcuts

When you are in a Bash shell you can do a backwards search: a search through the commands you've typed before, to re-execute them easily. Try it now: go to a Bash window and press CTRL-R, then start typing something.

When you find the command you want to run, just press Enter.
If you want to edit the command, move through it with the left and right arrows.
If you want to move to the next matching command, press CTRL-R again, until you find the right command.
To leave the search, press CTRL-G

Here are the common Bash key bindings that will help you in your daily work. Some of these also work inside the backwards search:

Ctrl-r History reverse search
Ctrl-a Jump to BOL
Ctrl-e Jump to EOL
Ctrl-l Clear terminal
Ctrl-k Delete from cursor to EOL
Ctrl-_ Undo last operation
Ctrl-m Return
Ctrl-w Delete word left from cursor
Ctrl-u Delete from BOL to cursor
Ctrl-x Ctrl-e Open the default editor $EDITOR and run edited command
Ctrl-p Previous command in history
Ctrl-n Next command in history
Ctrl-f Move forward a char
Ctrl-b Move backward a char
Alt-f Move forward a word
Alt-b Move backward a word
Ctrl-d Delete char under cursor. Exit shell if empty.
Alt-d Delete forward word
Ctrl-y Paste content of the kill ring
Ctrl-t Swap current char with previous char
Alt-t Swap current word with previous word
Alt-u Uppercase word at cursor
Alt-l Lowercase word at cursor
Ctrl-s Freeze terminal
Ctrl-q Restore frozen terminal
Shift-PgUp Scroll screen up
Shift-PgDn Scroll screen down
Most of these key bindings work in Emacs too.

Oh, le goluboi

Nowadays anybody who knows how to open Notepad and manages to save a file as .php puts "PHP developer" on his or her CV. I've seen a lot of that.


Here is a real example of code I found in one of my projects, written by a junior pretending-to-be-senior developer who, luckily for the health of the company, was moved to one of the lowest-priority projects across the board.

See for yourself: first the original code, then the fixed version.

doubtful quality version

Now, what are the problems with this code?

  1. Variable name inconsistency. You have $key_name next to $outputFields. Why use both camelCase and underscored variable names in the same code, picking between them at random?
  2. The value to be returned is kept in the variable $tmp_objects. The function's whole reason to exist, the reason it was written, is to return a value, and you call that value temporary? If it is the return value, it is not temporary. It should be named $result, $return, $output; whatever, even $charles or $william; but never $tmp_objects.
  3. Does the developer really need to wait until the end of the function to return an empty array? Isn't it better to return right away, when we already know nothing will be changed, before loading the configuration and the additional modules?


Here's my untested version of the code, with some rough comments. It is not the best thing in the world, and not a big change -it's not my job to fix crappy code, it is to show my guys what kind of things should be avoided.

decent version


So, to conclude, please follow these simple rules (a short sketch follows the list):

  1. Return the value as soon as possible, for readability reasons.
  2. Name methods, attributes and variables in a consistent and descriptive way; avoid_doing_this and then somethingLikeThis.
  3. Name the return variable descriptively; it should be clear throughout the whole function what you are going to return.
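
To make these rules concrete, here is a minimal, untested PHP sketch of the same idea; the function and field names are invented for illustration and are not taken from the original code.

function getOutputFields(array $items, array $wantedFields)
{
    // Rule 3: a descriptive return variable, in the same camelCase style as everything else.
    $outputFields = array();

    // Rule 1: return as early as possible when we already know nothing will change.
    if (empty($items) || empty($wantedFields)) {
        return $outputFields;
    }

    // Rule 2: one naming convention for all variables ($wantedFields, $fieldName, ...).
    foreach ($items as $item) {
        $row = array();
        foreach ($wantedFields as $fieldName) {
            if (isset($item[$fieldName])) {
                $row[$fieldName] = $item[$fieldName];
            }
        }
        $outputFields[] = $row;
    }

    return $outputFields;
}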


Leaky bucket algorithm implemented on Redis and PHP

One of the biggest problems in web development is programmers who are not fully aware of the infrastructure they work with. An average PHP developer will not know the limitations of the infrastructure, the maximum number of connections the web site can handle, and so on. The PHP developer just programs in PHP, refreshes the page, runs the unit tests, and if everything is OK considers the work finished. However, once the application is deployed, everything starts to depend on the actual infrastructure.

In many cases the pages will not be ready to handle an excessive amount of traffic, and will noticeably degrade service quality under heavy (and not so heavy) loads. For example, APIs heavily used by external developers (who may not be aware of what caching means), or a site being hit by scrapers who want to populate their own site with your data, or by a botnet.

This is a recurrent problem we have at my office, so I proposed to start using the well known leaky bucket system to protect our sites from scrapers. My manager liked the initiative, but he wanted to apply it at the infrastructure level and not at the application level. Even though I strongly agree with that as the definitive solution, I think there's no reason to implement the whole thing at the infrastructure level without knowing how bad the current situation is. What would we say? "Block all IPs making more than X requests per minute"? Yes, that's the idea, but what would that X be?

What I wanted to do instead is apply the leaky bucket at the application level for testing purposes. That will not take too much time, and by logging how many requests and how many different clients would have been blocked we can gather some interesting information for the definitive implementation at the infrastructure level. It would also allow us not only to log who would be blocked, but to add some Analytics event tracking codes. That way we would see in Analytics how many real users would have been blocked with the specified settings, allowing us to tune it. Besides the server-side logging, we also want to know which percentage of those clients are real browsers and not scrapers.

That is how I came up with these small PHP files that implement the whole thing for testing purposes.

The code is split in two parts: the request script and the clean-up script. Request is basically the doorman, "yes, come in" or "no, get the f*ck out of here". The clean-up script is the one that reviews the list every so often and takes some drops out of the bucket. The whole thing uses the Flexihash library for consistent hashing, splitting the data across as many sets as you need. The example is fully dynamic, but you can hardcode the set names to make it faster.

That said, please get the Flexihash files from http://github.com/pda/flexihash and point the include line to the proper place.

Then, let’s start with the initial inc.php file which should be included in both request.php and cleanup.php:

<?php
include 'flexihash-0.1.9.php';

// Redis extension from nicolasff, thanks bro!
$redis = new Redis();
$redis->connect('localhost');

$hasher = new Flexihash();

$numberOfSets = 10;

for($i = 1; $i <=$numberOfSets; $i++){
	$pad = str_pad($i,strlen($numberOfSets),'0',STR_PAD_LEFT);
	$sets[] = 'set'.$pad; // To create sets like set01, set02;
	// or set0001, set0999 if $numberOfSets is 1000
}

$hasher->addTargets($sets);

This part is easy, right? We prepared the Redis connection and the hasher.

Now, let’s see request.php

<?php
require('inc.php');

if(sizeof($argv) < 3){
	die('2 parameters: clientId, actionId; ie A search');
}

$id = $argv[1];
$action = $argv[2];

$id = $id.'-'.$action;

$set = $hasher->lookup($id);

$period = 30; // In seconds
$limit = 6; // How many hits allowed every $period seconds

$actualHits = $redis->zscore($set, $id);

if($actualHits >= $limit){
	echo "Not allowed.  {$actualHits} hits done, only {$limit} are allowed every {$period} seconds\n";

	// Log that this request would have been locked.
	$redis->zIncrBy('locked', 1, $id);
	die();
}


list($actualHits) = $redis->multi()->zIncrBy($set, 1, $id)->zAdd($set.':control', -1, $id)->exec();
$available = $limit - $actualHits;
echo "Approved, you have {$available} hits\n";

In short, the script checks whether the requesting client (in most cases this will be an IP check) has enough shots left to perform the requested action. If it does not, we log the request for statistical purposes and fine tuning. If it does, we increase its number of requests by 1, we add it with score -1 to the control set (to be used later on), and we let the client know how many hits are remaining.

Now, let's see the cleanup script, which should be executed periodically as a cron job. We'll come back to that later, no worries.

<?php
require('inc.php');

// Loop across all sets
foreach($sets as $set){
	// Remove all entries with score <= 1 from all sets. This will
	// reduce the size before further processing
	echo "Set has ".$redis->zCard($set)." elements, before cleanup.";
	$redis->zRemRangeByScore($set, '-inf', '1');
	echo " Now it has ".$redis->zCard($set).".\n";
	
	// Remember the control set we created on request.php? That sorted set contains all entries
	// the set has, but with score as -1. The goal of that zset is to reduce by 1 all scores
	// storing user hits, by intersecting set and set:control.
	echo "Control set had ".$redis->zcard($set.':control') . ' before cleanup, now it has ';
	$redis->zinterstore($set.':control', array($set.':control', $set), array(1,0), 'SUM');
	echo $redis->zcard($set.':control') ."\n";
		
	// Now do the interstore on the main set, subtracting one from every score.
	// Remember in request.php we add the client to the control set with score -1?
	// That's so we can use it with zInterStore and thus subtract 1 from all
	// scores. The trick here is combining the weights with SUM as the aggregation function.
	$redis->zinterstore($set, array($set, $set.':control'), array(1,1), 'SUM');
	echo "Control applied, all entries were reduced by 1";
}

Well, that's all! Now the trick is how to run the PHP cleanup script every 5 seconds or so. In my case I will run it every 30 seconds at first, by adding something like this to the crontab:

* * * * * php cleanup.php;
* * * * * sleep 30; php cleanup.php;

Why are these scripts good?
First, they are very light. Second, they use sharding to speed up all checks. Third, they are reusable.

What can be improved?
1-Make it compatible with different buckets. You might like to have one bucket for APIs, one bucket for login attempts, and one bucket for premium customers who have the right to hit the APIs more often (see the sketch after this list).
2-Convert it to OOP 😉 That's the homework guys; if you convert it to OOP, drop me a line.
3-Apply it at a lower level, so a blocked client does not even hit the web server, being stopped at the network level instead.
4-Use more than one server (one master and one or more slaves), or many masters in a sharded setup based on set names. Remember Redis is a single-threaded server, so you will definitely gain by running one instance per free core in your server. That way your limit won't be CPU but storage and RAM.
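
For point 1, here is a rough idea of how the hard-coded $period and $limit in request.php could become per-bucket settings; the bucket names and numbers below are made up for illustration.

// Hypothetical per-bucket settings replacing the hard-coded $period/$limit.
$buckets = array(
    'api'     => array('period' => 30, 'limit' => 6),
    'login'   => array('period' => 60, 'limit' => 5),
    'premium' => array('period' => 30, 'limit' => 60),
);

// An extra CLI parameter (invented here) would select the bucket.
$bucket = isset($argv[3]) ? $argv[3] : 'api';
if (!isset($buckets[$bucket])) {
    $bucket = 'api';
}
$period = $buckets[$bucket]['period'];
$limit  = $buckets[$bucket]['limit'];

// Include the bucket in the member id so counters never mix between buckets.
$id = $argv[1].'-'.$argv[2].'-'.$bucket;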

If this small project gets approved, I will apply these checks right after the framework and routing rules are loaded. That way I will have access to the Redis configuration files, and I can set the "action" names on a per-route basis. All categories, keywords, tags and search pages will be grouped under the "listing" action. Login, register and password reset will be grouped under the "users" action. Embed and on-site player pages will be grouped under the "player" action. Voting, commenting and adding to favorites will be grouped under the "rating" action. This will, for sure, make our sites much more stable and will give better response times to normal users.

How does Pornhub use Redis?

Since Eric Pickup's talk at Confoo 2012, a great and informative session, I've seen lots of comments and posts in many blogs and forums about Redis and different sites, many of them owned by Manwin. Redis is a very interesting technology that opens the door to different and new features and capabilities and, what is even more interesting, to new ways of thinking about web based applications. As a Pornhub developer (that's what I am), I will clarify how Redis is used on our end, and why.

The first feature we used Redis for is counting video plays. Every time a video is played, the hit is recorded in a big hash by increasing its value by one. Every few hundred plays we store the actual value in MySQL, which is used later for searches and statistical information. We also keep track of monthly and daily views, as well as ratings and votes, so visitors can sort videos by most viewed today or most viewed this month.
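
As a rough illustration of that counting pattern (the key name, the threshold and the flush call are made up, not our production code), the idea looks like this:

// Illustrative sketch only: count a play in a Redis hash and flush periodically.
$redis = new Redis();
$redis->connect('localhost');

$videoId = 12345;                                       // hypothetical video id
$plays   = $redis->hIncrBy('video:plays', $videoId, 1); // one big hash, +1 per play

if ($plays % 500 === 0) {
    // Every few hundred plays, persist the counter to MySQL for search and statistics.
    // persistPlayCount($videoId, $plays);              // hypothetical DAO call
}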

We also use Redis for part of the social-network features, those that keep relations between users such as "friends" or "blocked people". Say you block a user: we'll store that relation in both Redis and MySQL. When reading the relations, we'll try Redis first, and if the information is not there we'll fall back to MySQL.

The implementation I've proposed here is a bit tricky and takes the best of Redis and the best of Memcache in a nice in-house implementation.

We have one big sorted set named blocks:lastUpdate, having the user ID as score, and a timestamp as value. And for each user we have a sorted set named “user:[userId]:blocks”, containing the blocked user ID as value and the timestamp when it was blocked as score.

When we need to know who is blocked by a certain user, we first read the "last update" value from the blocks:lastUpdate set and add a Time To Live (TTL) value we've set in our configuration file. If that value (lastUpdate + TTL) is lower than now, we consider user:[userId]:blocks expired, so we clear it and reload it from MySQL. If the key was not considered expired, we simply use it, thus avoiding an access to MySQL.
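
Here is a rough, untested sketch of that check. I'm assuming the user ID is stored as the sorted-set member and the last-update timestamp as its score (that is what makes the per-user lookup straightforward), and the MySQL reload is left as a hypothetical helper.

function getBlockedUsers(Redis $redis, $userId, $ttl)
{
    $key        = "user:{$userId}:blocks";
    $lastUpdate = $redis->zScore('blocks:lastUpdate', $userId);

    if ($lastUpdate === false || ($lastUpdate + $ttl) < time()) {
        // Expired (or never loaded): clear the per-user set and reload it from MySQL.
        $redis->del($key);
        // reloadBlocksFromMysql($redis, $userId);      // hypothetical: repopulates $key
        $redis->zAdd('blocks:lastUpdate', time(), $userId);
    }

    // Members are the blocked user IDs, scores the timestamps when they were blocked.
    return $redis->zRange($key, 0, -1);
}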

Why not use Redis' EXPIRE (or SETEX) to expire the key instead?

This is a good question, and it caused some discussion within the team. The main reason is that when you set the expiration time on the key itself, you cannot increase or decrease it globally as you need. In case of data loss we can recover the Redis information from MySQL, or the MySQL information from Redis. For example, right now our TTL in the configuration file is one month (!). This means that we refresh the information stored in Redis once per month per user. However, if for any reason I need to drastically reduce the access to the DB, I can change the TTL setting to 3 or 4 months, and from that moment on I'm 100% sure that no access will be made to the DB, because all keys will still be valid.

In case of a MySQL crash and data loss, we can change the setting to 1 year and write a small script that reads from Redis and repopulates the affected MySQL table.

While investigating this I found an excellent Redis command, OBJECT. It returns information about the key you pass as a parameter, and one of the subcommands you can use is IDLETIME. That subcommand returns how long the key has been idle, that is, the time since the last read or write. This could be useful in many cases, but not in ours, since we would need the time when the key was created. Having that information at the Redis level would allow us to get rid of the blocks:lastUpdate set.

Many other features use Redis, and many more will come in the next months, so stay tuned for more info about how Pornhub evolves with this interesting technology.

Recover millions of rows in production MySQL server from backup

That's what we did during the last week. The first try was done by the sysadmins: they tried to insert everything in one go. Imagine an INSERT statement with 20 million values; of course it will fail!

The second try took 2 days, using split files, but the sysadmin decided to restart from scratch because of a missing index. I would have kept that unindexed table, at least, and tried an alternative way to deliver results to the users as soon as possible. But sysadmins are just that, sysadmins, not business admins, so the guy decided to simply drop the table, again. For the third massive insert I had the experience of the other two, so I decided to set some rules for the sysadmin in charge of the import so that things would be done differently.

The first thing was already known by both Troy and me: the 20 million rows should be split across many files, not kept in a single one as they had tried before. So we agreed on splitting the SQL into 19024 files of 1000 rows each.

Here I’ll document different aspects you could consider if you find yourself in the same situation.


Preparing the table to receive the inserts

I've read many times that it is convenient to insert the data before the indexes are created, to speed up the process, and to create the indexes on the table afterwards. This is true for inactive servers (staging or not-in-production servers) but it's not good at all for live servers. Creating an index implies copying the whole table to a temporary one, creating the index itself, and then moving all the information back to the indexed table. Of course the table is blocked during the whole process, so all your clients will be stuck: no inserts, no updates, no selects. The load on the server will go through the roof, so probably not only that table but the others will suffer too.

Even if the server is not active, it is better to have at least the PRIMARY KEY / UNIQUE index created. The insert process will be slower, true. But you'll be safe if for any reason you need to re-insert a file that hung in the middle. If you don't have the UNIQUE index created you'll end up with duplicated rows, which are more complicated to remove afterwards.

Having the UNIQUE index created, and using the IGNORE option in the INSERT statement, is the best way to be able to re-run a file if you need to.


Avoid extended-inserts.

While extended inserts are good in terms of speed and dump file size, you cannot resume the process from the middle if it breaks. A quick example:

INSERT INTO table (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);

INSERT INTO table (a,b,c) VALUES (1,2,3);
INSERT INTO table (a,b,c) VALUES (4,5,6);
INSERT INTO table (a,b,c) VALUES (7,8,9);

The first insert will be much faster than the three inserts in the second block, because MySQL will update the indexes only once, at the end of the insert. But both blocks do the exact same operation (assuming no duplicated keys).

The second block with the 3 statements will be slower because of the three writes and three index updates; but in terms of safety it is much better, since if one statement fails due to an existing key, the others will still go through.

Another option is to use INSERT IGNORE to ignore errors while inserting. If you try to insert a value that already exists in the index and you are using IGNORE, the error becomes a warning and the statement is executed anyway. This means that the whole insert will succeed, except for the particular rows that were duplicated.

The big advantage of using one INSERT per row is that you can then cut the dump with split, a standard Linux command. To get such a dump:

mysqldump (tablename, options, etc) --skip-extended-insert > dump.sql

At this point we had a nice dump.sql file with 20 million INSERTs, one per line, so we could move on to the next step.


Split the big file into many

By running the split command on a file with one insert per line you get multiple files that are still valid SQL; something like split -l 1000 -a 4 dump.sql part_ should produce files of 1000 rows each. The advantage of this is that you can run a few of them while monitoring the server status (and all the slaves) and see how they are affected. You can also stop the process for any reason, and then continue from the break point.

A good balance between 100% individual inserts and one single insert with all the values is to use one extended insert per file. An example of a 1-insert-per-row file could be:

INSERT INTO table (a,b,c) VALUES (1,2,3);
INSERT INTO table (a,b,c) VALUES (4,5,6);
INSERT INTO table (a,b,c) VALUES (7,8,9);

INSERT INTO table (a,b,c) VALUES (10,11,12);
INSERT INTO table (a,b,c) VALUES (13,14,15);
INSERT INTO table (a,b,c) VALUES (16,17,18);

A faster and still safe approach could be:

INSERT IGNORE INTO table (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);

INSERT IGNORE INTO table (a,b,c) VALUES (10,11,12), (13,14,15), (16,17,18);

The chance of restarting from the last breakpoint is still there, while the speed increases significantly: MySQL will update the indexes only once every 3 rows, while in the previous example it updates them once per row.

During the actual operation we decided to use single-row inserts so as not to block the table for long periods (inserting many rows at once blocks the table for longer), which allowed clients to keep reading while we were doing the import.


Inserting one file after the other

This is a simple step: loop over the files in the directory and execute them one by one. The script will:

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to allow the server to stabilize and keep serving SELECTs to the clients.
5-Loop again, until no files are remaining.

The speed of the whole process depends on many variables; one of the most important is the pause you leave between files. How long that pause should be depends, in turn, on:
1-As it's a live environment, you should still be serving SELECTs from that table. While you're inserting, the table is not locked, but its indexes are.
2-If you're working in a replication environment, you should monitor the slaves' SBM (seconds behind master). A big SBM means that the slave is not fast enough to apply all the updates coming from the master. In that case you should try some of the things explained later in this post.
3-The load of the server beyond MySQL connections and processes; you don't want to melt your live server.


Let’s discuss the sleep

According to the plan, we're going to insert 19K files into the live server, with 1K rows each. That's an enormous amount of information that has to be not only inserted into the master but also transferred to the slaves, while they keep receiving the INSERTs, UPDATEs and SELECTs of normal site activity. This means the stress on the servers will increase. How much? We don't know yet, so let's try.

Pick the first file and throw it at MySQL. Measure the time the server takes to process the file. Let's say the server processed it in one second. In a normal situation the slaves will take the same amount of time to process such an insert, so a safe value for the sleep between inserts can be one second too. But as we're working in a live environment where many other things have to be propagated to the slaves, we should leave a small window of time to let the propagation finish. I would use 1.2 seconds in that case.

So create a file named sleep.txt and write 1.2 in it. Then update the script we talked about before to read sleep.txt and sleep the amount of time specified there. This lets us change the sleep value while the insertion is running. Say you see the number of MySQL processes start to climb, or the files being processed more slowly after some minutes: you can edit sleep.txt and set the value to 2, 3, 10, whatever you need. The server will be very thankful for that. Our updated script will now:

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to allow the server to stabilize and keep serving SELECTs to the clients. This value comes from sleep.txt in this way:
SLEEP=`cat /tmp/sleep.txt `; sleep $SLEEP;
5-Loop again, until no files are remaining.

Rename or remove the already parsed files, and call the insertion script. A rough sketch of that loop follows.
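
The real loop was presumably a shell script on the sysadmins' side (the sleep line above is shell), but since this is a PHP blog, here is an equivalent untested PHP CLI sketch of the steps above; the MySQL credentials and paths are placeholders.

// Untested sketch of the insertion loop described above. Run it from the
// directory containing the .sql chunks; the mysql credentials are placeholders.
while (true) {
    // 1. List the files not yet processed (those renamed to done_* are skipped).
    $files = array_filter(glob('*.sql'), function ($f) {
        return strpos(basename($f), 'done_') !== 0;
    });
    if (empty($files)) {
        break;                                   // 5. nothing left, we're finished
    }
    sort($files);
    $file = array_shift($files);

    // 2. Feed the file to MySQL.
    passthru('mysql -u someuser -psomepass somedb < '.escapeshellarg($file), $exitCode);
    if ($exitCode !== 0) {
        echo "Failed on {$file}, stopping so it can be re-run.\n";
        break;
    }

    // 3. Mark the file as done.
    rename($file, 'done_'.$file);

    // 4. Sleep whatever /tmp/sleep.txt says, so the pause can be tuned while running.
    $pause = (float) trim(@file_get_contents('/tmp/sleep.txt'));
    usleep((int) (max($pause, 0.1) * 1000000));
}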


Monitoring: what to check?

Once the process starts, open whatever monitoring tool you have to watch server activity. I would have liked a command-line utility to check these values, but I didn't have one. I used Sysmon, which shows graphs updated every few minutes, so the information I got was a bit outdated. But bearing in mind that the process took 2 days, a delay of a few minutes is not a problem.

As we said, open Sysmon and start to monitor the following things:

  • MySQL processes: how many MySQL processes are running on each server. You'll notice an increase when you start doing the inserts, but try to keep the line as flat as possible. You should know how the server behaves and how much margin you have under normal load. In my case I was able to run at 1.5 times the normal load and stay safe.
  • Seconds behind master: if you see the SBM of the slaves start to increase beyond normal, increase the delay between operations. We'll see later how to play with the sleep.txt file to get the best balance between speed and stability. When the sleep period increases you should see the SBM start to decrease. If it does not, set a higher sleep time until you see the SBM drop. In my case I had 2 very slow servers; one of them had a huge SBM and we decided to exclude it from the application.
  • Statements per second: shows 3 lines representing INSERTs, UPDATEs and SELECTs. You'll see the INSERTs line go higher, while the UPDATEs and SELECTs stay roughly as before.

While watching the graphs you should try to identify issues before they happen, or at least identify them fast.


Identifying possible problems while stressing the server

Open the mysql client on the master server and on 2 or 3 of the slaves. Keep running SHOW PROCESSLIST; on all of them to identify possible issues. During the process I found that the master server was receiving SELECT queries from the front-end, even though it was not supposed to. That made me realize that, because of past issues with the slaves, a previous developer had changed the code to send SELECTs to the master. As soon as I saw that I changed the code to avoid that kind of select and updated the live environment right away. The operation was basically:

  • Run SHOW PROCESSLIST on the master and identify the SELECTs.
  • Match the SELECTs with your source code, in your development environment, making sure that the current version really is querying the master server and not the slaves.
  • Double check that the query is the one you found.
  • Download the offending file from the live environment to your local machine, and recheck that the connection is being made to the master.
  • Fix the code to connect to the slaves instead.
  • Run PHP lint on the affected file.
  • Upload the file to only one of the live servers, and open the error log right after the upload.
  • Depending on the number of front-end webservers you have, you may or may not see the impact of the change immediately. In my case I have seventeen webservers, so the change was not noticeable at first.
  • If you don't get any errors in the error log, propagate the file to the remaining servers, and keep watching the logs.
  • See how the processlist returns fewer and fewer SELECTs on the master each time.

It was amazing how the SELECTs line on the master decreased and, at the exact same time, the INSERTs line increased. The INSERTs were being slowed down by the SELECTs running on the master, that was clear. So, to do this kind of operation, you should know the code as if it were 100% written by you. You should know everything, everything, everything about your code if you plan to do something this stressful.

The graphic below shows how the INSERTs started to increase once the server stopped receiving SELECTs, around 18h30.

INSERTs per second increase since the master does not receive SELECTs

If for any reason (usually SBM problems, and usually in very concrete situations) you need to SELECT from a master, make sure you actually need to do it. How? Try first to read from the slaves. If the slave has the information, use it. If the slave does not have the row, then you can query the master. This simple protection decreases the load on the master at a very low price, which is querying twice instead of once in the worst case. Most of the time the slave will have the required information.
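
In code the idea is just a fallback read; the helper names here are hypothetical, not taken from our actual DB layer.

// Hypothetical sketch of the "try the slave first" protection described above.
function fetchRow($sql, array $params)
{
    // Most of the time the slave already has the row.
    $row = querySlave($sql, $params);      // hypothetical helper: runs the query on a slave
    if ($row !== false) {
        return $row;
    }

    // Only when the slave does not have it yet (e.g. replication lag) hit the master.
    return queryMaster($sql, $params);     // hypothetical helper: runs the query on the master
}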

This can work as a patch, but your application should be designed to read only from the slaves, and never, never from the master.

Another issue we found was related to 2 slow slave servers. One of them wasn't able to receive the big amount of INSERTs and serve SELECTs at the same time, and started leaving the INSERTs buffered. Again, good knowledge of the code helps in this situation. My project loads the server addresses from a PHP array. The DB layer shuffles that array and tries to connect to the servers sequentially in a loop, until a connection is established. When a server is too loaded it will reject the connection with "Too many connections", so the next server will be tried. If no server is able to handle the connection, an exception is thrown and logged.

To sum up, we have an array of servers, and we shuffle it before connecting. If we have 5 servers, each server has a 20% chance of being chosen:

$slaveServers = array(
 '10.0.0.101',
 '10.0.0.102',
 '10.0.0.103',
 '10.0.0.104',
 '10.0.0.105'
);

shuffle($slaveServers);

foreach($slaveServers as $slave){
 $conn = tryToConnect($slave);
 if($conn){ return $conn;}
}
die('Can not connect to any slave server');

In this example all servers have a 20% chance of being selected. Let's suppose 10.0.0.101 is the slow server; we would like to send fewer clients to it than to the others. To do so we can repeat the other servers in the array, increasing their probability of being selected.

$slaveServers = array(
 '10.0.0.101' , // Only once
 '10.0.0.102', '10.0.0.102', // Twice, increasing their chances to be selected
 '10.0.0.103', '10.0.0.103',
 '10.0.0.104', '10.0.0.104',
 '10.0.0.105', '10.0.0.105',
);

Our array now has 9 entries; every slave except the first has a 22% chance of being chosen, and the first one has 11%. The graphic below shows the impact this update had.

As you can see, the 3rd server stopped receiving SELECTs between 10h and 12h, and other servers took that load. During that period I had put server 6 (and 7 and 8 too, but those are not in the image) twice in the configuration, so it received double the load of the other servers.

After 13h I put servers 3 and 4 back to serving SELECTs, but only once in the array, while the others were in it twice or three times. The impact of these changes is much more evident when watching the SBM graphs (seconds behind master, that is, how long the slaves take to be a 100% up-to-date replica).

See how between 9h30 and 11h the server caught up with the master, just because it stopped receiving SELECTs. The same operation was done on server 4 between 12h30 and 13h: taking it out of the application and leaving it to work on replication only.


Seconds behind master in the 3rd server goes up to one hour and a half

Overnight (not monitored) work

As said before, this process took 2 days to complete, and of course we weren't watching the graphs and the activity the whole time. During the day I was in the office, so while doing other tasks I checked the status very often, but from home I wasn't able to check that closely. Before leaving the office I looked at the graphs to try to identify possible issues, and reviewed the behavior of the servers during the previous nights to find a pattern of how the public behaves during those hours and how the server is affected. I realized that the best way to complete the process was to alternate the sleep time every few minutes. The idea is to run with a minimal sleep time of 1 second for 200 or 300 inserts, and then give the server a pause of 30 seconds to become stable again. After the pause, the sleep goes back to 1 second.

To do that we used, once again, the sleep command and the sleep.txt file. The script does, more or less, this (a small sketch follows the list):

  1. Loop while true
  2. Write “1” in sleep.txt
  3. Sleep during 300 seconds
  4. Write “30” in sleep.txt
  5. Sleep 30 seconds
  6. Repeat loop.
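
The real sleeper was little more than a loop around the sleep command; a PHP equivalent of those steps would be roughly this (the path and timings follow the values mentioned in the post).

// Rough PHP equivalent of the sleeper loop described above.
while (true) {
    file_put_contents('/tmp/sleep.txt', '1');    // let the insert loop run with 1s pauses...
    sleep(300);                                  // ...for about 5 minutes (200-300 inserts)
    file_put_contents('/tmp/sleep.txt', '30');   // then force a long pause between files
    sleep(30);                                   // and give the slaves time to catch up
}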

As the inserting script gets the value from the content of sleep.txt, we can switch the value according to our needs in real time without stopping the process; this allows us to change the sleep dynamically. If we had had access to real statistics from MySQL on the command line, I could have written a script to increase or decrease the sleep time according to the load on the server. But that was outside our possibilities.

The sleeper script, which sets different values in sleep.txt depending on the time, was left running the whole night. Its effect during the night hours is easy to spot in the graphics above: in the first graphic, between 20h and the end; and in the other graphics, between midnight and 4h. All the little peaks are the very short sleeps.

If you, like me, ever asked yourself "what is the sleep command actually useful for?", there you have a good answer. In this situation sleep is the key command of the whole thing. Without it, this would have been much more complicated.

Receiving status of the task 24 hours a day

When the process started I asked the sysadmins to write a small, simple script that would:

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. Send me an email saying “$DONE of $TOTAL Completed” in the body, “Import process” as subject.

Just call that script once every 5 minutes, and there you have your status reports. That is the script we had; but after some time I realized that a much better approach would be this (sketched after the list):

Step 0, create an empty file named total.txt

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. echo “$DONE of $TOTAL Completed” > last.txt (Write the phrase in a file)
  4. cat last.txt total.txt | head > total.new && mv total.new total.txt (Concatenate the newest line with the previous ones, newest on top, keep only the first 10 lines, and save the result back as total.txt; going through a temporary file avoids truncating total.txt before it is read.)
  5. Mail the content of total.txt to my email account (or make it accessible through NFS, http, etc)
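
The original was a small sysadmin script; an untested PHP sketch of those five steps, with the directory and email address as placeholders, could look like this:

// Untested sketch of the status report described above.
// The directory and the email address are placeholders.
$dir   = '/data/import';
$total = count(glob($dir.'/*.sql'));             // done_*.sql files still match *.sql
$done  = count(glob($dir.'/done_*.sql'));

$line = "{$done} of {$total} Completed\n";
file_put_contents($dir.'/last.txt', $line);

// Prepend the new line to the history and keep only the 10 most recent lines.
$history = file_exists($dir.'/total.txt') ? file($dir.'/total.txt') : array();
array_unshift($history, $line);
file_put_contents($dir.'/total.txt', implode('', array_slice($history, 0, 10)));

mail('me@example.com', 'Import process', file_get_contents($dir.'/total.txt'));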

Now use cron to execute that script every 5 minutes. You'll get emails like this:

910 of 20000 Completed
830 of 20000 Completed
770 of 20000 Completed
710 of 20000 Completed
660 of 20000 Completed
600 of 20000 Completed
580 of 20000 Completed
520 of 20000 Completed
460 of 20000 Completed
390 of 20000 Completed

This is easily parseable, so it's easy to extract good statistical data and see how the process reacts to the different changes we discussed earlier in the post. You can clearly see the rate you're working at by comparing consecutive reports.


At the end of the recovery

To wrap up, I would like to mention that the work the sysadmins did during this week was amazing. I learned a lot while discussing the planning with them and considering all the different options to get the task done. After such a difficult experience I can say I'm happy with the result we got. Now that it's over, I feel it has been a good experience.

Anyway, there's still a lot of work to do to optimize the site the way I want. The goal is to migrate the whole table to a new one with the proper indexes. Yes, we'll migrate the 20 million rows to another table, in the live environment, but that's a good story for another post.

Thanks for reading!

Nick “Drop table” Andrade

Friday morning, I get into the office and am greeted with some news. Apparently a great part of the ads have disappeared completely.

After some greps, some Memcache key checks and exactly one query, I found that we had lost an important user, with its comments, blog posts, videos, list of friends, favourites… No need to explain what the relation between "ads" and "user" is, but that's how it is.

So I recover the user in one move, getting the ads back in place.

Now it's time to see how that happened, and here's where the fun begins. SSH to the server, grep through the crons, find the entry. Somebody, somehow, reported that user as abusive 5 times. It's an internal user; it can never be abusive. That situation should never happen, but it happened because somebody wanted it to. Fix the cron, add some lines to log actions at some key points, and start recovering from backup.

Looking at SHOW TABLES, I wrote some queries in a file to run on the backup server in order to get the user's rows. Got a few of them, sent them to IT and received the reply in txt (not SQL) format. A regexp to build valid inserts (every row wrapped in "( ),", fields separated by commas), manually adding INSERT INTO at the top, and mysql < recover.sql for each and every file. 6 or 7 of them.

At the end of the process I receive the list of friends. This one in SQL format. Cool, no need to transform it. It was a 500MB file.

Checked the content a bit, and the rows seemed to have the information I expected. SCP to the server, SSH afterwards, and mysql < friends.sql.

Friday 18.30, everything was restored. We'd lost the ads for just 7 hours or so, not a big problem.

I can leave for Québec City without the laptop, just the camera and a friend.



Saturday night, I check my email just in case. 20 unread messages. "Users lost their friends lists".

Bearing in mind the big insert done on Friday, I suppose the read servers got too far behind the masters. Or the table has been damaged somehow and cannot be read. Using my friend's laptop, I request a check on the indexes from IT by email. Everything looks good according to them. Bob asks how many rows we have in the backup, and we have ten times more in the backup than in the live table.

Somehow we’ve lost 90% of the table. The same table where I made an insert the previous day.

Spent the whole weekend thinking about it, sending emails to IT and Bob, wondering how that could happen. A cron? No, that table has no cron. An attack through www on a buggy URL? Could be…

Monday morning, I arrive at the office early. Get breakfast, sit at my desk, start grepping access logs, counting calls to possible URLs. Everything was normal. No attacks up to the 5th server. Checked up to the last one just to be sure, but I was already thinking of something else. The whole team was working on the same issue, searching for deletes on that table throughout the code. Nothing.

Then…

I remember…

I had run a SQL file on Friday against that table. A big SQL file. Generated by mysqldump. The only one I didn't format.

Could it be that
the file
dropped and recreated the table
to populate it
with the rows I’d requested?

Yes.

20 million rows deleted in one shot.

I learned many things from this little story. The first one is that before any stressful situation like the one I had on Friday, try to calm down. Sounds stupid, but go and take a walk. I was not stressed, I wouldn't use "stressed" to describe my situation, but I was not really calm either. I was alert and paying a lot of attention to what I was doing. I was probably over-focused on the data and not on the SQL itself. I was thinking more about how such an insert could affect the sync of the servers, because we'd had that same issue many times over the previous 2 months or so. And when I ran head on the friends.sql file I saw the comments at the top, the SET NAMES utf8, and didn't notice the DROP TABLE IF EXISTS. That's what happened.

The other thing, really important: always grep -Ei "DROP|TRUNCATE" the SQL file to make sure it does not contain that kind of prank.

Another thing, learned directly from my Linux experience: create a separate user for administration. Developers and the application code should have access only to INSERT/ALTER/CREATE/DELETE/UPDATE, not to DROP or TRUNCATE. Those rights should belong only to the administration user, so that to do any of those actions you have to actually log out and log back in as that user. This will prevent many accidents.

That was my experience unintentionally deleting an enormous table from the LIVE environment.