Gnome Shell Overview with a mouse click on Ubuntu

A few weeks ago I bought a new keyboard and mouse combo, the Logitech MK710 keyboard with the Logitech M705 mouse. It’s such a nice combination that I actually bought a second set to use at home.

Mouse Logitech M705 in Ubuntu, using all buttons

As a Gnome Shell user, I wanted to take advantage of the side button this mouse has; it’s that little line you can see in the image above. My idea was to trigger the Overview mode with this button, as if the Windows (or Super) key had been pressed.

Gnome Shell Overview

So, how do you do this under Ubuntu 13.10? Essentially by binding the button to a command, and making that command simulate a key press.

First, install the following packages: xbindkeys xautomation
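On Ubuntu that’s a single command:

sudo apt-get install xbindkeys xautomation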

Then use the xev command to find the number of the mouse button: run xev, press the button, and take note of the value. When xev is running, a little white box will appear; put the mouse pointer inside the box and press the button. In the terminal (xev needs to be run from a terminal, did I forget to mention that?) you’ll see some output similar to this:

ButtonPress event, serial 33, synthetic NO, window 0x3c00001,
root 0x28d, subw 0x0, time 89142314, (89,93), root:(2011,171),
state 0x10, button 10, same_screen YES

ButtonRelease event, serial 33, synthetic NO, window 0x3c00001,
root 0x28d, subw 0x0, time 89142454, (89,93), root:(2011,171),
state 0x10, button 10, same_screen YES

What matters is the “button 10” part.

Now we need to bind “button 10” to a key. This cannot be done directly. What we can do is bind the button to a command, that is, execute a command whenever the button is pressed, and then make that command a “send key” command.

To test it, first type this in the console

xte 'key Super_L'; sleep 1; xte 'key Super_L';

You should see the Overview mode appear and, after one second, revert back. If this works, we can bind the button. We’re going to use xbindkeys for this. If there’s no configuration file yet (most likely the case), you can create a default one by typing:

xbindkeys --defaults > $HOME/.xbindkeysrc

Then edit this file with your favorite editor, and add this

"xte 'key Super_L'"
b:10+release

“b:10” is the button number you found before. The +release makes the command trigger when the button is released rather than when it’s pressed.
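To try the binding without logging out, you can (re)start xbindkeys by hand; it reads ~/.xbindkeysrc when it starts:

killall xbindkeys 2>/dev/null
xbindkeys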

Last, you’ll want xbindkeys to run at every login. For this, you can create an entry in the ~/.config/autostart folder named “xbindkeys.desktop” with the following content:

[Desktop Entry]
Name=XBindKeys
Comment=XBindKeys
Exec=/usr/bin/xbindkeys
Icon=solaar
StartupNotify=false
Terminal=false
Type=Application
X-GNOME-Autostart-enabled=true

Now you can log out, log back in, and enjoy your new button!

Ubuntu: solving the Broadcom STA slowdown on battery power

I found myself forced to replace an almost new (9 months old) D-Link router, model DIR-835, because all of its Ethernet ports stopped working. I will never buy another D-Link device in my life. Well, past that point, I bought a Linksys N750 router, even though I was aware of all the security issues the Cisco Cloud software has raised.

After finishing the setup, I found that all YouTube videos on my MacBook 6,2 running Ubuntu Raring 13.04 were loading painfully slowly. I immediately blamed the new router, which received a good amount of resets, setting changes and so on. Then I plugged in the Ethernet cable and tested the speed that way, only to find it worked great.

After quite some time of testing, the laptop battery was drained. So I plugged in the charger (while running a speed test) and saw the download speed increase the very moment I connected it.

I said to myself “no, this cannot be true” and kept plugging and unplugging the charger, watching the speed go up and down. Then I ran the iwconfig eth1 command, and here’s what I got:

With charger:

eth1 IEEE 802.11abgn ESSID:"JuanRomanV2"
Mode:Managed Channel:38 Access Point: C8:D7:19:21:3F:E0
Bit Rate=162 Mb/s Tx-Power:24 dBm
Retry min limit:7 RTS thr:off Fragment thr:off
Power Management:off
Link Quality=5/5 Signal level=-56 dBm Noise level=-88 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

Without charger:

eth1 IEEE 802.11abgn ESSID:"JuanRomanV2"
Mode:Managed Channel:38 Access Point: C8:D7:19:21:3F:E0
Bit Rate=243 Mb/s Tx-Power:24 dBm
Retry min limit:7 RTS thr:off Fragment thr:off
Power Management mode:All packets received
Link Quality=4/5 Signal level=-59 dBm Noise level=-88 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

Oh… so this nice card has a “Power Management” feature that, in order to manage the power, manages to make me feel like I’m back in the ’90s with those dial-up connections, huh?

I investigated the issue and found that Broadcom included this feature in the recent Broadcom STA Linux driver for the BCM43224 model, and possibly many more. On my other laptop, a Dell Vostro 3360 also running Raring, I have a Broadcom wireless card as well, and it doesn’t seem to have this problem.

So, how to solve this?

It seems there’s a daemon watching the battery status and changing the wireless card settings accordingly. After some tries, my solution was to put this content in the /etc/pm/power.d/wireless file:

#!/bin/sh
# give the power-management daemon time to apply its own setting first
sleep 2
# then force power management off on the wireless interface
/sbin/iwconfig eth1 power off
exit 0

Why the sleep? Because somewhere there’s another daemon taking care of that same setting. With the sleep 2, I give the other daemon enough time to be happy, to believe the change has been applied and to quit; then my script saves the day by letting the card eat all the battery power again. But hey, at least YouTube works decently.
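For pm-utils to run that hook it must be executable; afterwards you can pull the charger and check that power management stays off (adjust eth1 if your interface is named differently):

sudo chmod +x /etc/pm/power.d/wireless
iwconfig eth1 | grep "Power Management"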

Last, sorry Linksys! I blamed you hard for nothing…

Oh, le goluboi

Nowadays anybody who knows how to open Notepad and manages to save a file as .php puts “PHP developer” on their CV. I’ve seen a lot of that.

 

Here’s a real example of code I found in one of my projects, written by a junior pretending-to-be-senior developer who, luckily for the health of the company, was moved to one of the lowest-priority projects across the board.

See for yourself: the original code and the fixed version.

doubtful quality version

Now, what are the problems on this code?

  1. Variable naming inconsistency. You have $key_name right next to $outputFields. Why use both camelCase and underscored variable names in the same code, picking one or the other at random?
  2. The value to be returned lives in a variable called $tmp_objects. The main reason for the function to exist, the reason it was written, is to return a value, and you call that value temporary? If it is the return value, it is not temporary. It should be named $result, $return, $output; whatever, even $charles or $william; but never $tmp_objects.
  3. Does the developer really need to wait until the end of the function to return an empty array? Isn’t it better to return right away when we know nothing will change, before loading the configuration and the additional modules?

 

Here’s my untested version of the code, with some rough comments. It’s not the best thing in the world, nor a big change; it’s not my job to fix crappy code, but to show my guys what kind of things should be avoided.

decent version

 

So, to conclude, please follow these simple rules:

  1. Return the value as soon as possible, for readability reasons.
  2. Name methods, attributes and variables in a consistent and descriptive way: avoid_doing_this and then doing somethingLikeThis.
  3. Name the return variable in a descriptive way: at any point in the function you should know what you are going to return.

 

Good Developer vs Bad Developer

From Guy Nirpaz‘s blog

Good developer is an artist, a craftsman who enjoys the process of creation. Bad developer considers himself as a programmer, responsible for generating lines of code.

Good developer understands the problems of the customers. Bad developer understands only the technical problem at hand. Good developer does not define the why, but constantly strives to understand why. He’s responsible for the how, and still sees the big picture. Bad developer is focused on building classes and methods and configuration files, but does not get the big picture.

Good developer understands the complete architecture of the product. Bad developer knows only the components he’s written. Good developer fully understands the technologies that are used within the product. He understands what they are used for, and how they work internally.

Good developer is not afraid of new technologies but embraces them by quickly getting a grip. Bad developer only sticks to what he knows. His immediate reaction to any technical change is negative.

Good developer is constantly learning and improving his skills. Good developer reads technical articles, and finishes several technical books a year. Bad developer does not have time to learn. He’s always too busy with other stuff.

Good developer cares about the product quality. He is also very much concerned with the process quality. Good developer pushes himself to create bug-free code; bad developer leaves it to QA to find bugs to fix.

Good developer develops features which create value for customers. Bad developer completes tasks. Good developer will never claim the requirements are incomplete, and will make sure to fully understand the features he’s working on. Bad developer will wait until the finest details are available. To emphasize: good developer is the CEO of the feature – he’s going to make sure he always has the information needed to accomplish the feature, and in case information is missing he’ll make sure he gets it.

Good developer is not afraid to go into anyone’s code. Bad developer is afraid of others looking into his. Good developer understands that it shouldn’t take more time to write self-explanatory and well-documented code. Bad developer always needs to allocate extra time to document and simplify.

Good developer will never feel his code is good enough, and will always continue to clean and fix. Good developer always strives to create elegant solutions but understands that his job is to deliver value to customers. Bad developer thinks only about the elegance of his code and leave the job of delivering value to others.

Coding with style. Excessive style, actually.

We had a bug where a DIV was being hidden by a JavaScript bar loaded asynchronously at the top of the page. So one of my front-end developers came up with this solution: push the DIV a bit lower so it shows completely. That’s coding with a lot of style, isn’t it?

Double style tag? Are you kidding me, bro?

Weirdest reason in a technical request email ever: Porn is not allowed in Moldova

Some time ago I was working on a porn search engine project (now with more than 10 thousand visits a day and growing) and I found myself writing the weirdest reason ever to ask for a domain change. The site was www.porn.md, supposed to be “the porn doctor”: you would type your problem into the search box, and the doctor would give you some remedies. Everything was going well, traffic was growing, until we received a notification from the .md TLD managers: Moldova does not allow porn websites.

We needed to move the project to a different domain: pornmd.com. There was no choice.

I opened the email client and sent my hosting company the weirdest reason ever.

I will need to change the domain name from porn.md to pornmd.com because porn is not allowed in Moldova.

Thanks!

Nicolás Andrade

 

“Porn is not allowed in Moldova”. Have you ever written a weirder reason than that?

Data for everyone

Rule No. 2 – The guaranteed access rule

Each data item must be logically accessible through a lookup that combines the table name, its primary key value, and the column name.

This means that given a table name, a primary key value, and the name of the required column, one and only one value must be found. For this reason, defining primary keys for all tables is practically mandatory.
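As an illustration (the users table, its user_id primary key and the email column are hypothetical), the rule says that a query like the following must pin down exactly one value:

mysql -e "SELECT email FROM users WHERE user_id = 42;"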

Rule No. 6 – The view updating rule

All views that are theoretically updatable must be updatable by the system itself.

Most RDBMSs allow simple views to be updated, but reject attempts to update complex views.

 

Codd’s rules

New ‘tax’: Internet Explorer 7 customers will be charged extra

Kogan.com, an Australian retailer, has decided to “tax” customers using Internet Explorer 7 or lower.

Ruslan Kogan, chief executive officer of Kogan.com, explained that they needed a way to cover the extra cost of making their pages look right in a very old browser.

The tax will increase by 0.1% every month, because IE7 will be one month older.

More information at http://www.bbc.com/news/technology-18440979

“Anyone who is involved with the internet and web technology would know the amount of time that is wasted to support all these antiquated browsers,” Kogan said. “You have to make all these work-arounds all the time to make sure the site works properly on it.”

 

How does Pornhub use Redis?

Since Eric Pickup’s talk at Confoo 2012, a great and informative session, I’ve seen lots of comments and posts on many blogs and forums about Redis and different sites, many of them owned by Manwin. Redis is a very interesting technology that opens the door to new features and capabilities and, even more interestingly, to new ways of thinking about web-based applications. As a Pornhub developer (that’s what I am), I’ll clarify how Redis is used on our end, and why.

The first feature we used Redis for was counting video plays. Every time a video is played, the hit is recorded in a big hash by incrementing its value by one. Every few hundred plays we store the current value in MySQL, which is used later for searches and statistical information. We also keep track of daily and monthly views, as well as ratings and votes, so visitors can sort videos by most viewed today or most viewed this month.
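In redis-cli terms the counter is simply this (the key and field names here are made up, not the real ones):

redis-cli HINCRBY video:views 12345 1      # one more play for video 12345
redis-cli HGET video:views 12345           # current counter, read before flushing to MySQL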

We also use Redis for part of the social-network features, the ones that keep relationships between users, such as “friends” or “blocked people”. Say you block a user: we store that relationship in both Redis and MySQL. When reading the relationships, we try Redis first, and if the information is not there we fall back to MySQL.

The implementation I proposed here is a bit tricky and takes the best of Redis and the best of Memcache in a nice in-house solution.

We have one big sorted set named blocks:lastUpdate, with the user ID as score and a timestamp as value. And for each user we have a sorted set named “user:[userId]:blocks”, containing the blocked user IDs as values and the timestamp of each block as score.

When we need to know who is blocked by a certain user, we first read the “last update” value from the blocks:lastUpdate sorted set and add to it a Time To Live (TTL) value defined in our configuration file. If that value (lastUpdate + TTL) is lower than now, we consider user:[userId]:blocks expired, so we clear it and reload it from MySQL. If the key is not considered expired, we simply use it, thus avoiding an access to MySQL.
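Roughly, in redis-cli terms (the real code is PHP; user ID 42 and the one-month TTL are just examples, key names as described above), the check looks like this:

NOW=$(date +%s)
TTL=2592000                                   # one month, taken from the configuration file
LAST=$(redis-cli ZRANGEBYSCORE blocks:lastUpdate 42 42 | head -n 1)
if [ -z "$LAST" ] || [ $((LAST + TTL)) -lt "$NOW" ]; then
    redis-cli DEL user:42:blocks              # expired: clear it and reload from MySQL
    # ... repopulate user:42:blocks and blocks:lastUpdate from MySQL here
else
    redis-cli ZRANGE user:42:blocks 0 -1      # still valid: read the blocked IDs from Redis
fi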

Why not use Redis’ EXPIRE to expire the key instead?

This is a good question, and it caused some discussion within the team. The main reason is that when you set the expiration time on the key itself you cannot increase or decrease it as you need. In case of data loss we can recover the Redis information from MySQL, or the MySQL information from Redis. For example, right now our TTL in the configuration file is one month (!). This means we refresh the information stored in Redis once per month per user. However, if for any reason I need to drastically reduce the access to the DB, I can change the TTL setting to 3 or 4 months, and from that moment on I’m 100% sure that no access will be made to the DB, because all keys will still be valid.

In case of a MySQL crash and data loss, we can change the setting to 1 year and write a small script that reads from Redis and repopulates the corresponding MySQL table.

While investigating this I found an excellent Redis command, OBJECT. It returns information about the key you pass as a parameter, and one of the subcommands you can use is IDLETIME. That subcommand returns how long the key has been idle, counted from the last read or write. This could be useful in many cases, but not in ours, since we would need the time the key was created. Having that information at the Redis level would allow us to get rid of the blocks:lastUpdate set.
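For reference, from the command line that looks like this (key name borrowed from the example above):

redis-cli OBJECT IDLETIME user:42:blocks   # seconds since the key was last read or written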

Many other features use Redis, and many more will come in the next few months, so stay tuned for more info on how Pornhub evolves with this interesting technology.

Recover millions of rows in production MySQL server from backup

That’s what we did during the last week. The first try was done by the sysadmins: they tried to insert everything in one go. Imagine an INSERT statement with 20 million values; of course it will fail!

The second try took 2 days, using split files, but the sysadmin decided to restart from scratch because of a missing index. I would have kept that unindexed table at least and tried an alternative way to deliver results to the users as soon as possible. But sysadmins are just that, sys admins, not business admins, so the guy decided to simply drop the table, again. For the third massive insert I had the experience of the previous two, so I decided to define some rules for the sysadmin in charge of the import, to do things in a different way.

The first thing was already known by both Troy and me: the 20 million rows should be split across many files, not kept in a single one as they had tried before. So we agreed on splitting the SQL into 19024 files of 1000 rows each.

Here I’ll document different aspects you could consider if you find yourself in the same situation.

 

Preparing the table to receive the inserts

I’ve read many times that it’s convenient to insert the data before the indexes are created, to speed up the process, and only then create the indexes on the table. This is true for inactive servers (meaning staging or not-in-production servers) but it’s not good at all for live servers. Creating an index implies copying the whole table to a temporary one, creating the index itself, and then moving all the data back into the indexed table. Of course the table is locked during the whole process, so all your clients will be stuck: no inserts, no updates, no selects. The load on the server will go through the roof, so probably not only that table will suffer but the others too.

Even if the server is not active, it’s better to have at least the PRIMARY KEY / UNIQUE index created. The insert process will be slower, true. But you’ll be safe if for any reason you need to re-insert a file that hung in the middle. If you don’t have the UNIQUE index created you’ll end up with duplicated rows, which will be more complicated to remove afterwards.

Having the UNIQUE index created and using the IGNORE option in the INSERT statement is the best way to be able to re-run a file if you need to.

 

Avoid extended-inserts.

While extended inserts are good in terms of speed and dump file size, you cannot resume the process from the middle if it breaks. A quick example:

INSERT INTO table (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);

INSERT INTO table (a,b,c) VALUES (1,2,3);
INSERT INTO table (a,b,c) VALUES (4,5,6);
INSERT INTO table (a,b,c) VALUES (7,8,9);

The first insert will be much faster than the three inserts in the second block, because MySQL updates the indexes only once, at the end of the statement. But both blocks do the exact same operation (assuming no duplicate keys).

The second block, with its 3 statements, will be slower because of the three writes and three index updates; but in terms of safety it’s much better, since if one statement fails due to an existing key, the others will still go through.

Another option is to use INSERT IGNORE to ignore errors while inserting. If you try to insert a value that already exists in a unique index and you are using IGNORE, the error becomes a warning and the statement is executed anyway. This means the whole insert will succeed, except for the particular rows that were duplicated.

The other advantage of using one INSERT per row is that it plays nicely with split, a Linux command: with one statement per line, the dump can safely be cut into pieces. The dump itself is produced along these lines (keep whatever connection options you normally use):

mysqldump --skip-extended-insert [other options] database_name table_name > dump.sql

At this moment we had a nice dump.sql file with 20 million INSERTs, one per line. So we can move to the next step.

 

Split the big file into many

By running the split command on a file with one insert per line you get multiple files that are still valid SQL. The advantage is that you can run a few of them while monitoring the server status (and all the slaves) and see how they are affected. You can also stop the process for any reason and later continue from the break point.
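As a rough sketch (assuming a reasonably recent GNU coreutils; the chunk_ prefix is just an example), splitting the dump into files of 1000 statements each looks like this:

split --lines=1000 --numeric-suffixes --suffix-length=5 --additional-suffix=.sql dump.sql chunk_
# produces chunk_00000.sql, chunk_00001.sql, ... each holding 1000 INSERT statements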

A good balance between 100% individual inserts and one single insert with all the values would be to use one extended insert per file. For comparison, a 1-insert-per-row file could look like this:

INSERT INTO table (a,b,c) VALUES (1,2,3);
INSERT INTO table (a,b,c) VALUES (4,5,6);
INSERT INTO table (a,b,c) VALUES (7,8,9);

INSERT INTO table (a,b,c) VALUES (10,11,12);
INSERT INTO table (a,b,c) VALUES (13,14,15);
INSERT INTO table (a,b,c) VALUES (16,17,18);

A faster and still safe approach could be:

INSERT IGNORE INTO table (a,b,c) VALUES (1,2,3), (4,5,6), (7,8,9);

INSERT IGNORE INTO table (a,b,c) VALUES (10,11,12), (13,14,15), (16,17,18);

The chance of restarting from the last breakpoint is still there, while the speed increases significantly: MySQL updates the indexes only once every 3 rows, while in the previous example it updates them once per row.

For the actual operation we decided to use single-row inserts so as not to block the table for long periods (inserting many rows at once locks it for longer), which allowed clients to keep reading while we were doing the import.

 

Inserting one file after the other

This step is simple: loop over the files in the directory and execute them one by one. The script will do the following (a sketch comes right after the list):

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to allow the server to stabilize and keep serving SELECTs to the clients.
5-Loop again, until no files are remaining.
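A minimal sketch of that script, assuming the chunk_*.sql files from the split step above and a database name that is obviously made up:

#!/bin/sh
# run every pending chunk against the live server, one file at a time
for FILE in chunk_*.sql; do
    [ -e "$FILE" ] || break            # no pending files left
    mysql my_database < "$FILE"        # execute the 1000 INSERTs in this chunk
    mv "$FILE" "done_$FILE"            # mark the chunk as processed
    sleep 1                            # let the server (and the slaves) breathe
done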

The speed of the whole process depends on many variables, one of the most important being the pause you leave between files. How long that pause should be depends, in turn, on a few things:
1-As it’s a live environment, you should still be serving SELECTs from that table. While you’re inserting, the table is not locked, but its indexes are.
2-If you’re working in a replication environment, you should monitor the slaves’ SBM (seconds behind master). A big SBM means the slave server is not fast enough to keep up with all the updates coming from the master. In that case you should try some of the things explained later in this post.
3-The load of the server, beyond MySQL connections and processes: you don’t want to melt your live server.

 

Let’s discuss the sleep

According to the plan, we’re going to insert 19K files into the live server, with 1K rows each. That’s an enormous amount of information that has to be not only inserted on the master but also transferred to the slaves, while they keep receiving the INSERTs, UPDATEs and SELECTs of the site’s normal activity. This means the stress on the servers will increase. How much? We don’t know yet; let’s try.

Pick the first file and throw it at MySQL. Measure the time the server takes to process it. Let’s say the server processed the file in one second. In a normal situation the slaves will take about the same time to process such an insert, so a safe value for the sleep between files can be one second too. But as we’re working in a live environment where many other things have to be propagated to the slaves, we should leave a small window of time to let that propagation finish. I would use 1.2 seconds in this case.

So create a file named sleep.txt and write 1.2 in it. Then update the script we talked about before to read sleep.txt and sleep for the amount of time specified there. This lets us change the sleep value during the insertion: if you see MySQL processes start to pile up, or the files being processed more slowly after some minutes, you can edit sleep.txt and set the value to 2, 3, 10, whatever you need. The server will be very thankful for that. Our updated script will now:

1-Get a list of files not starting with done_
2-Loop over that list, executing mysql < $file
3-Rename $file to done_$file.
4-Sleep for a small period of time to allow the server to stabilize and keep serving SELECTs to the clients. This value comes from sleep.txt in this way:
SLEEP=`cat /tmp/sleep.txt `; sleep $SLEEP;
5-Loop again, until no files are remaining.

In short: each file gets renamed (or removed) once it has been processed, and the insertion script keeps going until no files are left.

 

Monitoring: what to check?

Once the process has started, open whatever monitoring tool you use to watch server activity. I would have liked a command-line utility to check these values, but I didn’t have one. I used Sysmon, which shows graphs updated every few minutes, so the information I got was a bit stale. But keeping in mind that the process took 2 days, a delay of a few minutes is not a problem.

As we said, open Sysmon and start to monitor the following things:

  • MySQL processes: how many MySQL processes are running on each server. You’ll notice an increase when you start doing the inserts, but try to keep the line as flat as possible. You should know how the server behaves and how much margin you have under normal load. In my case I was able to push it to 1.5 times the normal load and still be safe.
  • Seconds behind master: if you see the SBM on the slaves start to grow beyond normal, increase the delay between operations. We’ll see later how to play with the sleep.txt file to get the best balance between speed and stability. As the sleep period increases you should see the SBM start to decrease. If it doesn’t, set a higher sleep time until you see the SBM drop. In my case I had 2 very slow servers; one of them had a huge SBM and we decided to exclude it from the application.
  • Statements per second: shows 3 lines representing INSERTs, UPDATEs and SELECTs. You’ll see the INSERTs line go up, while the UPDATEs and SELECTs stay at the same level as before.

While watching the graphs you should try to identify issues before they turn into real problems, and identify them fast.

 

Identifying possible problems while stressing the server

Open the mysql client on the master server and on 2 or 3 of the slaves. Keep running SHOW PROCESSLIST; on all of them to spot possible issues. During the process I found that the master server was receiving SELECT queries from the front end, even though it wasn’t supposed to. That made me realize that, because of issues with the slaves, a previous developer had changed the code to run SELECTs against the master. As soon as I saw that, I changed the code to avoid those selects and updated the live environment right away. The operation was basically:

  • Run SHOW PROCESSLIST on the master and identify the SELECTs.
  • Match those SELECTs against your source code, in your development environment, confirming that the current version really queries the master instead of the slaves.
  • Double-check that the query is the one you found.
  • Download the offending file from the live environment to your local machine, and re-check that the connection is being made to the master.
  • Fix the code to connect to the slaves instead.
  • Run PHP Lint on the affected file.
  • Upload the file to only one of the live servers, and open the error log right after the upload.
  • Depending on the number of front-end webservers you have, you may or may not notice the impact of the change. In my case I have seventeen webservers, so a single updated server was not noticeable.
  • If you don’t get any errors in the error log, propagate the file to the remaining servers, and keep watching the logs.
  • Watch how the processlist returns fewer and fewer SELECTs on the master.

It was amazing to see how the SELECTs line on the master dropped and, at the exact same time, the INSERTs line went up. The INSERTs were being slowed down by the SELECTs running on the master, that was clear. So, to do this kind of operation, you should know the code as if you had written 100% of it yourself. You should know everything, everything, everything about your code if you plan to put it under this kind of stress.

The graph below shows how the INSERTs started to increase once the server stopped receiving SELECTs, around 18h30.

INSERTs per second increase since the master does not receive SELECTs

If for any reason (usually SBM problems, and usually in very specific situations) you need to SELECT from the master, make sure you actually need to. How? Try to read from the slaves first. If the slave has the information, use it. If the slave does not have the row, then you can query the master. This simple protection decreases the load on the master at a very low price: querying twice instead of once in the worst case. Most of the time the slave will have the required information.

This can work as a patch, but your application is meant to read only from the slaves, and never, never from the master.

Another issue we found was related to 2 slow slave servers: one of them wasn’t able to receive the huge amount of INSERTs and serve SELECTs at the same time, and started buffering the INSERTs. Again, good knowledge of the code helps in this situation. My project loads the server addresses from a PHP array. The DB layer shuffles that array and tries to connect to the servers sequentially, in a loop, until a connection is established. When a server is too loaded it rejects the connection with “Too many connections”, so the next server is tried. If no server is able to handle the connection, an exception is thrown and logged.

In short, we have an array of servers and we randomize it before connecting. If we have 5 servers, each one has a 20% chance of being chosen:

$slaveServers = array(
 '10.0.0.101',
 '10.0.0.102',
 '10.0.0.103',
 '10.0.0.104',
 '10.0.0.105'
);

// Randomize the order so the load spreads evenly across the slaves
shuffle($slaveServers);

// Try each slave in turn until one accepts the connection
foreach($slaveServers as $slave){
 $conn = tryToConnect($slave);
 if($conn){ return $conn;}
}
die('Can not connect to any slave server');

In this example every server has a 20% chance of being selected. Now let’s suppose 10.0.0.101 is the slow server and we want to send fewer clients to it than to the others. To do so we can repeat the other servers in the array, increasing their chances of being selected.

$slaveServers = array(
 '10.0.0.101' , // Only once
 '10.0.0.102', '10.0.0.102', // Twice, increasing their chances to be selected
 '10.0.0.103', '10.0.0.103',
 '10.0.0.104', '10.0.0.104',
 '10.0.0.105', '10.0.0.105',
);

Our array now has 9 entries: every slave except the first has a 22% chance of being chosen, while the first one has 11%. The graph below shows the impact this update had.

As you can see, the 3rd server stopped receiving SELECTs between 10h and 12h, and other servers took over that load. During that period I had put server 6 twice in the configuration (7 and 8 too, but those are not in the image), so it received double the load of the other servers.

After 13h I put servers 3 and 4 back to serving SELECTs, but only once in the array, while the others appeared twice or three times. The impact of these changes is much more evident when looking at the SBM graphs (seconds behind master, that is, how far behind the slaves are from being a 100% up-to-date replica).

See how between 9h30 and 11h the server caught up with the master, simply because it stopped receiving SELECTs. The same was done with server 4 between 12h30 and 13h: taking it out of the application and leaving it working on replication only.

 

Seconds behind master in the 3rd server goes up to one hour and a half

Overnight (not monitored) operation

As said before, this process took 2 days to complete, and of course we weren’t there checking the graphs and the activity the whole time. During the day I was at the office, so while doing other tasks I checked the status very often, but from home I couldn’t watch it that closely. Before leaving the office I checked the graphs to try to spot possible issues, and looked at how the servers had behaved during the previous nights to find a pattern of how the public behaves during those hours and how the servers are affected. I realized that the best way to get through the night was to alternate the sleep time every few minutes: run with a minimal sleep time of 1 second for 200 or 300 inserts, then give the server a 30-second pause to become stable again. After the pause, the sleep goes back to 1 second.

To do that we used the sleep command and the sleep.txt file again. The script does, more or less, this (a rough sketch follows the list):

  1. Loop while true.
  2. Write “1” in sleep.txt.
  3. Sleep for 300 seconds.
  4. Write “30” in sleep.txt.
  5. Sleep for 30 seconds.
  6. Repeat the loop.
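A minimal sketch of that sleeper script, assuming the insertion script reads /tmp/sleep.txt as shown earlier:

#!/bin/sh
# alternate the pause used by the insertion script: fast runs with short recovery breaks
while true; do
    echo "1" > /tmp/sleep.txt     # full speed: 1 second between files
    sleep 300                     # ...for roughly 5 minutes (200-300 files)
    echo "30" > /tmp/sleep.txt    # then a long pause between files
    sleep 30                      # give the server time to become stable again
done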

Since the inserting script gets its value from the content of sleep.txt, we can switch that value in real time, according to our needs, without stopping the process; this lets us change the sleep dynamically. If we had had access to real-time MySQL statistics from the command line, I could have written a script to increase or decrease the sleep time automatically according to the server load. But that was outside our possibilities.

The sleeper script, which sets different values in sleep.txt depending on the time, was left running the whole night. Its effect during the night hours is easy to spot in the graphs above: in the first graph, between 20h and the end; in the other graphs, between midnight and 4h. All the little peaks are the very short sleeps.

If you, like me, ever asked “what is the sleep command actually useful for?”, there you have a good answer. In this situation sleep is the key command of the whole thing. Without it, this would have been much more complicated.

Receiving status of the task 24 hours a day

When the process started I asked the sysadmins to write a small, simple script that would:

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. Send me an email with “$DONE of $TOTAL Completed” in the body and “Import process” as the subject.

Just call that script every 5 minutes and there you have your status updates. That’s the script we had, but after some time I realized that a much better approach would be the following (a sketch comes right after the list):

Step 0, create an empty file named total.txt

  1. Count how many .sql files are in the directory, and store the value as TOTAL
  2. Count how many of them start with “done_”, and store the value as DONE
  3. echo “$DONE of $TOTAL Completed” > last.txt (Write the phrase in a file)
  4. cat last.txt total.txt | head > total.tmp && mv total.tmp total.txt (concatenate the last phrase with the previous ones, newest on top, keep only the first 10 lines, and save the result back as total.txt; the temporary file is needed because you cannot safely redirect into a file you are still reading in the same pipeline)
  5. Mail the content of total.txt to my email account (or make it accessible through NFS, http, etc)
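A sketch of that reporter, assuming the chunk naming used earlier, a working mail command on the box, and a made-up address and path:

#!/bin/sh
# progress reporter, meant to be run from cron every 5 minutes
cd /path/to/chunks || exit 1
TOTAL=$(ls -1 *.sql 2>/dev/null | wc -l)        # every chunk, processed or not
DONE=$(ls -1 done_*.sql 2>/dev/null | wc -l)    # chunks already imported
echo "$DONE of $TOTAL Completed" > last.txt
# newest report on top, keep only the first 10 lines; the temp file avoids clobbering total.txt
cat last.txt total.txt 2>/dev/null | head > total.tmp && mv total.tmp total.txt
mail -s "Import process" me@example.com < total.txt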

Now use cron to execute that script every 5 minutes. You’ll get emails looking like this:

910 of 20000 Completed
830 of 20000 Completed
770 of 20000 Completed
710 of 20000 Completed
660 of 20000 Completed
600 of 20000 Completed
580 of 20000 Completed
520 of 20000 Completed
460 of 20000 Completed
390 of 20000 Completed

This is easy to parse, so it’s easy to extract statistics and see how the process reacts to the different changes in the scenario we described throughout this post. You can clearly see the rate you’re working at by comparing the previous reports.

 

At the end of the recovery

To wrap up, I’d like to mention that the work the sysadmins did during this week was amazing. I learned a lot discussing the planning with them and considering all the different options to get the task done. After such a difficult experience I can say I’m happy with the result we got. Now that it’s over, I feel it has been a good experience.

Anyway, there’s still a lot of work to do to optimize the site the way I want. The goal is to migrate the whole table to a new one with the proper indexes. Yes, we’ll migrate the 20 million rows to another table, in the live environment, but that’s a good story for another post.

Thanks for reading!