Store downtime significantly harms eCommerce: merchants lose revenue, and brands lose credibility, reputation, and customer loyalty. The bigger the business, the higher the price it pays for downtime. For instance, if you process 50 orders per minute, even a half-hour outage will cost you the revenue from 1,500 orders, plus an enormous effort to win back frustrated customers and repair a damaged reputation. So for a project with 2.7M site visitors per month and 5,000 orders per day at the time of migration, we had to find a zero-downtime upgrade solution. And we did.
We are delighted to share our experience of migrating from Magento 2.1.3 to 2.3.3 (the latest version at that time) with literally no downtime (yes, not even a second).
As any client would, ours had a simple expectation: for the migration to happen, as if by magic, without any negative effects or delays. The requirements were:
- No downtime
- Ability to test the new production version on real website data, without customers present
- Ability to roll back immediately, without downtime
- Gradual transfer of customers to the new website
Our team's initial assumption was that we would pick a couple of top-notch developers to focus solely on the migration, freeze all other development, and allow between 1 and 6 months to complete this step. The next steps would be to release the migration by switching the client's website to "maintenance" mode, upgrade and convert the data, and finally go live.
But this plan was not appropriate for our client.
Obviously, most clients would prefer to keep their website up and running without interruption, so that the business does not risk losing revenue during the upgrade.
Our client also wanted to test the upgraded version themselves while keeping real customers away from it. The new website would need to contain the real products and orders, just like the previous version, which was live and accessible to customers at that moment. There also had to be a way to roll back to the previous version immediately, without any downtime, should that be required.
The main idea
Our team decided to direct the traffic to the new website gradually: start with access for internal stakeholders (our team and the client's team), then redirect 10% of the users, then add more and more, eventually reaching 100% of the users.
We held a team discussion and came up with the following plan:
- Create a copy of the infrastructure for the new Magento 2 version. This will allow us to have both versions of Magento.
- Both versions would reference the same database.
- Questions remain open: how do we split customers between two versions? How do we manage that?
A list of concerns and roadblocks came up along with this plan:
- One domain, two websites;
- Configuring the required amount of users to the new website;
- Keeping the users at the new website once they had initial access there;
- Providing access to the new website for the internal stakeholders;
- 2.1 and 2.3 versions have different data structures;
- Data sync between two versions;
- Choosing a master system for integrations, e.g. order export, product import, etc.
Before we address the above issues, a few words about our infrastructure.
Traffic reaches us from the Cloudflare CDN and is split by DNS-level round-robin between two load balancers. These load balancers run Nginx for SSL termination and offloading. They then pass the traffic to Varnish, which in turn does round-robin load balancing across the front-end application nodes (the servers where Magento is installed and running). A Redis cluster and a database are also part of this infrastructure.
Now back to the list of issues.
The first one, as mentioned earlier, is:
One domain and two different websites
After some brainstorming, we realized that we could not simply add more IP addresses to the DNS record, because we would then have no control over how much traffic reached each version. That would, of course, prevent us from meeting the client's needs.
This meant that we needed to manipulate the traffic within our own infrastructure, and that we needed two levels of load balancing.
So our infrastructure looked as follows: initial traffic flows to the group of load balancers for Application A. These load balancers decide how to split the traffic and redirect part of it to Application B. All load balancers then pass traffic to the front-end nodes running the relevant Magento version: Application A to the nodes with Magento 2.1 and Application B to the nodes with Magento 2.3.
Configuring the required amount of users to the new website
Now let’s look into upstreams in Nginx configuration.
Initially, our configuration had a single upstream: the upstream directive of Application A pointed to one server's IP address and port (the port Varnish listens on).
We then updated this configuration by adding an Application B upstream that points to a different server, which would be our third load balancer.
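The two upstreams described above might look roughly like the sketch below. The addresses, ports, and the upstream names application_a and application_b are placeholders chosen for illustration; the values from the actual setup are not shown in this article.

```nginx
# Hypothetical addresses and names, for illustration only.
upstream application_a {
    # Varnish in front of the Magento 2.1 nodes (6081 is Varnish's default port)
    server 10.0.0.10:6081;
}

upstream application_b {
    # The third load balancer, which fronts the Magento 2.3 nodes
    server 10.0.0.20:80;
}
```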
The next step is to split clients between these two upstreams. For this purpose, Nginx has the ngx_http_split_clients_module, which allows A/B testing. So we took this module and declared a split_clients directive with the 'msec' parameter. All the magic happens within this block.
We had two rules: one telling Nginx that 10% of the traffic should go to Application B, and one passing the rest of the traffic to Application A. Nginx hashes the value of the parameter we pass (here msec, the current time with millisecond resolution) with MurmurHash2 into a 32-bit range of roughly 4.3 billion values, and maps each slice of that range to one of the two values, Application A or Application B.
As a result, the upstream variable will contain the value of either Application A or Application B, and we can use this variable inside the location directive of Nginx to split off 10% of the general traffic.
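A minimal sketch of such a block is shown here. It assumes upstream groups named application_a and application_b are already defined (both names are placeholders), and it goes in the http context.

```nginx
# Split on the request timestamp: effectively a random choice per request.
split_clients "${msec}" $upstream {
    10%     application_b;   # new Magento version
    *       application_a;   # everything else stays on the old version
}

server {
    listen 80;

    location / {
        # $upstream resolves to the name of one of the upstream groups
        proxy_pass http://$upstream;
    }
}
```

Because the timestamp changes on every request, the same visitor can be sent to different versions on consecutive page loads, which is the problem the next section deals with.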
Keeping the users at the new website once they had initial access there
Time to move on to "sticky sessions". What this means is that once a user has been directed to one of the two website versions, they should "stick" to that same version on every later request; otherwise they would bounce between two different applications in the middle of a session.
We need to choose a parameter that is unique for each user and is also available inside the HTTP request. We decided to use one that we already have: the PHPSESSID cookie. So we updated the split_clients directive to use "cookie_PHPSESSID" as the parameter for splitting clients, and this gave us the result we were looking for. Every request passes through this directive. When the page is refreshed, PHPSESSID remains the same, so Nginx returns the same application version for that user.
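The sticky variant differs from the timestamp-based split only in the string being hashed. The sketch below again assumes placeholder upstream groups named application_a and application_b.

```nginx
# Hash the session cookie instead of the timestamp: the same PHPSESSID
# always produces the same hash, so a visitor keeps the same version.
split_clients "${cookie_PHPSESSID}" $upstream {
    10%     application_b;
    *       application_a;
}
```

One thing worth checking in a setup like this: requests that arrive without the cookie all hash the same empty string, so they all fall into one bucket together until a session cookie is issued.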
Providing access to the new website for the internal stakeholders
The next challenge is to let the team (QA engineers and developers) access the new version of the website. The solution will sound familiar: cookies. We created enable-upgrade.html and disable-upgrade.html pages, which are basically empty HTML pages. These pages contain small JS snippets that either set an "upgrade" cookie or remove it.
As a result, our Nginx configuration ended up as follows.
We have two upstreams that point to the old and new versions respectively. We have a split_clients directive that assigns clients to a version based on their PHPSESSID. We also added a map block to the Nginx configuration that reads cookie_upgrade (the cookie we set on the special pages): when that cookie is present, the request goes to the upstream of the new version; the rest of the traffic, which has no cookie_upgrade, falls through to the result of the split_clients directive.
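Put together, the configuration described here could be sketched as below. The addresses, the upstream names, the intermediate variable $split_upstream, and the cookie value "1" are all assumptions made for illustration; the article does not show the real values.

```nginx
# Hypothetical addresses and names.
upstream application_a { server 10.0.0.10:6081; }   # old version (via Varnish)
upstream application_b { server 10.0.0.20:80; }     # new version (third load balancer)

# Sticky percentage split for regular visitors.
split_clients "${cookie_PHPSESSID}" $split_upstream {
    10%     application_b;
    *       application_a;
}

# Internal stakeholders carry the "upgrade" cookie and always get the
# new version; everyone else falls back to the percentage split.
map $cookie_upgrade $upstream {
    default $split_upstream;
    "1"     application_b;
}

server {
    listen 80;

    location / {
        proxy_pass http://$upstream;
    }
}
```

With a layout like this, raising the share of customers on the new version means changing one percentage and reloading Nginx, and rolling back means setting it to 0.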
Sync data between Magento 2.1 and 2.3 versions with different data structures
The next issue is the database. We need to split the data into two databases, and we need to synchronize data between them.
The reason, as mentioned earlier, is that the two versions have different data structures:
- Serialized data changed to JSON;
- The new data structure of 3rd party modules;
- New attributes in source models.
Let's look at the steps taken for data synchronization:
- Define which data needs to be synced: orders (which cannot be restored, unlike products), inventory, customers, invoices, shipments, catalog (products, categories)
In our specific case, some of these data types were not relevant:
We restricted customers’ access completely during the upgrade. We do not use invoices and shipments. The client was also asked to stop any import for products, categories, etc. during the upgrade. So we have Orders and Inventory to be synced.
- Define the source and the destination (we decided to make the new database "the main one", so we need to sync the data from the old website to the new one)
- Define how to handle auto-increments (for the new database, the auto-increment sequences start at 1,000,000, so that IDs created on the new website do not collide with IDs synced from the old one)
All of us use the getConnection() method, but there is also one called getConnectionByName(). We tried to implement this synchronization in the easiest possible way: we created a module for the old Magento 2.1 installation that uses two different connections, one to the internal database and one to the external database. All that is left to do after this is to take data from a table, apply the required rules, conditions, and modifications, and save it to the external database. This can be done by a cron job that runs once per minute.
In our case, the authoritative inventory figures are always stored in the warehouse system.
The automatic inventory update happens once an hour, so we used the simplest method: a mysqldump that copies inventory from database A to database B. It helped that the Magento inventory tables have the same structure in both versions.
Plenty of integrations were in use, e.g. order export, product import, stock import, etc.
We decided to move all incoming API traffic to the new version using Nginx rules.
All outgoing integrations were disabled on the old version and enabled on the new one.
We had some unexpected trouble with integrations, e.g. the payment integration. These problems were resolved in real time, with customers online. This shows that there is always a chance of such things happening, and that we simply cannot predict everything.
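The incoming API rule could be expressed along the lines of the sketch below. It assumes the integrations call Magento's standard /rest/ and /soap/ web API paths and that an upstream group named application_b (a placeholder) points to the new version; the rules actually used are not shown in this article.

```nginx
server {
    listen 80;

    # Incoming API calls always go to the new version, regardless of
    # cookies or the percentage split.
    location ~ ^/(rest|soap)/ {
        proxy_pass http://application_b;
    }

    # All other traffic follows the split described earlier.
    location / {
        proxy_pass http://$upstream;
    }
}
```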
- We fully migrated to the new version of Magento in about one week
- The QA team was the first to get access to the new website, then 10% of customers were directed there
- As we monitored the situation and made sure everything ran smoothly, we gradually moved 100% of the customers to the new version