Sunday, March 11, 2018

Migration to GitHub

We have been wanting to migrate to GitHub for quite some time. We already had subversion to GitHub synchronization, so some of the work was already done. What was left were tickets.
We now finally migrated completely to GitHub.


It was a lot more complicated than what it sounds because we had tickets on trac and issues + pull requests on GitHub. One of the key migration goals was to replicate the data in such a way
as to near perfectly clone it; all while not bothering our entire user base with GitHub notifications.

Joe Benden did most of the work. The same person who helped us migrating to autotools. I did not expect such a level of professionalism and perfectionism in the work. We probably had a total 10 to 15 test-runs, each taking 1 to 2 days to complete and a lot of discussions to find the best solutions to issues/limitations we came across. Each test run was followed by review and feedback to improve it on the next run.


All the trac tickets are correctly linked to each other and have their attachments. We had some ticket numbers skipped in trac, due to spam some time ago. Luckily, GitHub API allows to skip issues numbers. A lot of fine tuning was done to do it the way we wanted. One of the item was linking tickets that was sometimes done in different ways (#123 or https://trac.aircrack-ng.org/tickets/123 or sometimes hxxp:// trac.aircrack-ng.org/tickets/123). Even URLs were corrected where needed. Along with a bunch of other small details.


Attachment was a big issue since we wanted to keep them and GitHub pretty much only allows text files or pictures in issues. Anything else is out of question. The best solution we could think about was to store them in a repository, GitHub only limiting large attachments which wasn't a problem in this case. On the filesystem, trac doesn't organize them neatly in directories, by names. Instead, it uses some kind of hashing algorithm and it is necessary to look up in the trac database to match them with the tickets and create a script to batch rename them.


Surprisingly, migrated trac tickets look much better than migrated GitHub issues. They look OK but that is due to GitHub API limitations.


One of the decisions was to avoid spamming people with notifications, so all the tickets are created with the Aircrack-ng account. GitHub doesn't allow to set the creation date or the author (there is a mix or email addresses, anonymous author or just username looking reporters) and with almost 2000 tickets, the best solution was to write down the author and date/time in the issues themselves.


Pull requests is tricky, they can't be migrated from one repo to another, period, even when using the API. So, we now have a copy of the existing ones, without link to the code change. So, if anybody who had a PR in the old repository, if you can recreate it in the new one, it would be great. GitHub API has also some kinds of limitation regarding closed PR/issues.


The reason why we decided to create another repository has to do with importing trac tickets and numbering. Matching would have been way too complicated if we appended to the existing repository. On top of that, if it failed somehow, we had no way of going back. It would also have complicated testing significantly.


Finally, while doing the migration, we noticed that the paid accounts don't have as much rate-limiting as the free accounts and the migration went a lot faster than expected. Just a few hours vs 2 days.


We kept the old repository and renamed it aircrack-ng-archive in case we need to look back at some issue/PR history.

No comments:

Post a Comment