Summary
At about 1:25pm on April 3, 2025, an outage caused by a bug in a Drupal module upgrade affected multiple components of the PortableApps.com infrastructure. This affected visiting our website and downloading/updating apps via the PortableApps.com Platform. The website and app downloads/updates were brought back online at 10:35pm on April 4th, with outgoing email notifications, RSS feeds, and connectivity for XP/Vista fixed over the following days.
A Module Upgrade Goes Wrong: April 3 @ 1:25pm
On April 3rd, I was upgrading the mailsystem module within Drupal to a new version released by tag1consulting. As background, Drupal 7 has reached end of life and we contract with tag1consulting to monitor and release security updates to our existing CMS. The D7ES mailsystem module included security fixes and was a priority to upgrade. Their release contained a fatal bug affecting older sites that may not have a path defined correctly, resulting in it deleting files associated with resources in the Drupal registry, including core files.
Compounding Fix: April 3 @ 1:35pm
Once the module had partially broken the site, the website would throw errors but partially work. When I contacted tag1consulting, they suggested a registry_rebuild. After I performed this action, the site became completely inaccessible, throwing a 500 internal server error on every page. At this point, tag1consulting suggested restoring from backup. After I sent them some of the errors I was seeing during the update, they pulled the faulty module update and warned all their users not to apply it.
Fatal Restore Mistake: April 3 @ 1:50pm
At this point, it was clear that all data since our 2am full backup would be lost as both the Drupal database and user/core files on the web server had been affected. I began the restore process using Rackspace's web-based restore utility. The default option was to restore everything from the root of the backup to the root of the server. I (incorrectly) assumed that this would function similarly to a snapshot and restore files while the virtual server was turned off. Unfortunately, it proceeded to attempt to restore all files from root path to root path on the running server. This corrupted the server and left it unable to boot.
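The failure mode described above, restoring from the root of a backup onto the root of a running server, can be contrasted with restoring into a separate staging directory first. The following is only a minimal sketch of that idea, assuming the backup is available as a tar archive; the function name `restore_to_staging` and the archive format are hypothetical and are not how Rackspace's web-based restore utility works.

```python
import os
import tarfile
import tempfile


def restore_to_staging(archive_path: str, staging_dir: str) -> list[str]:
    """Extract a tar backup into staging_dir, never over the live root.

    Hypothetical helper for illustration only.
    """
    staging_root = os.path.realpath(staging_dir)
    # Refuse the root-to-root restore that corrupts a running server.
    if staging_root == os.path.realpath(os.sep):
        raise ValueError("refusing to restore over the root of a running server")
    os.makedirs(staging_root, exist_ok=True)

    restored = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(staging_root, member.name))
            # Reject entries that would land outside the staging directory.
            if os.path.commonpath([staging_root, target]) != staging_root:
                raise ValueError(f"unsafe path in backup: {member.name}")
            if not (member.isfile() or member.isdir()):
                continue  # this sketch ignores links and device nodes
            archive.extract(member, staging_root)
            restored.append(member.name)
    return restored
```

With the files in a staging directory, only the specific paths that were damaged would need to be copied into place, rather than overwriting system files that are in use.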
As a related aside, I was sick with a fever and coughing to the point of vomiting while this was occurring, so this likely affected my decisions and assumptions on the restore functionality at that time.
Also on a related note, at this point tag1consulting looped in Brandon Bergren from their support team to assist with getting us back online. He was extremely helpful throughout the recovery and in tracking down some modern PHP incompatibilities in Drupal 7 modules.
When I reached out to Rackspace for help recovering the system to a state where the appropriate files could be restored, I was informed that since our version of CentOS was end-of-life, they would not assist at all, despite our having paid the monthly fully managed fee for years without incident. Rackspace Cloud does not allow loading of a custom image or custom OS, only permitting loading images that they provide.
The Failed Server Recovery: April 3 @ 1:50pm
I was able to mount the server storage to an emergency console, but could not correctly restore the files from backup to get the machine to a bootable state. The last full snapshot image of the machine was years old and would not be an efficient starting point to restore the server to working order.
The Database Recovery: April 3 @ 2pm
I created a new MySQL database server instance within Rackspace Cloud and restored our 2am backup to it without incident. At this point, both a post-breakage MySQL server instance from the afternoon and a pre-breakage 2am MySQL server instance were up and running.
The New Server Build: April 3-4
I built a new AlmaLinux 8 server instance within Rackspace Cloud and configured it properly for use with Drupal and our standard set of modules. I then restored the working Drupal files from the 2am backup. The basic Drupal build was working later that night. Once I'd made that progress with a basic working server and fully working database, I chose to get some sleep rather than let sickness and lack of sleep cause another compounding bad decision.
The next day, I continued setting up Drupal 7 on the new server. I spent much of the day working out bits of incompatible modules. While Drupal 7 supports modern PHP, many modules will have issues. I posted a JShrink Fix For AdvAgg On Drupal 7 and PHP 8 on my site to assist others with one of the issues I ran into. I also updated the PHP code for our server-side stats, download redirections, and other elements to properly support modern PHP. I hardened the server, configured Let's Encrypt, configured it to use the newly restored MySQL instance, and did the usual server setup and maintenance tasks.
Back Online: April 4 @ 10:35pm
At 10:35pm on April 4, I felt confident enough in the configuration and operation of the server to bring it back online serving users. I announced it on the website soon after. Website usage, browser app downloads, PA.c Platform downloads/updates, etc. were all working again with occasional hiccups here and there.
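As a quick check on the length of the main outage, the two timestamps given in this write-up (about 1:25pm on April 3 and 10:35pm on April 4) can be subtracted directly. This small sketch assumes both times are in the same time zone with no clock change in between; the variable names are illustrative.

```python
from datetime import datetime

# Start and end of the main outage as stated in the write-up.
outage_start = datetime(2025, 4, 3, 13, 25)
back_online = datetime(2025, 4, 4, 22, 35)

duration = back_online - outage_start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours}h {minutes}m")  # prints "33h 10m"
```

That works out to roughly 33 hours for the website and app downloads/updates.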
Later Tweaks: April 5 - 21
The following day, April 5, outgoing mail was fixed at 3:32pm. This allowed account signups, password resets, and forum notifications to again go out.
On April 8, after app releases had resumed, it was noticed that the RSS feed was not properly updating. I rebuilt a PHP file to update the appropriate cached version and configured crontab to update it on schedule.
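For readers curious what a scheduled cache refresh of this kind can look like, here is a minimal sketch in Python rather than the PHP actually used on the site. It assumes the feed content has already been generated and only shows writing the cached copy atomically, so that visitors never see a half-written file. The function name, file names, and the crontab line in the comment are all hypothetical.

```python
import os
import tempfile


def write_cache_atomically(cache_path: str, content: bytes) -> None:
    """Replace the cached file in one step so readers never see a partial write."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".feed-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file owner-only
        os.replace(tmp_path, cache_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# A hypothetical crontab entry to run such a script every 15 minutes:
# */15 * * * * /usr/bin/python3 /path/to/refresh_feed.py
```

Writing to a temporary file and renaming it is a common choice here because the rename either fully succeeds or leaves the previous cached copy untouched.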
On April 18 I reconfigured the server security settings to allow standard XP and Vista machines without Legacy Update or my TLS patches to connect via unencrypted http to ensure those users could continue to update the PA.c Platform and their apps.
On April 21 I adjusted the security parameters to allow XP and Vista to download the PA.c Platform and apps via Internet Explorer 8, albeit with a security issue they need to dismiss and without images loading.
Aftermath
We now have an up-to-date tech stack running on AlmaLinux 8 with modern PHP 8. We won't have to rely on third party updates to our outdated CentOS stack. The server itself is running on newer cloud infrastructure and the website and server updates are served faster as a result. Our server is also now covered by our Rackspace fully managed support contract. Additionally, we're now able to use Let's Encrypt for digital certificates for the main web server as we do with our CDN and download servers.
Changes Made
We now have weekly automated server snapshots in addition to our nightly full backups. This ensures that if a server-corrupting incident does occur in the future, we'll have a very recent image to restore files to without much difficulty.
While we don't have the budget to keep a running QA environment alongside our standard one in the cloud, I now have the ability to spin up a temporary QA environment in the cloud to test updates on. This is in addition to a more-limited testing environment I've set up locally within a virtual machine to perform tests on. While this will increase the time to apply updates, it will allow proper QA testing without the expense of keeping an always-running QA environment live.
Future Changes
I'm working on separating out the download/update handling to a simpler server instance outside the Rackspace Cloud. This will allow for a more robust setup where if a major incident happens on the Drupal server, it won't affect PA.c Platform downloads/updates. I'm also working on setting up a limited site that would allow for a status page about ongoing issues as well as links to download the PA.c Platform and popular apps that could be substituted in for the main PortableApps.com site if a major incident does occur.
More long term, I'm looking to upgrade to Drupal 11 for the site. This will involve some major changes as many of the modules we use are incompatible. It will affect forum handling most of all. I looked into Backdrop CMS (a Drupal 7 fork that is maintained and gets security updates) but it doesn't appear to fully meet our needs. I'm also exploring using a different front end for the site itself and converting the forums to phpBB or similar, but this is less likely.
I'm also exploring moving hosting providers. As we've been hosting with Rackspace for years 'without a net' since they wouldn't have supported any issues with our site anyway, the ~$1,000 USD per month hosting fees are likely something we can decrease significantly by moving to a provider with a lower cost and lesser level of support than we have now... but more support than we've unknowingly had up until now.
Last Thoughts
I'd like to offer my sincere apologies to anyone impacted by the events of last month. Although the main outage was 'only' 33 hours, I'm sure some folks had some critical needs that went unmet during that time. I hope that no one was impacted in a way that caused them permanent issues personally or professionally. I'd also like to thank you all for your continued support of PortableApps.com and your understanding of the issues that were faced in this incident. You all make up a huge part of this project and I am genuinely grateful for you all.
Kind Regards,
John
i hope you are successful with updating drupal and changing servers.
would like to hear more on the server change, only saw some ads in a tech mag. was very cheap compared to 1k.
maybe you can migrate to one of them.
interesting topic, how server technology changes.
on the net you see many praising easy drupal upgrades, but possibly just yelling for views.
hope all goes well.
wish all the best for it.
the only thing missing is a nice alternative for the dropped todolist, a nice calendar kanban combo.
maybe i'll find an alternative, if so i'll let you know. will look for something possibly opensource. saw only a few in a list.
i help where i can. maybe you can help me too. threads in account/track.
if you don't want to help don't post crap!
to point out...
hope you stay on drupal, as otherwise you could lose people.
used to the design, etc.
hope everything only gets better on the new drupal.
thx