server crash 5/4/09


Guest eFestivals


At around 1am today, the main eFestivals server suffered a disk failure. We are currently working on restoring the website content onto a new server from our backups.

We hope to have the majority of content back on-line by around 9pm this evening. Some content may remain unavailable for a little while longer.

Unfortunately, for the forums we've had to revert to the last backup that was made, so messages after 5am on Saturday 4th April have been lost.


The full story.....

On the main eFestivals server there were two physical disks, with one being a mirror of the other. One totally failed just after 1am on Sunday morning, so everything then ran off the second disk. However, it seems there were already problems with that second disk, which then started to fail more catastrophically as it took on more work.

At around 8:30am yesterday, I found the forums saying "too many connections" (I guess because some of the forums data was on the damaged part of that second disk, so inaccessible), while the main part of the website was still working.

In normal circumstances, a reboot of that server would have cleared the 'too many connections' problem, but I couldn't log onto the server to reboot it, so had to call the data centre support. They tried re-booting the server, but it wouldn't restart, because some of the operating system was on the damaged part of the disk. They tried various things to try and recover this disk, but without luck.

Meanwhile, they built a new server for me, which came online at around 1pm, and I then started to upload the backups onto that new server. However - and somewhat stupidly (lesson learnt :)) - the backups were only here in my office, when they could have been on the other eFestivals servers as well. Because of the slow upload speed with ADSL, it took a long while for the first backups to be uploaded (if they'd been on the other servers, it would only have taken minutes to copy them across). While they were uploading, I decided not to spoil my whole Sunday, and went to the pub to watch the footie. :)

Once the backups of the forums had uploaded, I was quickly able to restore these, and the forums were back online soon after 6pm yesterday. Because the last forums database backup (which has all of the posts and user data) had been made at around 5am on Saturday morning, that was the best backup that could be reverted to. The last backup of the forums directory - which includes user avatars, as well as the forums galleries - had been made in mid-March, so a few weeks of avatar changes and/or new user photos have been lost.

None of the content of the main part of the website has been lost. The data for the pages, listings, etc. is held in databases on a different server, so this was completely unaffected. There was an issue with the setup of the new server tho, which meant I couldn't get the new server to access this data on that other server - this issue was resolved before 10am this morning, and the main part of the site has had full functionality (tho without photos) since then.

Meanwhile, since around 1pm yesterday, the backup of the photos has been uploading - this will take something like 55 hours, which means that the vast majority of the photos won't be available again until sometime late tomorrow. In the meantime, various ad hoc photos have been uploaded, so that the front page looks complete.
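For anyone wondering why an upload can take that long, here is a rough sketch of the arithmetic in Python. The size and speed figures are hypothetical (the post doesn't give either); they were picked only because they land close to the 55 hours mentioned above.

```python
def upload_hours(size_gb: float, upload_mbps: float) -> float:
    """Hours needed to push size_gb gigabytes over a link of upload_mbps megabits per second."""
    size_megabits = size_gb * 1000 * 8  # gigabytes -> megabits (decimal units)
    seconds = size_megabits / upload_mbps
    return seconds / 3600


# Hypothetical figures, not taken from the post:
# 10 GB of photos over a 0.4 Mbit/s ADSL uplink.
adsl_hours = upload_hours(10, 0.4)

# The same hypothetical 10 GB copied between servers on a 100 Mbit/s link.
server_hours = upload_hours(10, 100)

print(f"ADSL: {adsl_hours:.1f} hours, server to server: {server_hours * 60:.0f} minutes")
```

With those assumed numbers the ADSL upload is a couple of days, while a server-to-server copy is a matter of minutes, which is the point made earlier about keeping copies of the backups on the other servers.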

Unfortunately, the last backup of the photos was done just before Xmas, so the backup won't be able to restore any photos added since then. However, we do have copies of those photos, so they will have to be put back manually, which isn't a huge task as only a few festies have been covered so far this year.

While all this has been a big hassle and it's a shame that the server failed on an important day (Glasto ticket day), it's been far from a disaster, as just about everything is recoverable even tho it'll take a little while to get things straight again. Lessons have of course been learned, meaning that the way backups are made will be improved so that if something similar happens again things will be a little easier to restore.


Nasty. Are the servers not monitored for hardware issues?

Yep, they are .... but at the data centre, I think only for when they go offline completely (I'll have to check) - the server was still running when I found it, so I guess that's why they didn't get an alert.

I got an automated email telling me that one of the disks had failed completely, which is why I know that it did fail, and at what time. But of course, I only got to read that at about the same time I discovered the problem. :(

As soon as I've wrapped up all of the issues with the restore, I'll be putting in place better backups and better monitoring.
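As an illustration of what "better monitoring" could look like, here is a minimal sketch that flags a mirror as degraded while the server is still up, rather than only alerting once it goes offline. It assumes Linux software RAID (md) and the text format of /proc/mdstat; the post doesn't say what kind of mirroring the server actually used, so treat this purely as an example. The function name and the sample text are made up for illustration.

```python
import re


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays that have fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # start of a new array section
            continue
        # The status line looks like "... blocks [2/1] [_U]":
        # expected member count, then currently active member count.
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded.append(current)
            current = None
    return degraded


# Made-up sample in the /proc/mdstat layout: md0 has lost one of its two disks.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      104320 blocks [2/1] [_U]

md1 : active raid1 sdb2[1] sda2[0]
      524288 blocks [2/2] [UU]
"""

print(degraded_arrays(SAMPLE))
```

A check like this could run every few minutes and send a text or page rather than an email, so a single failed disk gets noticed before the second one goes.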

Edited by eFestivals

Unfortunately, the upload of the photos broke sometime overnight, and so I've got to start again. Grrrrrr.

So that the same thing doesn't happen and all that upload time is wasted again, they're now being uploaded in smaller chunks, so over the next couple of days the photos will re-appear bit by bit.
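The idea of uploading in smaller chunks can be sketched roughly like this in Python. It is not the actual method used for the eFestivals photos; `batches`, `upload_all` and the `send` callback are hypothetical names, and `send` stands in for whatever really transfers the files.

```python
from typing import Callable, Iterable, Iterator, Optional


def batches(files: Iterable[tuple[str, int]], max_bytes: int) -> Iterator[list[str]]:
    """Group (name, size) pairs into batches of at most max_bytes each.

    A single file bigger than max_bytes gets a batch to itself.
    """
    current: list[str] = []
    used = 0
    for name, size in files:
        if current and used + size > max_bytes:
            yield current
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        yield current


def upload_all(
    files: list[tuple[str, int]],
    max_bytes: int,
    send: Callable[[list[str]], None],
    done: Optional[set[int]] = None,
) -> set[int]:
    """Send each batch with `send`, skipping batch numbers already listed in `done`."""
    done = set() if done is None else done
    for number, batch in enumerate(batches(files, max_bytes)):
        if number in done:
            continue  # uploaded on an earlier run, nothing to redo
        send(batch)   # if this raises, the batch is not marked as done
        done.add(number)
    return done
```

If the connection drops partway through, only the batch in progress has to be sent again, instead of the whole lot.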

Apologies for the delay in getting the photos restored.


Sucks dude but I've learnt more about server maintenance reading one post than in 4 years of posting on various sites. Congrats - you've taught me something!


Hi Neil.

Good job on getting the site back up and running so promptly. Quick question. There are still some posts missing from threads. Are these gone forever or are you still in the process of restoring those?

cheers

Unfortunately, all posts and PMs made between 5am Saturday morning and whenever the forums stopped functioning (sometime after 1am and before 8:30am on Sunday) have been lost and can't be restored.

We had to revert to the last backup which was made at approx 5am on Saturday (they're made at 5am each day), as everything else was on the two disks that failed and can't be recovered.
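To put a number on that lost window, here is a small Python sketch using the times given above (backup at about 5am on Saturday 4 April 2009, forums stopping somewhere between 1am and 8:30am on Sunday 5 April). The function name is made up for the example.

```python
from datetime import datetime, timedelta


def hours_lost(last_backup: datetime, stopped: datetime) -> float:
    """Hours of posts lost when restoring `last_backup` after a stop at `stopped`."""
    return (stopped - last_backup) / timedelta(hours=1)


last_backup = datetime(2009, 4, 4, 5, 0)     # approx 5am Saturday
earliest_stop = datetime(2009, 4, 5, 1, 0)   # first disk failed just after 1am Sunday
latest_stop = datetime(2009, 4, 5, 8, 30)    # forums found broken at 8:30am Sunday

print(hours_lost(last_backup, earliest_stop), hours_lost(last_backup, latest_stop))
```

So somewhere between 20 and 27.5 hours of posts and PMs fall into the gap, depending on exactly when the forums stopped accepting posts.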
