All accounts down now for over 6 hours!

Posted: Sat Sep 03, 2016 3:06 am
by lakrsrool
I'm not sure where we get support anymore. :?

I have 7 accounts, all of them have been down for over 6 hours as of now. :(

I see nothing on Twitter or FB other than "Be sure your email client is using - port 587 STARTTLS for SMTP.", which I have done as of yesterday (9/1/16), when those messages were posted. I also see nothing here in the forum about the current situation.

Please keep us up to date. I do see "offline/maintenance" status for ports 143/993 on the "status page", but it's not clear that this would cause all accounts to be down; if it does, we need more clarity on that. It is nice to have the "status page", so thank you for that, but the question remains: is it the "offline/maintenance" status, or the "red signal" light, that tells us our accounts are down? Please clarify. It would also be appreciated if we were notified somehow, especially when downtime lasts this long, preferably here in the forum, but if not, then on FB, Twitter, or both.

Bottom line: it would be greatly appreciated if we could have more transparency regarding issues of this kind.

Thanks in advance. :D

Full system update

Posted: Mon Sep 05, 2016 2:10 pm
by Havokmon
To everyone - I apologize for the lack of updates - there honestly hasn't been too much to update. What's been posted publicly (the server is having hardware issues, we're moving to a new one) has not changed - the only thing that's changed is the frequency of crashes, which unfortunately has increased. That increase has kept pushing back the switch.

I was able to mask the hardware issue by modifying the memory allocated to read caching, but that didn't really fix the problem. There's an issue with a drive or controller that is causing driver timeouts, resulting in OS crashes.

I've been trying to avoid long downtimes during the migration, and things have just gone very poorly. The data replication has been frustratingly slow. Typically a migration goes like this:
1. Snapshot, replicate.
2. Snapshot, replicate.
3. Shut down delivery.
4. Snapshot, replicate.
5. IP change.
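
The point of the repeated snapshot/replicate passes is that each pass only has to carry what changed since the previous one, so the final pass (done while delivery is stopped) should be small. A minimal sketch of that flow follows. It is only an illustration, not the tooling actually used here: `Server`, `replicate`, and `migrate` are hypothetical names, and the data store is modelled as a plain in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a mail server's data store."""
    name: str
    files: dict = field(default_factory=dict)
    accepting_mail: bool = True

    def snapshot(self):
        # A point-in-time copy of the current data.
        return dict(self.files)


def replicate(snapshot, previous, target):
    """Send only what changed since the previous snapshot; return the item count sent."""
    changed = {k: v for k, v in snapshot.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in snapshot]
    target.files.update(changed)
    for k in removed:
        target.files.pop(k, None)
    return len(changed) + len(removed)


def migrate(old, new, incoming_batches):
    """Run the five steps; incoming_batches simulates mail arriving between passes."""
    log = []
    previous = {}
    # Steps 1 and 2: bulk passes while the old server stays live.
    for batch in incoming_batches:
        snap = old.snapshot()
        log.append(replicate(snap, previous, new))
        previous = snap
        old.files.update(batch)  # mail keeps arriving during replication
    # Step 3: shut down delivery so the data can no longer change.
    old.accepting_mail = False
    # Step 4: the final pass carries only the remaining delta.
    snap = old.snapshot()
    log.append(replicate(snap, previous, new))
    # Step 5: the IP change, modelled here as a simple flag flip.
    new.accepting_mail = True
    return log


old = Server("old", {"a": 1, "b": 2, "c": 3})
new = Server("new", accepting_mail=False)
print(migrate(old, new, [{"d": 4}, {"e": 5}]))  # [3, 1, 1]
```

In this toy run the first pass moves everything and the later passes move one item each, which is why the downtime window in step 3 to step 5 is normally short.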

Unfortunately, even the 'snapshot' has been taking up to 30 minutes. That's just wrong; it's normally instantaneous. The final replicate of 2Gb of data took 5 hours today.
So it was slow going. Throw in random crashes during that 5-hour window, and hopefully you can understand the delay.
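
To put that final pass in perspective, here is a rough back-of-the-envelope calculation. It assumes the figure of 2 above means 2 gigabytes in decimal units (1 GB = 1000 MB), which is a guess about the unit; `throughput_mb_per_s` is just an illustrative helper name.

```python
def throughput_mb_per_s(size_gb, hours):
    """Average transfer rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


# Assumed reading of the numbers above: 2 GB moved in 5 hours.
print(round(throughput_mb_per_s(2, 5), 3))  # 0.111
```

Under that assumption the average rate is roughly a tenth of a megabyte per second, far below what even a single healthy drive would normally sustain.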

In any case, we're on the new server now. The old server will be rebuilt with new drives, new controllers, more memory, AND a fresh OS install. The new OS will allow us to run a 3rd party application to enable hot active/active replication between servers. That will allow us to IMMEDIATELY switch to a backup server at the first sign of trouble.

I appreciate everyone's patience and support.