- Final Update 13.Jan.2012 @ 4:00 PM:
We have been monitoring the server Barricade (mail77.safesecureweb.com) over the last 24 hours and have seen no further issues with mail. The spooling issues are no longer a problem and mail delays are within normal expectations for sending and receiving email.
The connection issues experienced by third-party mail applications such as Microsoft Outlook should be resolved. These applications should no longer receive error messages, and mail should be sent and received through the system without issue.
We do appreciate your patience in this matter and, as always, if you have any further questions, please feel free to contact our support staff.
——————————————————————————————-
- Update 12.Jan.2012 @ 3:30 PM:
We seem to have found a few more hard drives from the same batch that were not previously known to be bad, and the spool grew rapidly around 10:30 AM. We pulled the drives from the suspected bad batch and performance on the disks improved quite rapidly. The spool was halved in about 15 minutes and has been fairly stable most of the day. Mail has been sending, albeit slowly.
The connection issues are still plaguing the server. In an attempt to fix this, we switched network traffic to a NIC that was installed (but not enabled) last night. We didn’t switch it then because, at the time, mail77 seemed more stable. If MS Outlook, Mail or your mail program of choice is getting connection errors, please try logging into webmail. Webmail can be accessed by going to http://mail77.safesecureweb.com or to http://mail.[your_domain.com].
We have a robocopy running on the live hardware to copy the mail folders, etc. to the new hardware. We also placed a new drive from a different batch of hard drives into the live hardware to begin rebuilding one of the arrays. That rebuild is progressing at a reasonable pace, but not hyper-fast, because the server is live and e-mail requires a significant number of reads and writes to a hard disk. The RAID arrays on the new hardware are also rebuilding. Both are expected to finish overnight. If the arrays on the live server are stable, the server will remain on the live hardware until next Thursday, when the version of SmarterMail is scheduled to be upgraded. If the arrays still prove to be an issue, we will cut over to the new hardware tomorrow.
——————————————————————————————-
- Update 11.Jan.2012 @ 9:00 PM:
We have done a RAM upgrade to mail77 and made some suggested changes to the BIOS.
All spooled mail from earlier today was put back and sent out within 5 minutes.
——————————————————————————————-
- Update 11.Jan.2012 @ 5:54 PM:
Performance is still degraded on the old, live hardware. The spool is still a bit high but the server is sending messages. Tonight we are going to copy the live data to the new hardware, verify that the RAID card has updated firmware so it can handle the new drives, and switch over to the new hardware overnight.
——————————————————————————————-
- Update 11.Jan.2012 @ 2:27 PM:
The spool has been growing but the mail server is processing messages, albeit slowly. We’re looking at what settings we can tweak temporarily to help with the I/O. We’ll be restarting SmarterMail a few times. Customers using POP3 are reporting errors connecting to mail.[your domain name here] in MS Outlook / Apple Mail. We’re finding that changing mail.[your domain name here] to mail77.safesecureweb.com in the mail program’s server settings allows affected customers to connect.
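If you want to check which of the two hostnames is accepting connections before changing your mail program's settings, the following is a minimal sketch. It assumes the standard unencrypted POP3 port (110), which may not match your account's settings, and it only tests whether a TCP connection opens; it does not log in or check your mailbox. The function names `can_connect` and `pick_working_host` are made up for this illustration.

```python
import socket

POP3_PORT = 110  # standard unencrypted POP3 port (assumption: your account may use another)


def can_connect(host, port=POP3_PORT, timeout=5.0, connector=socket.create_connection):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        conn = connector((host, port), timeout)
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
    conn.close()
    return True


def pick_working_host(candidates, **kwargs):
    """Return the first hostname that accepts a connection, or None if none do."""
    for host in candidates:
        if can_connect(host, **kwargs):
            return host
    return None
```

For example, `pick_working_host(["mail.yourdomain.xyz", "mail77.safesecureweb.com"])` tries your own mail hostname first and falls back to the server name.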
——————————————————————————————-
- Update 11.Jan.2012 @ 12:21 PM:
The old hardware has been online since approximately 11 AM and is sending mail. The spool is holding steady.
——————————————————————————————-
- Update 11.Jan.2012 @ 10:45 AM:
“Hey, what’s going on with my e-mail??”
Mail77 was migrated to new hardware last night as previously reported. Mail service on mail77 is increasing load to the point where the I/O is overwhelming a hard disk with a previously unreported error. That disk was causing the RAID array to throttle down in the original hardware so the disks in the array could keep parity of data, i.e. stay synchronized. Because the drive was not reporting as failing or even erroring in the old hardware, the poorly performing drive was unknowingly included when drives were chosen to move to the new hardware and rebuild the RAID array. This means indexing, rebuilding of the RAID array and the very high I/O of a mail server have been causing the new server to slow down or stop roughly every 10 – 20 minutes.
“Ok, so what are you going to do to get my mail back up quickly?”
The old server still has synchronized data, so we are going to put it back online, let the new hardware rebuild its RAID array, then synchronize/merge the data. Some users will not see mail from the past 10 – 12 hours. People logging into webmail and using IMAP on mail77 will not see those messages because IMAP and webmail keep messages on the mail server. POP3 users will be unaffected because POP3 downloads e-mail to your computer and does not leave a copy of e-mail on the mail server unless otherwise specified.
“What about tomorrow and going forward?”
The above decision was not made lightly, and we are making every effort to get mail service up and running on the new hardware because this issue is affecting everyone on that server. With the old server up and running, performance will be slow but it will work. The new hardware is a big upgrade: a much better processor, more RAM and what we thought at the time were better disks. Basically, our shared admin team has been working on nothing but this for the past two days.
——————————————————————————————-
- Update 10.Jan.2012 @ 10:30 PM:
During the migration of Mail77 to new hardware, all mail was delivered to the original server. Once the new server was brought online and functioning correctly, we cut over to it. We are currently migrating all of the spooled messages from the old server to the new one at a rate of about 1,000 messages every 10 minutes.
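To get a rough feel for what that rate means, here is a small sketch of the arithmetic. The backlog size is not stated in this update, so the figure used in the example is purely hypothetical, and the estimate ignores any new mail arriving while the spool is being moved.

```python
import math


def estimate_drain_minutes(backlog, batch_size=1000, batch_minutes=10):
    """Estimate minutes to move `backlog` messages at batch_size per batch_minutes.

    The defaults reflect the rate quoted above (about 1,000 messages every
    10 minutes). A partial final batch is counted as a full batch.
    """
    if backlog < 0 or batch_size <= 0 or batch_minutes <= 0:
        raise ValueError("backlog must be >= 0; batch_size and batch_minutes must be > 0")
    batches = math.ceil(backlog / batch_size)
    return batches * batch_minutes


# Hypothetical backlog of 12,000 messages: 12 batches of 10 minutes each.
print(estimate_drain_minutes(12000))  # prints 120
```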
——————————————————————————————-
- Update 10.Jan.2012 @ 9:00 PM:
Mail75 has had all spooled mail fed back through and sent out. This server is now functioning normally.
Mail77 has been fully migrated to new hardware. Spooled messages have been fed into the active spool for the last ~1 hour, and this will continue until all messages have been delivered.
——————————————————————————————-
- Update 10.Jan.2012 @ 5:15 PM:
Spooled mail on Mail75 is being slowly fed back into the spool for delivery. New mail is delivering normally.
——————————————————————————————-
- Update 10.Jan.2012 @ 4:42 PM:
We are migrating mail77 to new hardware.
——————————————————————————————-
- Update 10.Jan.2012 @ 4:05 PM:
We are rebooting mail77. This should take a few minutes.
——————————————————————————————-
- Update 10.Jan.2012 @ 4:02 PM EST:
To verify whether your mail is currently on one of these servers, please do the following…
Ping your mail server by opening a command prompt and typing:
ping mail.yourdomain.xyz
If the results return either “208.112.71.220” (mail75) or “204.12.14.36” (mail77), you are on one of these servers. Mail on mail75 is currently delivering as we move mail back into the spool for delivery. Mail77 is still underperforming. These are definitely our top priorities. Unfortunately, the disk I/O problem on mail77 is taking a long time to narrow down to one or two causes.
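If you would rather script this check than read the ping output by eye, the following is a minimal sketch. It uses the two IP addresses given above; the function names and the `mail.yourdomain.xyz` placeholder are only for illustration, and the lookup needs working DNS on your computer.

```python
import socket

# IP addresses quoted in this update for the two affected servers.
AFFECTED_SERVERS = {
    "208.112.71.220": "mail75",
    "204.12.14.36": "mail77",
}


def identify_server(ip_address):
    """Return 'mail75' or 'mail77' for an affected IP address, otherwise None."""
    return AFFECTED_SERVERS.get(ip_address.strip())


def check_domain(mail_hostname):
    """Resolve a hostname such as mail.yourdomain.xyz and classify its IP address."""
    ip_address = socket.gethostbyname(mail_hostname)  # raises OSError if the lookup fails
    return ip_address, identify_server(ip_address)
```

Calling `check_domain("mail.yourdomain.xyz")` with your own domain returns the resolved IP address together with the server name, or `None` in place of the name if your mail is not on either server.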
We do thank you for your patience thus far. Additional updates will be posted as soon as we have more information.
——————————————————————————————-
- Original Post 10.Jan.2012:
Hello,
The mail spools on mail75 and mail77 are abnormally high and while they are sending mail, there is a significant delay. Our shared admin team is working on resolving the spooling with these servers. We will provide an update when one is available.