Issues
Regarding Yesterday's Outage
Yesterday most (likely all) of our customers experienced an outage on their websites and virtual private servers. The reason for this was that our primary nameserver, ns1.bryght.com, was compromised and subsequently used to scan for vulnerable machines. The scanning overwhelmed our firewall with a very high number of small packets. This caused the load on our firewall to increase to the point where it was very slow to respond to legitimate traffic. The DNS server has since been rebuilt and services have returned to normal.
No other system was compromised and no private customer data was released.
If you have not already configured your domain to take advantage of our third DNS server (ns3.bryght.com), we recommend that you do so now. ns3.bryght.com is located in a completely different data centre and as such provides additional redundancy. We are working to provide even more redundancy than that, and we thank everybody for their patience yesterday.
(The outage was not related to yesterday's increased traffic on the Internet due to the inauguration ceremonies in the United States. Though some people did privately make the connection, it was completely coincidental.)
Bryght Light to Change IP Address Tomorrow at 2 PM Pacific
Summary:
- If your DNS record is already a CNAME that points to horse.bryght.com, then no action is required.
- If your DNS record is an A record, then change it to a CNAME that points to horse.bryght.com if you can. If you can't, change your A record from the old IP address to the new IP address below.
Old IP address:
- 209.31.179.218
New IP Address:
- 64.55.119.47
Bryght Hosting is changing the IP address of the Bryght Light hosting cluster because our current IP address provider is experiencing financial difficulties. This means that the IP address associated with the horse.bryght.com DNS name will be changing. If you have your site configured in the recommended way, i.e. you have a CNAME Record that points to horse.bryght.com, you don't have to make any changes and everything should work fine.
However, if you have set up an A record that points to the IP address directly, you will have to change this record by contacting your DNS provider and changing it to the new IP address of 64.55.119.47.
We will be switching the IP address tomorrow, September 24th, 2008, at 2 PM Pacific Time. Please contact us at support@bryght.com if you have any questions. Thank you, and we apologize for the short notice and for any inconvenience.
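For anyone unsure which case applies to their domain, a minimal Python sketch along these lines may help. It resolves a hostname and reports whether it currently ends up at the old or the new address listed above. The function names classify and check_site are illustrative and not part of any Bryght tool. The check only looks at IPv4 addresses, and it cannot tell you whether your record is a CNAME or an A record, only where it resolves.

```python
import socket

OLD_IP = "209.31.179.218"  # old Bryght Light address from this announcement
NEW_IP = "64.55.119.47"    # new Bryght Light address from this announcement


def classify(addresses):
    """Return 'new', 'old', or 'other' for a list of IPv4 address strings."""
    if NEW_IP in addresses:
        return "new"
    if OLD_IP in addresses:
        return "old"
    return "other"


def check_site(hostname):
    """Resolve hostname (any CNAME is followed by the resolver) and classify it."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return classify(addresses)
```

Calling check_site with your own hostname after the switch and getting "old" back would suggest an A record that still needs updating. Keep in mind that cached DNS answers can lag behind a recent change.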
Restoring Google Analytics statistics tracking
A few people have noticed on their sites that Google Analytics is no longer tracking users, or rather, it's not tracking all visits to the site. We recently upgraded the Google Analytics module to the latest official release, which added some features as well as permissions. To restore tracking, you may need to update your site's settings:
- go to Administer » User management » Access control and give your role the "administer google analytics" permission.
- then go to Administer » Site configuration » Google Analytics, and set which roles you want to track. If you want to track non-logged-in visitors to your site, make sure you include the "anonymous user" role.
We updated our documentation on statistics tracking as well to reflect the changes in the module.
All Servers Powered Up After the Move
Firewall Successfully Reset
Firewall Reset Soon, Expect Very Brief Downtime
Forums Temporarily Closed Due to Spam
The forums at support.bryght.com were getting hit pretty hard with spam over the last couple of days, so I made the executive decision to temporarily close them. I'm committed to documenting what's happening so that we can report back to the Mollom anti-spam project (see also mollom.com) and help it deal with automated posters. If you have support issues that you would normally post to the forums, please use the contact form until we can reopen them, and we'll look at your issue via our email ticketing system.
Database Maintenance On Newer Sites Last Night; Issues With Image Module
Last night Roland ran the database maintenance on newer hosted service sites powered by Drupal 5. We are looking at doing another pass soon, because today we noticed there are a few other things about the newer databases that we can tune up.
While some sites are singing along happily today, we did notice that sites that use the Image module heavily, especially those that use the Random Image block, are suffering performance issues. Today we upgraded the Image module to its latest official release, which seems to have fixed an issue with thumbnails, but I'm not convinced that it improved performance much. Specifically, sites that use the Image module in Drupal 5 are experiencing Internal Server 500 errors, though I suspect for reasons different from the database performance issues that plagued us in the recent past. We are continuing to investigate this specific issue.
So Far So Good: Database Maintenance On Newer Sites Tonight
After running the database maintenance, details of which are below, on over 350 sites over the weekend, we noticed an increase in speed for our flagship site bryght.com (its database was reduced to one tenth of its original size!) as well as some of the sites we targeted for special attention. We're still waiting for word from the people we've contacted specifically, since they are the ones who watch over their sites the most. Until we hear from them, we're not ready to say that what we've done has rooted out the 500 error problems people have been seeing on the hosted service. This evening we'll run the database maintenance on our newer sites, but we don't expect to see as much improvement as we did with the older sites, partly because the newer sites handle their individual databases better, and partly because the newer sites have had the "Database Logging" module disabled by default from the outset. The equivalent module on older sites was filling up databases, making them run slowly for many operations.
A note on what we did to older databases, which we'll also do for newer sites tonight: we cleaned out the following database tables, which don't contain any content but rather user login sessions and old system logs: accesslog, watchdog, and sessions. Clearing out the last one means everybody on the site will be logged out. We hope that the inconvenience of having to log in again is outweighed by a faster-running site overall. About 180 sites will undergo maintenance tonight, but as noted above, we don't expect the performance gain to be the same as with older sites.
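To give a rough idea of what this kind of cleanup looks like, here is a minimal Python sketch. It is not our actual maintenance script. It uses Python's built-in sqlite3 module with a throwaway in-memory database purely so the example can run anywhere, while hosted sites use their own database server. The function names clear_log_tables and demo are made up for illustration, and the node table stands in for real site content that must be left alone.

```python
import sqlite3

# Tables named in this post; they hold logs and login sessions, not content.
TABLES_TO_CLEAR = ("accesslog", "watchdog", "sessions")


def clear_log_tables(conn):
    """Empty the log/session tables and return how many rows each one held."""
    removed = {}
    for table in TABLES_TO_CLEAR:
        # Table names come from the fixed tuple above, never from user input.
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        conn.execute(f"DELETE FROM {table}")
        removed[table] = count
    conn.commit()
    return removed


def demo():
    """Build a small in-memory database (an assumption) and run the cleanup."""
    conn = sqlite3.connect(":memory:")
    # "node" stands in for a content table that the cleanup must not touch.
    for table in TABLES_TO_CLEAR + ("node",):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, data TEXT)")
        conn.executemany(
            f"INSERT INTO {table} (data) VALUES (?)", [("a",), ("b",)]
        )
    removed = clear_log_tables(conn)
    remaining_content = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    conn.close()
    return removed, remaining_content
```

Running demo() empties the three log and session tables and leaves the stand-in content table untouched, which is the property that matters here: sessions and logs go, content stays.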
At any time during the day, please let us know if you encounter Internal Server 500 errors and what you were doing when it happened so we can take a deeper look. We will update this post with a comment when we're done with tonight's maintenance, and follow up with a post tomorrow morning.
We Couldn't Wait: Database Maintenance Underway
We have the database maintenance scripts written, and this morning we tested them out on one of our older sites (one that we built for ourselves). We did say that we would do the maintenance overnight, but we were so happy with how fast and smoothly the test went that we decided to run it on 100 active sites on this, a Friday afternoon, and evaluate from there. We make a backup of every site before proceeding with the maintenance, so no content will be lost.
If you see something unusual or unexpected on our site, please continue to report it and we'll look at it right away.