OSAF/Chandler outage report for 2009-04-28
Wednesday, April 29th, 2009
We’re back! The entire OSAF/Chandler world was offline for about 24 hours from April 27th at 2pm to April 28th. This outage has been fixed with no permanent damage.
The root cause was “sparking” (ouch) on the power lines reported by PG&E. Our colo hosts, Hosted @ ISC, switched over to their generator as a precaution but the generator failed. ISC started shutting down machines when they knew the outage would exceed their UPS power.
Here’s where our outage stretched out longer than it needed to. ISC didn’t let us know they were powering down the machines, or that power was back up a couple hours later. They responded to my 3pm inquiry to their ticketing system late on the 27th, saying that everything should have been fine hours ago. Unfortunately, I had already gone to bed, so the OSAF/Chandler outage had to wait out the night.
On the morning of the 28th, I asked ISC to hit the power buttons on the machines for me, but nothing happened. I packed up a quick tech kit plus some spare machines and hopped in the car for the hour drive to Redwood City. On-site around noon, I confirmed the machines appeared dead. Weird. Hopefully the power issues hadn’t killed all 4 physical machines’ power supplies or motherboards at once, right?
Turns out, our managed power device that lets us turn machines on and off remotely had taken a header. I started moving machines’ power around, finding out by the end which ports on the power switch were dead. I moved the Hub machine off the power switch entirely.
That would all have been relatively simple, except for the extended duration, but there were other post-shutdown details that smacked me around for a while. I found that the tightened DNS configuration implemented back when DNS security went crazy kept our DNS machines from answering their own queries. I kept production machines like Hub off until I had DNS sorted out.
But far worse was a surprise related to the Debian Etch to Lenny upgrade. Since Lenny was released a few months ago, I’ve been upgrading OSAF machines in the background. I don’t know how I missed this, but the upgrade can remove the package used to bring up networking (ifupdown). I had been breaking the cardinal sysadmin rule of always rebooting machines after upgrading the OS, so I was very very confused when some machines came back up without any networking. Including our primary DNS server. And the secondary DNS server seemed broken because of the DNS config error I mentioned above.
I shot myself in the foot even more because, before I realized all this, I had decided that the already-extended outage “seemed a good time” to upgrade the Hub from Etch to Lenny. After the reboot, no networking! Gah, it was just working fine before the upgrade!
So, after getting a grip on what was going on, tracing through networking startup scripts, and tracking down the missing ifupdown package issue, I go bumming through the ISC office for a USB key. I use one of the existing machines to pull the ifupdown package (both i386 for the virtual machines and amd64 for the Hub) using a command-line web browser, troll through syslog to figure out what device to mount, and get these packages onto their needed machines. Luckily, this all works and all machines come back up.
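A check along these lines might have caught the missing ifupdown package before any reboot. This is only a minimal Python sketch, not something that was actually run at OSAF: the REQUIRED list and both function names are made up for illustration, and it assumes a Debian system where dpkg-query is available.

```python
import subprocess

# Hypothetical list: packages a machine needs to come back on the network.
REQUIRED = ["ifupdown"]


def missing_packages(required, status_text):
    """Return the required packages that are not fully installed.

    status_text is expected to hold one "<package> <status>" line per
    package, where a healthy status reads "install ok installed".
    """
    installed = set()
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[1:] == ["install", "ok", "installed"]:
            installed.add(parts[0])
    return [name for name in required if name not in installed]


def dpkg_status_text():
    """Ask dpkg-query for every known package and its status."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package} ${Status}\\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    missing = missing_packages(REQUIRED, dpkg_status_text())
    if missing:
        print("Do not reboot yet, missing: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run after an upgrade and before the reboot, this would flag a removed ifupdown while the machine still has working networking to reinstall it.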
All’s well that ends well, I suppose, but this was a tough outage to swallow after the 9 hours from the big fiber cut less than a month ago. Natural questions like “is our hosting good enough?” and “should we move?” come up.
My view is that weird things happen in every hosting environment; moving is not a cure-all for reliability issues. Our reliability isn’t as good as we’d like or as good as could be achieved, but I feel it’s still better to sit tight. The main reason is cost: hosting the Hub consumes a good amount of bandwidth (about 8Mbps) and ISC is providing all services (space, power, bandwidth, remote hands) for free for 9 rack units’ worth of equipment. It would be possible to move some services like mailing lists, code repositories, and wikis to other free services, and someday the community may choose to go that route. But hosting almost all of our own services means we have no restrictions on capabilities or capacity, which is nice; many open source projects would love to have access to the resources and flexibility that OSAF enjoys.
Overall, I think we’ve just had a spate of bad luck. While a very rare 24-hour outage might be unacceptable to a commercial venture, as long as such outages stay very rare, they are acceptable to the OSAF and Chandler communities and well worth the tradeoffs.
There are three items I want to undertake as a result of the outage:
- Place a DNS secondary outside of ISC
- Print out some phone numbers, IP ranges, and other “might need it offline” info
- Talk to ISC about coordinating during outages
As a final note, it turns out I could have determined that “everything should be fine now” by going to status.isc.org where there was some information about the outage, including when it ended. I hadn’t been checking that page during the outage but I certainly will (as well as using the phone as needed) during any future outages.

