My top three IT SNAFUs - and how I fixed them
Dave Cartwright shares his war stories
Everyone's had experiences where something just inexplicably didn't work. Or apparently inexplicably, anyhow.
Here are three of mine. What’s yours? Please share in the comments below.
Why is our application so slow?
We're going to need a bigger table
A funky new application (HTML GUI with a SQL Server back end) had been written and tested extensively on the test platform. On the morning of the go-live the head of application services grabbed me and asked if I could help them figure out why it was suddenly going really slowly.
We had a couple of conference calls with the (overseas) developers, neither of which bore fruit, so I asked them to set me up with credentials for the servers and the SQL Server database. Twenty minutes later SQL Server Profiler was showing me that every few seconds there was a full table scan of a 1.3 million row table.
A bit of digging showed a query that was JOINing umpteen tables for no apparent reason, and missing every index in the process. Turned out that this query was servicing the little “Your current dataset” box that appeared in the corner of every page (and whose content got refreshed every time you clicked to a new page).
It then turned out that the developers had never used either Profiler or Query Analyzer (a quick lesson followed). Didn't take long to chop the query to remove superfluous crap and hit some indexes, and all was well.
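The offending query isn't reproduced in the story, but the shape of the fix was roughly this (all table, column and index names here are hypothetical, not from the actual system):

```sql
-- Before (hypothetical): joins umpteen tables just to populate a corner box,
-- and filters on a column with no index, forcing a scan of the big table.
SELECT COUNT(*)
FROM dbo.BigTable b
JOIN dbo.Lookup1 l1 ON l1.Id = b.Lookup1Id
JOIN dbo.Lookup2 l2 ON l2.Id = b.Lookup2Id
-- ...more joins the "Your current dataset" box never needed...
WHERE b.DatasetId = @DatasetId;

-- After: chop the superfluous joins and give the filter an index to hit.
CREATE INDEX IX_BigTable_DatasetId ON dbo.BigTable (DatasetId);

SELECT COUNT(*)
FROM dbo.BigTable
WHERE DatasetId = @DatasetId;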
Which just left the question of why it was so fast on the test box given that (as the developers insisted) the database was a direct, like-for-like copy of the live one.
So I looked. And it wasn't. The 1.3 million row table on the live server was actually a 50,000 row table on the test platform.
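Seeing for yourself is cheap: a row count on each server would have exposed the discrepancy in seconds (table name hypothetical):

```sql
-- Run against both the live and the test database and compare.
SELECT COUNT(*) AS row_count FROM dbo.BigTable;
-- live:  roughly 1,300,000
-- test:  roughly    50,000
```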
The moral of this story, as my dad taught me: never believe anyone; always see for yourself. People are often innocently mistaken, and sometimes they just tell porkies.
Why has our application slowed down again?
(Pic: Peretz Partensky) I think we took a wrong turning, dear
A few months later the same application started to run slowly again – or more accurately it would fail to load pages then work fine when you hit “Refresh”.
We asked what had changed, and the only answer was that the core LAN switch in the data centre had been replaced over the weekend with a chunky new Cisco one. But every other application was working absolutely fine (better than ever, in fact) – so why was this one suddenly hideous?
As “What's changed?” didn't help, we moved on to “What's special about that application?”. All we could think of was that it sat in a DMZ of its own, on the other side of a pair of Cisco ASA5100 firewalls. So we fired up Wireshark and noted that there was quite a lot of ICMP redirect traffic flying around … most, but not all, of the time.
A check of the config on the new Cisco switch revealed a setting that rate-limited the ICMP traffic it would forward, to prevent traffic storms. And the way the routing worked with that special DMZ caused the core switch to generate a lot of ICMP redirects. Hmmm.
Sadly the powers that be had consulted a third party Cisco specialist who, despite not actually looking at anything, decreed that this wasn't the problem.
So we were prohibited from changing the setting, even though the risk was negligible.
Over two months later one of the senior managers finally got bored with me banging on about it and told me I could make the change. Within five minutes the application owner was by my desk: “Did you change something? XX <the application> has suddenly started to fly”.
The moral: look at the evidence and stick with the facts. Persevere and you'll get to make the world a better place in the end.
Gravity is indiscriminate …
(Pic: S Roquette) I think I'll try defying gravity
My first job was in the university holidays with a defence contractor, back in the days when Novell NetWare 2.0a was the network server OS of choice. We wanted to extend the network from the office block at the top of the site down to the factory at the bottom – a few hundred metres away.
A handy building halfway down the hill housed the necessary repeater, and we installed an IBM PC AT (remember those?) as the gateway machine in the cupboard by the cable's entry point in the factory.
Worked fine for a couple of weeks, then the gateway machine crashed. We wandered down through the drizzle, implemented a Big Red Switch remedy, and all was well for a couple of days. More drizzle, another turn-it-off-and-back-on-again and up it came.
By the third crash we were getting bored with getting wet so we pulled the gateway machine out of the cupboard to check the connections, ensure the LAN card was seated OK, and so on. At which point we noticed that everything was a bit damp around the LAN card.
Now, Cabling 101 teaches technicians that when feeding cables through holes in external walls, you should always put a small kink in the cable – a drip loop – so that it runs up the wall for a few inches before disappearing into the hole.
Our final moral, then: if you don't put an uphill kink in the wire the rain will run along the cable, through the hole in the wall, down the inside wall and into the network card of the device it's plugged into. It's a gravity thing. ®