Friday, April 23, 2010

On IBM's onsite support

IBM was called to our datacenter to replace a tape drive in a colocated IBM Power i520 box. We have several BladeCenters of our own, and quite a few blade servers. Anyway, two IBM engineers came in to replace the Ultrium tape drive (here come the "how many engineers does it take..." jokes). They then proceeded to pull the serial numbers off our blades and call them in (without permission), only to find that there is no IBM Hardware Maintenance coverage on them.

Then they started badgering our engineer as to who to talk to about the coverage - even though we explicitly told them we keep plenty of spares and don't need coverage on the old blades - it would cost more than the blade itself.

They had no business touching or pulling the serial numbers on those Blades. They are our property, not the clients', and just because they were called in for a colo box does not mean they can touch everything else.

The consultant we were working with actually filed a complaint to IBM, and the response from the IBM manager was the following:

He said they did nothing wrong and that once they were allowed in the cage, any equipment within was fair game to them. He kept asking why we had a problem with them checking the serial numbers. His attitude was very surprising.

When asked how to prevent this in the future, since they are the only ones who service the area, he said that when an IBM engineer is given access to the cage, he should be instructed to service only the piece of equipment he was called in for and told that he has no permission to touch the other equipment in the cage.

Epic fail. This is exactly why we don't buy hardware maintenance coverage from IBM.

On efficient solutions and low-priority cases

Here's a fine example of helpdesk efficiency taken from a real ticketing system.

Day 1. Issue created by Tier1 and escalated to Tier2. Priority set to low.
Description: "JPEGs not opening with Office Picture Viewer (set as Client's default) when opened from email. Viewer opens, but displays x'ed out thumbnails instead of the actual image. Windows Picture and Fax Viewer opens JPEGs fine from emails, as does OPV when opening JPEG from desktop. Could this be another terminal server registry issue?"

Day 2. Issue looked at by Tier2 and assigned to a tech.

Day 5. Issue de-escalated from Tier2 with a comment "Is this still an issue?"

3 hours later. Issue re-escalated to Tier2 with a screen shot and a comment: "YouTellMe.jpg and check your inbox."

Day 12. Issue de-escalated back to Tier1. "It's working fine for a test user on [server name], what server is this happening on? Check where IE is storing the temporary internet files, should be to their My Documents folder."

9 minutes later. Issue re-escalated to Tier2: "NOT A FIX! Even if this is the case, we will get calls from multiple users and not know they have a problem until they call. This needs to be implemented globally. Also, this doesn't explain why the client only started experiencing this after our last rollout of new servers."

Day 16. Issue de-escalated back to Tier1: "This IS the fix. This is currently, as far as I am aware, the only user that has had this issue. Also, we don't know that other users may have moved their temporary files save location and if we globally change it then theirs won't work. If it changes back after log out and back in then that is one thing, but making a global change for one user's problem is not a solution. Did it work? If so then we have resolved the issue."

45 minutes later, re-escalated to Tier2: "That is not the only user. I had the same issue, and I hadn't made any changes to my temp internet files until you told me to. What about [another_username]?"

Day 24. Issue de-escalated to Tier1 again. "What about him, is he having the same issue? Have you tried doing what I told you for him and did it work?"

20 minutes later. Re-escalated to Tier2: "The point is that is not the only user. I've moved a few others' folder location to see if that helped. Besides, who would move their temporary internet files? Per [tech_3], this is an easy global fix."

6 minutes later. [tech_3] comment: "Lemme know what the fix is exactly and we can blanket out the changes needed to all users."

Day 25. Original Tier2 owner replies: "Need to move their Temporary Internet Files to their My Documents. But this only seems to be an issue, from what has said, for users using Microsoft Office Document Imaging and seems somewhat random as I didn't have the issue with the test user I created."

1 hour later. [tech_3] replies: "I would change the location of the temporary internet files for a test user and monitor the registry changes. We can then use that to create a blanket REG file. Let me know if you need my help"

2 hours later. Tier2 owner de-escalates the incident: "This is not going to be able to be done globally and will have to be done on a per user instance. If this has been done for those that have needed it, ie. those who have complained about this particular issue, then close the ticket. It does not need to come back to Tier2 again."

1 hour later. Issue re-escalated to Tier2: "[tech_3] just explained why this can't be implemented globally (encrypted DAT file). This ticket could have been put to rest a long time ago, had I known that. Instead, I was hearing from that it was an easy blanket fix, but meeting ambiguous resistance from Tier2 every time I pushed it up."

Same day. Incident resolved. Tier1 complains about Tier2 to Tier3. Tier3 immediately sends everyone involved an e-mail and re-opens the incident after 5 minutes of identifying what is going on using Procmon: "Change the OutlookSecureTempFolder key to a different location. Do NOT redirect the entire temporary internet files folder to a network drive!!! We want to keep it off the network, not on the network. Revert all changes!

[HKEY_USERS\\Software\Microsoft\Office\11.0\Outlook\Security]
"OutlookSecureTempFolder"="C:\\Documents and Settings\\\\Local Settings\\Temporary Internet Files\\""

45 minutes later. Tier3 comments: "If this is a widespread problem, put it in the login script or something. Just make sure the folder exists - make the script check for existence, and if it doesn't exist, create something like [program_drive]:\Temp with the script as a hidden folder with no execute rights. Don't use the [document_drive] for this type of thing - it is purely for redirecting "My Documents"."

Day 26. Tier3 discovers that redirecting temporary internet files for those users broke a few unrelated apps. Changes are rolled back manually for affected users.

Day 40. Incident still sitting in Tier2 queue.
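
For the curious, Tier3's login-script suggestion could be sketched roughly like this. This is only an illustration, not what was actually deployed: the function names are made up, writing under HKEY_CURRENT_USER is an assumption (the ticket shows HKEY_USERS with the user portion missing), and the P:\Temp path in the usage note is a hypothetical stand-in for [program_drive]:\Temp. It does not handle the "hidden, no execute rights" part, which would need Windows-specific attribute and ACL calls.

```python
import os

# Key path as quoted in the ticket; the HKEY_CURRENT_USER root is an assumption.
OUTLOOK_SECURITY_KEY = (
    r"HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Outlook\Security"
)


def ensure_folder(path):
    """Create the folder if it is missing; return True only if it was created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


def reg_fragment(folder):
    """Build .reg file text that points OutlookSecureTempFolder at `folder`."""
    if not folder.endswith("\\"):
        folder += "\\"  # keep the trailing backslash, as in the ticket
    escaped = folder.replace("\\", "\\\\")  # .reg string values double each backslash
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{OUTLOOK_SECURITY_KEY}]\n"
        f'"OutlookSecureTempFolder"="{escaped}"\n'
    )
```

A login script would call ensure_folder("P:\\Temp") first and only then import the text returned by reg_fragment("P:\\Temp"), so the key never points at a folder that does not exist.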