Wednesday, January 02, 2019

Still not getting through to that guy

This morning, returning to the salt mines, I noticed that one of our monitoring tools was showing an error condition that I knew was not strictly correct.  When cow-orker arrives, we discuss it and determine that it's an artifact of that metric being monitored and reported at significantly different rates.

So cow-orker and I decide it would be good to make it configurable, so that we can have it only report if it's consistently bad.
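
For the curious, "only report if it's consistently bad" amounts to something like the following. This is a minimal sketch in Python, not our actual tool: the class name, the `threshold` setting, and its default of three are all made up for illustration.

```python
class ConsecutiveFailureAlert:
    """Report an error only after `threshold` bad samples in a row.

    Hypothetical helper for illustration; `threshold` stands in for
    whatever the configurable setting would actually be called.
    """

    def __init__(self, threshold: int = 3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.bad_streak = 0  # bad samples seen since the last good one

    def observe(self, is_bad: bool) -> bool:
        """Record one sample; return True if the error should be reported."""
        if is_bad:
            self.bad_streak += 1
        else:
            # A single good sample resets the streak, so isolated blips
            # never reach the threshold.
            self.bad_streak = 0
        return self.bad_streak >= self.threshold


alert = ConsecutiveFailureAlert(threshold=3)
samples = [True, False, True, True, True, False]
results = [alert.observe(s) for s in samples]
```

With a threshold of 1 this behaves like reporting every bad sample, so the old behavior would remain available as a setting.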

Cow-orker makes code change, tests in isolation, deploys to the one machine where the problem was manifesting.

Cut to an hour later, when second cow-orker reports that he is suddenly seeing malformed messages arriving at his client.  We look, and the time coincides with cow-orker One deploying his changes.  But he's off at an early lunch, so we put it aside.

After lunch, I pass One in the hall and mention it.  He says that Two had mentioned it to him, but that they were seeing the errors from another machine, which had not been altered, so it could not possibly be his code.

So when I return to the pod, I ask Two, and he shows me the single, solitary error from another machine, and the ongoing several-a-minute stream from the suspect machine.  I show this to One, and he then looks and realizes he did not pull the repo before making his changes, so his binary reverted a week's worth of fixes.  Off he goes to pull, re-implement, and redeploy.

Sheesh, you'd think that by this 4th time he'd start to learn that he should at least just say "Ok, I'll check it out" before trying to dodge blame.




