devoops

In this session, I’ll share the story of an urgent DevOps challenge: we realized that critical system credentials were days away from expiring, with no clear plan for renewal.

Some of the people who owned those credentials were already on vacation, and we didn’t know how many applications would be affected, who could make the changes, or what changes needed to be made.

The response came down to collecting data, communicating constantly, and making information visible.

I want to tell the story of how the incident was first noticed, how we discovered what needed to be done to fix it, how we communicated and coordinated the fixes and verified that they worked, and how we then planned, scheduled, and facilitated a blameless post-mortem that generated organizational learnings.

The incident response shows essential DevOps principles, including continuous improvement and effective communication within a technology company, which we used to prevent disaster on this occasion and can apply to future scenarios!

Link to the event.