10 Comments

Incidents are a great way to learn and improve! Sounds like manual remediation for multiple days was a miss, but I am sure it left a mark on you :)

author

Yeah, that lesson will stay for a while :)

Nov 26, 2023 · Liked by Anton Zaides

Thanks for sharing your story. We use a 3rd-party API to draw a chart in an email. So when the scheduled task sent the email, the emails went out with no chart. Of course, this didn’t block operations, but the business value was lost. The logs reported a 429 failure code too.

I wrote directly to the API provider to find out their rate limit. I would have even called them. 😀

In another similar incident, the API provider blacklisted us, and that was some fun.

author

We also had a blacklisting problem: in the same incident, after a few days of 429 errors, they just blocked us 😅

I didn’t mention it so as not to complicate the story.

Definitely not fun 😂

Love this article and the personal experience sharing, Anton.

3rd-party APIs have been a pain to deal with, and it's hard to know how to approach them most effectively.

Another place I thought this article might have gone is needing to copy and store our own version of the data and keep it in sync with the 3rd party via background jobs. I've had to work through that in the past because of how slow and unreliable the 3rd party was.

Sorry for your weekend :P

author

You are right, when the 3rd-party API is used for reading data, that can be useful.

In this case, it was an application our operations team was using to manage all the drone flight missions, and it was mainly update calls.

The manual work decision is a tough one. It’s not easy to decide what’s worth automating versus not. We all know the story of taking 5 hours to build an automation for something that takes 5 minutes to do manually :)

But in this case, it’s a good lesson learned.

author

In this case, I think I used the ‘risk’ as an excuse to not automate. When properly done, automation is never riskier than manual work… lesson learned 🙃

Love that you're sharing real-life tech stories. Awesome visuals and thanks for the mention.

author

Thanks Neo! Your post gave me the push to publish this one :)
