
Every system works perfectly until it meets DNS, timezones, certificates, or humans.
Usually at the same time.
In production.
On a Friday.

Experience is just pattern recognition with better alerts.

#Production #DevOps #SiteReliability #EngineeringHumor #IncidentResponse #OnCall #TechReality #ByernNotes

Hidden network paths shape the cloud. Clarity brings control. Share your thoughts on how you face this drift. #ShadowNetworking #ZeroTrust #PlatformEngineering #CloudSecurity #Kubernetes #DevOps #CloudGovernance #ZeroTrustArchitecture #SiteReliability #DigitalTransformation #CloudStrategy #Security #DevSecOps #ModernIT
https://www.linkedin.com/pulse/hidden-pulse-cloud-how-manage-shadow-networking-sanjay-k-mohindroo--fdtbc

🚀 Tired of slow applications and rising bounce rates?

Even milliseconds matter when it comes to user experience. Our latest guide covers 10 proven APM best practices to reduce latency and improve response time across your entire stack.

Faster apps = happier users = better business outcomes.

📖 Read the full post here: https://www.atatus.com/blog/apm-best-practices-latency-response-time/

#APM #ApplicationPerformance #ReduceLatency #ResponseTime #PerformanceMonitoring #Observability #DevOps #Microservices #SiteReliability #APMT

Today's AWS outage was a stark reminder: what happens when the tools you rely on to manage incidents... are part of the incident?

When Slack, Zoom, PagerDuty, and even Statuspage are impacted, how do you get your response team re-connected to solve the underlying problem? Once they're talking to each other, they can improvise a response, but that first step of re-establishing contact is critical.

This isn't just a hypothetical. It's a real-world scenario that can paralyze even the most prepared organizations. Relying on a plan that's tucked away in a long-forgotten document is a recipe for disaster.

Here's what I recommend to the leaders I advise:

🔹 Have a "Rally Point" Plan: Don't just have a backup concept; have a pre-defined, communicated, and accessible fallback plan. Every second counts in an incident, and you can't waste time figuring out where to communicate. If you normally use Slack and Zoom, then think Google Meet or Microsoft Teams for your backup, and vice versa. Maybe even an old-fashioned conference call bridge. The key is that everyone knows where to go, when the normal places aren't working.

🔹 Make it Accessible: Your plan is useless if it's on a server that nobody can get to at the moment. Laminated wallet cards, a shared password vault with offline access, or a regularly updated file on every employee's laptop are all viable options.

🔹 Practice, Practice, Practice: Fire drills aren't just for fires. Run drills for your fallback communication plan. This ensures everyone remembers it exists and that the mechanisms still work.

🔹 Don't Forget Security: Assume that your fallback channel is compromised, and that outsiders are listening in. Use it just as a rendezvous point to direct responders to more secure, authenticated channels, where you can validate every participant. Don't discuss sensitive information in the open.
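For teams that like to keep the plan next to their tooling, the "Rally Point" idea above can be sketched as a small priority list: try the normal channel first, then fall back in a fixed order everyone already knows. This is only a sketch under stated assumptions. The channel names, the join hints, and the `is_reachable` check are all hypothetical placeholders, not a real integration with any of these tools, and the hints deliberately contain nothing sensitive.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Channel:
    """One place responders can gather (all values here are hypothetical)."""
    name: str
    join_hint: str  # where to go; no secrets, since the fallback may be overheard


def pick_rally_point(
    channels: Sequence[Channel],
    is_reachable: Callable[[Channel], bool],
) -> Optional[Channel]:
    """Return the first channel, in priority order, that is reachable."""
    for channel in channels:
        if is_reachable(channel):
            return channel
    return None


# Priority order: the normal tool first, then the agreed fallbacks.
RALLY_PLAN = [
    Channel("Slack", "usual incident channel"),
    Channel("Google Meet", "standing incident room"),
    Channel("Conference bridge", "number on the wallet card"),
]

if __name__ == "__main__":
    down = {"Slack"}  # pretend the primary tool is part of the incident
    chosen = pick_rally_point(RALLY_PLAN, lambda c: c.name not in down)
    print(chosen.name if chosen else "no channel reachable")
```

The list itself is the useful part: it is short enough to print on a wallet card, and a drill can simply mark the first entry as down and check that everyone ends up in the same second place.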

Incidents are costly, not just in revenue, but in reputation and team morale. Proactive preparation isn't a luxury; it's a necessity.

What's your team's communication fallback plan? Share your thoughts in the comments below. 👇

#IncidentManagement #BusinessContinuity #SiteReliability #DevOps #AWSOutage

🚀 We recently helped a client stuck on a slow host migrate their Umbraco site to UmbHost — faster, safer, zero downtime.

✅ Free migration assistance
✅ Daily backups with 7-day retention
✅ DDoS protection & Cloudflare CDN
✅ 99.9% uptime guarantee
✅ UK-based expert support

Need hosting that cares? Drop us a message!

https://umbhost.net/hosting/cloud-umbraco-hosting

#Umbraco #Migration #WebHosting #DevOps #SiteReliability

⏳ Downtime costs more than you think — lost sales, frustrated users, damaged reputation.

UmbHost offers 99.9% uptime SLA with UK-based support and certified Umbraco experts.

Typical ticket resolution under 20 minutes.

Want reliable hosting that has your back?

https://umbhost.net/hosting/cloud-umbraco-hosting

#WebHosting #Umbraco #SiteReliability #TechSupport

Strengthen your cloud systems with the top Chaos Engineering tools for DR — AWS FIS, Gremlin, Chaos Mesh, and Steadybit. Learn how to simulate failures, boost uptime, and improve resilience.
📖 https://medium.com/@ismailkovvuru/chaos-engineering-tools-for-dr-aws-fis-gremlin-chaos-mesh-steadybit-184778c3ca10
#ChaosEngineering #AWS #DisasterRecovery #DevOps #SiteReliability #AWSFIS #Gremlin #ChaosMesh #Steadybit #Cloud #Resilience #tech

DevOps friends 🚀 — Here’s a compact guide every AWS engineer needs:
🔍 Learn the real-world impact of HTTP status codes in CI/CD, monitoring, and production troubleshooting.
📚 Must-read: https://medium.com/@ismailkovvuru/http-status-codes-for-aws-devops-engineers-602c93568acb
#AWS #DevOps #HTTPStatusCodes #CloudInfra #Monitoring #development #cloud #SiteReliability

Robert Boedigheimer presents 'Make the Web Faster' July 24th at Nebraska.Code().

https://nebraskacode.amegala.com/

#webdevelopment #WebPerformance #sitescalability #sitereliability #webpagetestorg #fiddler #lighthouse #WebFaster #webdesign #webdeveloper #TechnologyConference #Nebraska #TechConf #TechCommunity #softwaredevelopment #lincolnnebraska

Hannaford's recent weeklong outage has me wondering: Do companies truly understand the cost of cutting corners on engineering talent?
These unacceptably long outages, which are occurring more frequently at major retailers, highlight a common problem I'm seeing in tech: undervaluing highly experienced & knowledgeable engineers. It's way past time for companies to rethink their hiring priorities... stop cheaping out on your Ops and Sec talent, it's going to cost you far more in the end!
I'm exceptionally good at building reliable & resilient systems & teams, so it's super frustrating to be unemployed while witnessing preventable outages where I could have made a difference. Yes, it's true, 30+ years of engineering experience doesn't come cheap, but I'm damn sure my price is far less than the loss in revenue from a weeklong eComm outage at a major business!
Anyway, if yer looking for a decent engineer/leader, please reach out...
#open_to_work #engineering #siteReliability #Technology

https://www.mainepublic.org/business-and-economy/2024-11-18/hannaford-website-back-online-after-week-long-outage

No, I did not want to have a system-wide outage this morning, thankyouverymuch 😰

(but we recovered, although not without some sweating. Aren't new and different failure modes fun?)

(no, I'm not an SRE but we're a small shop)

#onCall #siteReliability #SRE

"What should I monitor? Am I tracking the right metrics?" 📈📊
Common industry metrics frameworks provide useful monitoring guidance for #DevOps and #SRE.
Here's a good overview of the different methods:
https://logz.io/blog/evops-sre-metrics/?utm_source=devrel&utm_medium=devrel
#monitoring #observability #sitereliability

No one ever complains about #steam going down or being slow, despite tens of millions of concurrent users at all times. I'd like to know more about how Valve manages that. The service itself is practically transparent. #sitereliability #devops #cloud #CloudComputing #videogames

Life of an SRE. I love this pic by @attachmentgenie @cfgmgmtcamp.
It just shows how unsustainable this screen-gazing approach is with today's #microservices #cloudnative systems.
Time to revisit your #siteReliability practices
https://medium.com/@horovits/sre-revisited-slo-in-the-age-of-microservices-30c1ff80cb6a
#CfgMgmtCamp #SRE #DevOps

⚠️ Massive outage hits Australia's second-largest telecom provider, leaving millions stranded without mobile and internet services. Imagine that happening to you! Here's what went wrong and how to avoid it:
https://www.relianoid.com/blog/australian-network-failure-millions-of-users-affected/
#TelecomOutage #SiteReliability #RELIANOID #TelecomDisruption #NetworkOutage #TechDowntime #ServiceRestoration #SiteReliabilityEngineering #HighAvailability #TelecomResilience #TechFailures #NetworkReliability #Australia #Australiaattack #outage #vulnerabilities

New #release!! Our new RELIANOID #Community Edition v5.21.0 simplifies your work with a #seamless upgrade process from Community Edition to Enterprise, without the need for #redeployment.

All the info here:

https://www.relianoid.com/blog/new-release-relianoid-adc-load-balancer-community-edition-v5-21-0/

#RELIANOID #CommunityEdition #ReleaseAnnouncement #SystemUpgrade #GrubConfiguration #FarmsFix #DebianBuster #LoadBalancer #SiteReliability #SRE #EnterpriseUpgrade #Upgrade #LoadBalancing #OpenSource #Infrastructure #ReleaseNotes

Here are the steps to enable #http3/#quic in #caddy:
....

It takes 0, zero, nil lines to enable and configure #http3/#quic in #CaddyServer! You don't need to do anything special to keep up with the industry standard and progress. Caddy takes care of keeping your services up-to-date.

#systemadministration #sysadmin #devops #sre #web #linux #unix #windows #sitereliability