This post challenges misconceptions about chaotic on-call and livesite practices, drawing lessons from extensive experience. It introduces common red flags such as call hell, hero worship, and the wild west, and offers solutions: customer-focused monitoring, monitoring pruning, the 1-2-3 troubleshooting rule, follow-the-sun schedules, and repair item deadlines. As services mature, standardized incident response and efficient toil control become crucial.
Category: Monitoring
Windows Operating System Metrics: CPU
This is a screenshot of the CPU metrics on my computer. This post provides a deep dive into the information contained in the Task Manager panel.
The Graph
The graph shows a sliding-window plot of CPU utilization against time.
Utilization: shows how much 'work' is being done by the processor. This includes …