How to Monitor Your Full Infrastructure Stack — Our Complete Guide
Most teams find out about outages the wrong way — a customer calls, or someone notices during a routine check. Our new 18-minute guide covers all nine monitoring layers so you know before your users do.
We've published a new resource: How to Monitor Your Full Infrastructure Stack — a comprehensive, no-fluff guide for IT professionals and MSPs who want complete visibility across every layer of their infrastructure.
The guide grew out of a pattern we kept seeing: teams that were monitoring carefully still getting blindsided by outages. The common thread wasn't carelessness — it was gaps. A server responding to pings while its web application returns 500 errors. An SSL certificate expiring silently until browsers start blocking users. A mail server IP getting blacklisted without a single obvious symptom until email delivery collapses. Each failure mode requires a different detection method, and most monitoring setups cover only a subset of them.
The guide walks through nine monitoring layers in order:
Layer 1 — Network Connectivity (Ping): The baseline health check every host should have. We cover what ICMP monitoring catches, what it misses, and why it should never be your only check for a critical service.
Layer 2 — Service Availability (TCP Port): Confirms that the service process is actually running and accepting connections. A server can respond to pings while its database, web server, or mail service has crashed. We include a reference table of the most important ports to monitor — from port 22 (SSH) to 6379 (Redis).
Layer 3 — Application Health (HTTP/HTTPS): Makes a real HTTP request and evaluates the response code and time to first byte. This is where you catch application crashes, slow responses, and broken reverse proxies that ping and TCP checks miss entirely.
Layer 4 — Content Integrity (Keyword): A 200 OK doesn't mean your site is working. Keyword monitoring checks the response body for a known string — catching maintenance pages masquerading as live content, broken deployments that load a shell without content, and database failures that stop dynamic rendering.
Layer 5 — SSL Certificates: Expired certificates don't just break HTTPS: browsers block the page behind a security warning, and nothing alerts you in advance unless you are checking expiry dates. The guide covers certificate expiry monitoring, chain validation, and why the 30-day default alert window often isn't enough.
Layer 6 — Domain Health (DNS): DNS hijacking, misconfigured records, and propagation failures are among the hardest outages to diagnose when you're in the middle of one. We cover what to monitor, how often, and the specific failure modes that catch teams off guard.
Layer 7 — Email Infrastructure (SMTP): For organisations that send transactional email, an unmonitored mail server is a liability. We walk through TCP port checks on SMTP and submission ports and how to verify that your mail server is accepting connections before customers notice it isn't.
Layer 8 — Reputation (Blacklist): A single IP blacklisting can quietly destroy delivery rates for days before anyone connects the dots. This layer covers DNSBL and SURBL monitoring across the 16 most widely used zones, and how to set appropriate check intervals that don't abuse the DNS infrastructure.
Layer 9 — Internal Events (Log Files): Application logs are the earliest warning system for failures that haven't surfaced yet. We cover structured log monitoring for error rates, authentication failures, and resource warnings — the signals that appear minutes or hours before an outage becomes visible to users.
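To make a few of these layers concrete, here is a minimal Python sketch of the logic behind layers 2, 3, 4, 5, and 8. It is an illustration written for this post, not code from the guide or from MyMonitor365: the function names are our own, and dnsbl.example.org is a placeholder zone rather than a real blacklist. It uses only the Python standard library.

```python
import ipaddress
import socket
import ssl
from datetime import datetime, timezone


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Layer 2: True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def http_response_healthy(status: int, body: str, keyword: str) -> bool:
    """Layers 3 and 4: require a 2xx status AND a known string in the body.

    A maintenance page served with 200 OK passes the status test but
    fails the keyword test.
    """
    return 200 <= status < 300 and keyword in body


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Layer 5: days left on a certificate.

    not_after is the 'notAfter' string in the format returned by
    ssl.SSLSocket.getpeercert(), e.g. 'Jan  5 09:34:43 2018 GMT'.
    now must be a timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Layer 8: build the DNS name to look up for a DNSBL check.

    The convention is the IPv4 octets in reverse order, followed by the
    blacklist zone. Raises ValueError if ipv4 is not a valid address.
    """
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

For example, dnsbl_query_name("192.0.2.10", "dnsbl.example.org") returns "10.2.0.192.dnsbl.example.org". By DNSBL convention, an A record answer for that name (typically an address in 127.0.0.0/8) means the IP is listed, and a "no such domain" answer means it is not. Keeping the evaluation functions separate from the network calls makes each rule easy to test without touching a live server.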
Beyond the nine layers, the guide includes a section on alerting strategy — how to set thresholds that balance speed with noise, when to use consecutive-failure requirements versus immediate alerts, and how to route notifications so the right person is reached without flooding everyone. There's also a dedicated MSP section covering multi-tenant monitoring, agent-based distributed checks, and white-label reporting considerations for service providers managing infrastructure on behalf of clients.
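The consecutive-failure requirement mentioned above can be sketched in a few lines of Python. This is a simplified illustration rather than the guide's or MyMonitor365's implementation: the class name, the default threshold of 3, and the "alert" and "recovered" event names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsecutiveFailureAlert:
    """Raise an alert only after `threshold` failed checks in a row."""

    threshold: int = 3      # hypothetical default; 1 means alert immediately
    failures: int = 0       # current run of consecutive failures
    alerting: bool = False  # True while an alert is open

    def record(self, check_passed: bool) -> Optional[str]:
        """Feed in one check result; return 'alert', 'recovered', or None."""
        if check_passed:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True  # notify once, not on every failed check
            return "alert"
        return None


monitor = ConsecutiveFailureAlert(threshold=3)
results = [True, False, False, True, False, False, False, False, True]
events = [monitor.record(r) for r in results]
# events == [None, None, None, None, None, None, "alert", None, "recovered"]
```

The first two failures are treated as a blip because a passing check resets the counter. The alert fires on the third consecutive failure, stays quiet on the fourth, and closes with a single recovery event. Setting threshold to 1 gives the immediate-alert behaviour, which trades more noise for faster detection.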
The guide closes with a monitoring baseline checklist — a structured list of checks organised by layer that you can work through against your current setup to find coverage gaps. It's the most practically useful part of the guide and the section we recommend starting with if you're auditing an existing environment rather than building from scratch.
The guide maps directly onto MyMonitor365. Where a monitoring type is already available (ping, TCP, HTTP, blacklist), we link to the relevant setup steps in the app. Where a type is still on the roadmap, we note it so you can plan ahead.
The full guide is free — no account required. Read it alongside your monitoring setup and use the baseline checklist at the end to audit your current coverage.