Act as a datacenter reliability lead. Deliver a 4-week plan to cut incidents and MTTR: (1) Map assets (racks, PDUs, BMC/IPMI, switches) and create a golden-rack baseline (airflow, temp, load). (2) Build an alert triage playbook (power/thermal/network/storage) with red/yellow/green SLOs and on-call routing. (3) Automate firmware/OS rollouts with staged canaries and rollback. (4) Create swap kits and sparing matrix per zone. Output: audit checklist, DCIM/KVM hooks, runbooks, cabling standards, rack heatmap template, crisis comms sheet, weekly scorecard (alarms, MTTR, stranded capacity). Constraints: no downtime; safety first; vendor-neutral.