Troubleshooting and Recovery
Pi 5 Vault Node: a Raspberry Pi 5 hosting a 5 × 10 TB RAID 5 array (/dev/md0) mounted at /media/plutus. It exposes a single Samba share, Cornucopia, and runs the arr stack natively (Sonarr, Radarr, Prowlarr, qBittorrent-nox, NZBGet). This runbook covers detection, containment, recovery, verification, and post-incident documentation for the most common failures.
Quick Incident Checklist (30‑second view)
- Identify the subsystem: RAID, drive, filesystem, service, or Samba.
- Isolate: if the filesystem is corrupted or a rebuild is required, stop the arr services.
- Diagnose with mdadm, smartctl, journalctl, and getfacl.
- Act using the steps below for the specific scenario.
- Verify with the verification commands.
- Document the commands run, timestamps, and outcomes.
Commands — Verification (run after fixes)
bash
# RAID health
sudo mdadm --detail /dev/md0
cat /proc/mdstat
# Mount & usage
df -h /media/plutus
mount | grep md0
# Services
systemctl status sonarr radarr prowlarr qbittorrent-nox nzbget
# Samba & ACLs
sudo testparm -s
getfacl /media/plutus | head -n 20
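The RAID checks above can also be scripted so that a degraded array stands out at a glance. This is a minimal sketch. It assumes the array appears in /proc/mdstat as md0 and that the usual mdstat layout holds, with a member status string such as [UUUUU] in which _ marks a missing member. The names md_status and raid_healthy are hypothetical helpers and are not part of mdadm.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of mdadm). They read /proc/mdstat by default,
# or the file given as the second argument.

# md_status ARRAY [MDSTAT_FILE]
# Prints the member status string for ARRAY, e.g. "UUUUU", or "UU_UU" when degraded.
md_status() {
  local array="$1" mdstat="${2:-/proc/mdstat}"
  awk -v a="$array" '
    $1 == a { found = 1; next }          # the "md0 : active raid5 ..." line
    found && /^[ \t]*$/ { exit }         # a blank line ends this array block
    found && match($0, /\[[U_]+\]/) {    # first [U_...] group inside the block
      print substr($0, RSTART + 1, RLENGTH - 2)
      exit
    }
  ' "$mdstat"
}

# raid_healthy ARRAY [MDSTAT_FILE]
# Exits 0 only if a status string was found and every member is up (no "_").
raid_healthy() {
  local s
  s="$(md_status "$@")"
  [[ -n "$s" && "$s" != *_* ]]
}
```

For example, raid_healthy md0 && echo "md0 OK" || echo "md0 degraded or missing" gives a one-line verdict. The array name must match /proc/mdstat (md0, not /dev/md0).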
Scenario Playbooks (copy/paste)
1 Drive failure (detected via SMART or mdadm)
Detect
bash
sudo mdadm --detail /dev/md0
sudo smartctl -a /dev/sdX
Mark failed and remove
bash
# Use the member device exactly as listed by mdadm --detail (e.g. /dev/sdX1)
sudo mdadm --manage /dev/md0 --fail /dev/sdX1
sudo mdadm --manage /dev/md0 --remove /dev/sdX1
Replace, partition, add
bash
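sudo sgdisk -Z /dev/sdX                        # NEW disk only: wipe old GPT/MBR data
sudo sgdisk -n1:0:0 -t1:fd00 /dev/sdX          # one full-size partition, type Linux RAID
sudo mdadm --manage /dev/md0 --add /dev/sdX1   # add the new partition, not the whole disk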
watch -n 10 cat /proc/mdstat
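Safety: double-check the device node before failing or removing it. Match the drive's serial number in smartctl -a against ls -l /dev/disk/by-id/. Also confirm the new partition is at least as large as the existing members (sudo blockdev --getsize64 /dev/sdX1).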
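The safety note can be turned into a guard that refuses to act on a device that mdadm does not list, or that it has not marked faulty. This is a minimal sketch. It assumes the usual member table at the bottom of mdadm --detail (Number, Major, Minor, RaidDevice, State, device path). The names member_state and safe_to_remove are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that parse `mdadm --detail` output read from stdin.

# member_state DEVICE
# Prints the State column for DEVICE (e.g. "active sync" or "faulty").
# Exits 1 if DEVICE is not listed as a member.
member_state() {
  awk -v dev="$1" '
    $NF == dev {                      # member rows end with the device path
      state = ""
      for (i = 5; i < NF; i++)        # fields 5..NF-1 hold the state words
        state = state (state == "" ? "" : " ") $i
      print state
      found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# safe_to_remove DEVICE
# Succeeds only if DEVICE is listed in the array and already marked faulty.
safe_to_remove() {
  local state
  state="$(member_state "$1")" || return 1
  [[ "$state" == *faulty* ]]
}
```

Before --fail, sudo mdadm --detail /dev/md0 | member_state /dev/sdX1 should print a state. Before --remove, pipe the same output into safe_to_remove /dev/sdX1 and run the removal only if it succeeds.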
2 Array degraded but drive healthy
Checks
bash
cat /proc/mdstat
sudo mdadm --detail /dev/md0
sudo dmesg | grep -iE 'ata[0-9]+|\[sd[a-z]+\]|md0'
Actions
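- Power down (sudo poweroff), then reseat the cables and the drive on the HAT.
- If the drive is present but not active in the array, try sudo mdadm --manage /dev/md0 --re-add /dev/sdX1 first. If mdadm refuses the re-add, fall back to sudo mdadm --manage /dev/md0 --add /dev/sdX1, which triggers a full resync.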
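Once a drive is back in the array, you can follow the rebuild with a timestamped log instead of watch. The log is easier to paste into the incident report. This is a minimal sketch. It assumes the kernel's mdstat progress line has the form "recovery = 23.4% (...)". The names rebuild_progress and wait_for_rebuild are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. They read /proc/mdstat by default.

# rebuild_progress ARRAY [MDSTAT_FILE]
# Prints e.g. "recovery 23.4%" while ARRAY is rebuilding or resyncing; nothing otherwise.
rebuild_progress() {
  local array="$1" mdstat="${2:-/proc/mdstat}"
  awk -v a="$array" '
    $1 == a { found = 1; next }
    found && /^[ \t]*$/ { exit }
    found && match($0, /(recovery|resync|reshape|check) = [0-9.]+%/) {
      split(substr($0, RSTART, RLENGTH), p, " = ")
      print p[1], p[2]
      exit
    }
  ' "$mdstat"
}

# wait_for_rebuild ARRAY [INTERVAL_SECONDS]
# Logs timestamped progress until no rebuild activity is reported.
wait_for_rebuild() {
  local array="$1" interval="${2:-60}" p
  while p="$(rebuild_progress "$array")" && [[ -n "$p" ]]; do
    printf '%s %s: %s\n' "$(date '+%F %T')" "$array" "$p"
    sleep "$interval"
  done
  printf '%s %s: no rebuild in progress\n' "$(date '+%F %T')" "$array"
}
```

A delayed or pending resync (for example resync=DELAYED) shows no percentage, so the loop treats it as idle. Confirm with cat /proc/mdstat before assuming the rebuild is finished.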
3 Corrupted filesystem
Take offline
bash
sudo systemctl stop sonarr radarr prowlarr qbittorrent-nox nzbget smbd
sudo umount /media/plutus   # if "target is busy": sudo fuser -vm /media/plutus
Repair
bash
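sudo fsck.ext4 -f /dev/md0   # assumes ext4 directly on /dev/md0
sudo mount /media/plutus     # uses the /etc/fstab entry
df -h /media/plutus
sudo systemctl start smbd sonarr radarr prowlarr qbittorrent-nox nzbget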
Safety: never run fsck on a mounted filesystem. findmnt /dev/md0 should print nothing before you start.
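fsck.ext4 (e2fsck) reports its result as a bit-flag exit status, which is easy to misread under pressure. This is a minimal sketch. It refuses to proceed while the device is still mounted, using findmnt from util-linux, and it translates the exit status using the codes documented in the e2fsck manual page. The names fsck_guard and describe_fsck_status are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around fsck.ext4 / e2fsck.

# fsck_guard DEVICE
# Fails if DEVICE is still mounted anywhere, or if findmnt is unavailable.
fsck_guard() {
  command -v findmnt >/dev/null || { echo "findmnt not found" >&2; return 1; }
  if findmnt --source "$1" >/dev/null 2>&1; then
    echo "refusing: $1 is still mounted" >&2
    return 1
  fi
}

# describe_fsck_status CODE
# Decodes e2fsck's exit status; several bits may be set at once.
describe_fsck_status() {
  local code="$1" out="" m
  local -a msgs=()
  if (( code == 0 )); then echo "no errors"; return 0; fi
  (( code & 1 ))   && msgs+=("errors corrected")
  (( code & 2 ))   && msgs+=("reboot required")
  (( code & 4 ))   && msgs+=("errors left uncorrected")
  (( code & 8 ))   && msgs+=("operational error")
  (( code & 16 ))  && msgs+=("usage or syntax error")
  (( code & 32 ))  && msgs+=("cancelled by user")
  (( code & 128 )) && msgs+=("shared library error")
  for m in "${msgs[@]}"; do out+="${out:+, }$m"; done
  echo "$out"
}
```

A typical call is fsck_guard /dev/md0 && { sudo fsck.ext4 -f /dev/md0; describe_fsck_status $?; }. Record the decoded line in the incident report.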
4 Service fails to import (Sonarr/Radarr)
Check
bash
ls -ld /media/plutus/downloads /media/plutus/Media
getfacl /media/plutus/Media
journalctl -u sonarr -n 200
Fixes
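- Ensure the service accounts are in the media group and the systemd units have UMask=0002. After editing a unit, run sudo systemctl daemon-reload and restart the service.
- Reapply ACLs. Capital X adds execute only to directories and to files that are already executable, so media files do not become executable:
bash
sudo setfacl -R -m g:media:rwX /media/plutus
sudo setfacl -R -d -m g:media:rwx /media/plutus
- Confirm the Samba share sets force user = plutus.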
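One way to apply UMask=0002 without editing the packaged unit files is a systemd drop-in per service. This is a minimal sketch. It assumes the unit names used elsewhere in this runbook, although qbittorrent-nox may be a templated unit such as qbittorrent-nox@USER on some installs. The drop-in file name umask.conf and the function write_umask_dropins are hypothetical. The ROOT argument exists only so the function can be tried against a scratch directory; an empty string writes under /etc.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes ROOT/etc/systemd/system/<SERVICE>.service.d/umask.conf
# containing a [Service] section with UMask=0002, one file per service.
write_umask_dropins() {
  local root="$1" svc dir
  shift
  for svc in "$@"; do
    dir="$root/etc/systemd/system/${svc}.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nUMask=0002\n' > "$dir/umask.conf" || return 1
  done
}
```

Run it as root with an empty ROOT, for example sudo bash -c "$(declare -f write_umask_dropins); write_umask_dropins '' sonarr radarr prowlarr qbittorrent-nox nzbget". Then run sudo systemctl daemon-reload and restart the services. systemctl cat sonarr should now list the drop-in.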
5 Samba permission mismatch
Verify
bash
sudo testparm -s
getfacl /media/plutus
Reapply
bash
sudo chown -R plutus:plutus /media/plutus
sudo setfacl -R -m g:media:rwX /media/plutus
sudo setfacl -R -d -m g:media:rwx /media/plutus
sudo systemctl restart smbd
smbclient -L localhost -N
Note: changing force user affects every client, because all file operations on the share are then performed as that user.
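testparm -s prints the effective configuration, so you can check the force user value for Cornucopia directly. This is a minimal sketch. It assumes standard smb.conf syntax: bracketed section headers, key = value lines, section and parameter names that are case- and whitespace-insensitive, and the last occurrence of a parameter winning. The name share_param is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper.
# share_param CONF SHARE PARAM
# Prints the value of PARAM in the [SHARE] section of CONF ("-" reads stdin).
# Names are compared case-insensitively with blanks ignored; the last occurrence wins.
share_param() {
  awk -v share="$2" -v param="$3" '
    function norm(s) { s = tolower(s); gsub(/[ \t]/, "", s); return s }
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^[ \t]*\[/ {                                  # section header
      name = $0; gsub(/\[|\]/, "", name)
      insec = (norm(name) == norm(share))
      next
    }
    insec && index($0, "=") {
      eq = index($0, "=")
      if (norm(substr($0, 1, eq - 1)) == norm(param)) {
        val = trim(substr($0, eq + 1)); found = 1
      }
    }
    END { if (found) print val; exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, sudo testparm -s 2>/dev/null | share_param - Cornucopia 'force user' should print plutus.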
Post‑Incident Documentation (minimum)
- Timestamp the start and end of the incident.
- Commands run (copy/paste).
- Outputs (attach cropped evidence lines).
- Root cause and fix applied.
- Follow-ups (drive replacement, monitoring changes, backup verification).
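Capturing commands, timestamps, and outputs as you go is easier than reconstructing them afterwards. This is a minimal sketch of a wrapper that runs one command, shows its output, and appends a timestamped record to a log file. The function run_logged and its default log path are hypothetical. Pipelines must be wrapped in bash -c, for example run_logged bash -c 'mount | grep md0'.

```bash
#!/usr/bin/env bash
# Hypothetical default log location; export INCIDENT_LOG beforehand to override.
INCIDENT_LOG="${INCIDENT_LOG:-$HOME/incident-$(date +%Y%m%d).log}"

# run_logged CMD [ARGS...]
# Runs the command, prints its combined stdout/stderr, and appends
# "=== timestamp", "$ command", the output, and "[exit N]" to $INCIDENT_LOG.
run_logged() {
  local out rc
  {
    printf '=== %s\n$' "$(date '+%F %T %Z')"
    printf ' %q' "$@"          # shell-quoted, so the line can be pasted back
    printf '\n'
  } >> "$INCIDENT_LOG"
  out="$("$@" 2>&1)" && rc=0 || rc=$?
  printf '%s\n' "$out" | tee -a "$INCIDENT_LOG"
  printf '[exit %d]\n\n' "$rc" >> "$INCIDENT_LOG"
  return "$rc"
}
```

For example, run_logged sudo mdadm --detail /dev/md0 records both the command and its output. The log then supplies the timestamps and command list for the report.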
Evidence to include in report (minimal, high‑signal)
- sudo mdadm --detail /dev/md0: key lines (RAID level, array size, state, active devices, chunk size).
- df -h /media/plutus: one line.
- mount | grep md0: one line.
- The [Cornucopia] stanza from /etc/samba/smb.conf.
- systemctl status: one line per arr service.
- A one-sentence SMART summary, e.g. “All drives passed SMART with 0 reallocated/pending sectors.”
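The one-sentence SMART summary can be built from smartctl output rather than typed by hand. This is a minimal sketch. It assumes ATA drives whose smartctl -A table has RAW_VALUE in the tenth column and uses the attribute names Reallocated_Sector_Ct (ID 5) and Current_Pending_Sector (ID 197). The device list /dev/sd{a..e} and the names smart_counts and smart_summary are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around smartctl.

# smart_counts [FILE...]
# Reads `smartctl -A` output (files or stdin) and prints
# "reallocated=N pending=N" from the RAW_VALUE column; "?" if an attribute is absent.
smart_counts() {
  awk '
    $1 == 5   && $2 == "Reallocated_Sector_Ct"  { r = $10 }
    $1 == 197 && $2 == "Current_Pending_Sector" { p = $10 }
    END {
      printf "reallocated=%s pending=%s\n", (r == "" ? "?" : r), (p == "" ? "?" : p)
    }
  ' "$@"
}

# smart_summary DEVICE...
# One line per drive: overall health verdict plus the two sector counts.
# Needs root, e.g. from a root shell: smart_summary /dev/sd{a..e}
smart_summary() {
  local dev health
  for dev in "$@"; do
    health="$(smartctl -H "$dev" | awk -F': ' '/overall-health/ { print $2 }')"
    printf '%s health=%s %s\n' "$dev" "${health:-?}" "$(smartctl -A "$dev" | smart_counts)"
  done
}
```

If every line shows health=PASSED with reallocated=0 and pending=0, the summary sentence above applies as written. Otherwise, name the affected drive and its counts.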