Update - this is an old solved question. I've added what I've recently discovered on page 3.
I have an odd problem with my Raspberry Pi 4 + M.2 SSD running Raspbian. It runs all kinds of things for me - PiHole including DHCP, Home Assistant, and PostgreSQL are the key ones. When this server goes down my network goes down for lack of DHCP, until I switch over to router DHCP, which mucks up IP address allocation. Home Assistant does all kinds of air conditioning automation, including actively changing how the ducted system works by directly adjusting damper positions, so it doesn't overheat or overcool rooms.
Currently everything runs in Docker containers. I have a six-month-old system image backup including the OS and the whole file system, and I have nightly backups of key parts of the file system pushed out to S3, including the Docker folder, HA, key OS config files, etc. I turned off syslog to reduce disk usage, which I'm regretting now. At least Pi Hole is working so the network is up.
Last night I modified the hosts file, adding an extra domain to the end of an existing entry, then rebooted to make sure all caches were emptied. When it came up I was getting 10 emails per minute from Home Assistant telling me it couldn't connect to PostgreSQL, and I couldn't connect to the server using SSH. The server is in a cupboard in my son's room so I couldn't access it, but I rebooted it by turning power to the cupboard off then on.
At that point it came up, the emails stopped, and I could SSH in. I stopped all the Docker containers except PiHole, started the PostgreSQL container, and then started Home Assistant. The machine felt very slow: typing was instant but running commands took much longer than usual. top said CPU usage was minimal. Home Assistant came up. Around ten minutes later it stopped responding to SSH again, but Pi Hole is still working, and I got an email overnight from apt regarding updates. Backups (restic pushing to S3) didn't run last night though. In the ten minutes I could access the server I grabbed the kernel log, which suggests there's some kind of disk issue (log below).
Question: how would you address this? Any suggestions for how to restore the OS so that I don't have to reinstall from scratch?
My plan is to approach things somewhat like this, obviously stopping if any step works:
- Get the Pi into my office and reboot it to see if it's magically started working
- Plug the SSD into my main Windows PC to check disk health with the manufacturer tool and Hard Disk Sentinel
- Plug the SSD into my Ubuntu PC and run fsck
- Put the SSD back in the Pi and try booting again
If that fails to fix things I guess I'll reinstall the OS from scratch, then restore my various files from the last nightly backup.
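Before running fsck on the Ubuntu PC, it may be worth doing a purely read-only pass over the disk to see which regions actually fail to read, since fsck can write to a disk that is still throwing errors. Below is a minimal sketch of that idea, not a replacement for the manufacturer tool or SMART data. The function name scan_readonly and the device path /dev/sdX are placeholders I made up; it needs root to open a block device, and it only ever opens the path read-only.

```python
import os

def scan_readonly(path, chunk_size=1024 * 1024):
    """Read `path` front to back without writing anything.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the byte
    offsets of chunks that raised an I/O error.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Unreadable chunk: note where it was and skip past it.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets

if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the SSD's real device node.
    total, bad = scan_readonly("/dev/sdX")
    print(f"read {total} bytes, {len(bad)} unreadable chunk(s)")
    for off in bad:
        print(f"  unreadable chunk at byte offset {off} (sector {off // 512})")
```

The sector numbers printed assume 512-byte logical sectors, which is an assumption about this drive. If the bad offsets line up with the sectors in the kernel log, that points at the drive itself; if the scan is clean on the PC, the adapter or power on the Pi side becomes more suspect.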
Feb 22 20:52:38 pi4server kernel: [ 47.037040] br-eace4cfeb730: port 1(veth3249e98) entered blocking state
Feb 22 20:52:38 pi4server kernel: [ 47.037065] br-eace4cfeb730: port 1(veth3249e98) entered forwarding state
Feb 22 20:52:38 pi4server kernel: [ 47.769010] Bluetooth: hci1: Opcode 0x c03 failed: -4
Feb 22 20:52:44 pi4server kernel: [ 53.100876] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.100908] sd 0:0:0:0: [sda] tag#13 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.100916] sd 0:0:0:0: [sda] tag#13 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.100926] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 23 4c 40 e8 00 02 80 00
Feb 22 20:52:44 pi4server kernel: [ 53.100933] critical target error, dev sda, sector 592199912 op 0x0:(READ) flags 0x80700 phys_seg 80 prio class 2
Feb 22 20:52:44 pi4server kernel: [ 53.101151] sd 0:0:0:0: [sda] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.101156] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.101162] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.101166] sd 0:0:0:0: [sda] tag#14 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.101167] sd 0:0:0:0: [sda] tag#15 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.101170] sd 0:0:0:0: [sda] tag#16 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.101175] sd 0:0:0:0: [sda] tag#15 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.101175] sd 0:0:0:0: [sda] tag#16 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.101176] sd 0:0:0:0: [sda] tag#14 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.101181] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 02 e5 ae b0 00 00 18 00
Feb 22 20:52:44 pi4server kernel: [ 53.101184] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 02 8e 56 c0 00 00 38 00
Feb 22 20:52:44 pi4server kernel: [ 53.101184] sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x28 28 00 00 c5 e8 70 00 00 40 00
Feb 22 20:52:44 pi4server kernel: [ 53.101186] critical target error, dev sda, sector 48606896 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2
Feb 22 20:52:44 pi4server kernel: [ 53.101190] critical target error, dev sda, sector 12970096 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
Feb 22 20:52:44 pi4server kernel: [ 53.101191] critical target error, dev sda, sector 42882752 op 0x0:(READ) flags 0x80700 phys_seg 7 prio class 2
Feb 22 20:52:44 pi4server kernel: [ 53.101218] sd 0:0:0:0: [sda] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.101227] sd 0:0:0:0: [sda] tag#12 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.101229] sd 0:0:0:0: [sda] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [ 53.101234] sd 0:0:0:0: [sda] tag#12 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.101235] sd 0:0:0:0: [sda] tag#17 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [ 53.101241] sd 0:0:0:0: [sda] tag#17 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [ 53.101242] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0x28 28 00 04 0c 2b 00 00 00 20 00
Feb 22 20:52:44 pi4server kernel: [ 53.101248] sd 0:0:0:0: [sda] tag#17 CDB: opcode=0x28 28 00 00 09 8c 48 00 01 a8 00
Feb 22 20:52:44 pi4server kernel: [ 53.101249] critical target error, dev sda, sector 67906304 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
Feb 22 20:52:44 pi4server kernel: [ 53.101252] critical target error, dev sda, sector 625736 op 0x0:(READ) flags 0x80700 phys_seg 53 prio class 2
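For anyone reading the log above: the CDB lines are SCSI READ(10) commands (opcode 0x28), with the starting LBA in bytes 2-5 and the block count in bytes 7-8, both big-endian. Sense key 0x4 is "Hardware Error" and ASC 0x44 / ASCQ 0x00 is "Internal target failure" in the SCSI tables. Here is a small sketch that pulls the failed reads out of a pasted log; the function names are my own, and the lookup tables only cover the codes that appear in this log.

```python
import re

# Meanings taken from the SCSI sense key and ASC/ASCQ tables.
# Only the codes seen in this log are included.
SENSE_KEYS = {0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x44, 0x00): "INTERNAL TARGET FAILURE"}

# Matches the ten hex bytes that follow "CDB: opcode=0x28" in the kernel log.
CDB_RE = re.compile(r"CDB: opcode=0x28 ((?:[0-9a-f]{2} ?){10})")

def decode_read10(cdb_hex):
    """Return (lba, block_count) for a READ(10) CDB given as spaced hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks

def failed_reads(log_text):
    """Return a list of (lba, block_count) for every READ(10) CDB in the log."""
    return [decode_read10(m.group(1)) for m in CDB_RE.finditer(log_text)]

if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = (
        "sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 23 4c 40 e8 00 02 80 00\n"
        "sd 0:0:0:0: [sda] tag#17 CDB: opcode=0x28 28 00 00 09 8c 48 00 01 a8 00\n"
    )
    for lba, blocks in failed_reads(sample):
        print(f"read of {blocks} blocks starting at LBA {lba} failed")
    print(SENSE_KEYS[0x4], "/", ASC_ASCQ[(0x44, 0x00)])
```

The first sample line decodes to LBA 592199912, which matches the "sector 592199912" in the corresponding "critical target error" line, and the second decodes to 625736. The failed reads are scattered across the whole drive rather than clustered in one spot, which may be worth keeping in mind when deciding whether the drive, the adapter, or power is at fault.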