Geekzone: technology news, blogs, forums


timmmay

20858 posts

Uber Geek
+1 received by user: 5350

Trusted
Lifetime subscriber

#311878 23-Feb-2024 06:56

Update - this is an old solved question. I've added what I've recently discovered on page 3.

 

I have an odd problem with my Raspberry Pi 4 + m.2 SSD running Raspbian. It runs all kinds of things for me; Pi-hole (including DHCP), Home Assistant, and PostgreSQL are the key ones. When this server goes down my network goes down for lack of DHCP, until I switch over to router DHCP, which mucks up IP address allocation. Home Assistant does all kinds of air conditioning automation, including actively changing how the ducted system works so it doesn't overheat or overcool rooms, by directly adjusting damper positions.

 

Currently everything runs in Docker containers. I have a six-month-old system image backup including the OS and the whole file system, and nightly backups of key parts of the file system pushed out to S3, including the Docker folder, HA, key OS config files, etc. I turned off syslog to reduce disk usage, which I'm regretting now. At least Pi-hole is working so the network is up.

 

Last night I modified the hosts file, adding an extra domain to the end of an existing entry. I rebooted to make sure all caches were emptied. When it came up I was getting 10 emails per minute from Home Assistant telling me it couldn't connect to PostgreSQL, and I couldn't connect to the server using SSH. The server is in a cupboard in my son's room so I couldn't get to it, but I rebooted it by turning power to the cupboard off then on.

 

At that point it came up, the emails stopped, and I could SSH in. I stopped all the Docker containers except Pi-hole, started the PostgreSQL container, and then started Home Assistant. The machine felt very slow: typing was instant but running commands took much longer than usual, even though top said CPU usage was minimal. Home Assistant came up. Around ten minutes later it stopped responding to SSH again, but Pi-hole is still working, and I got an email overnight from apt regarding updates. Backups (restic pushing to S3) didn't run last night though. In the ten minutes I could access the server I got to the kernel log, which suggests some kind of disk issue (log below).

 

Question: how would you address this? Any suggestions for how to restore the OS so that I don't have to reinstall from scratch?

 

My plan is to approach things somewhat like this, obviously stopping if any step works:

 

  • Get the Pi into my office and reboot it to see if it's magically started working
  • Plug the SSD into my main Windows PC to check disk health with the manufacturer tool and Hard Disk Sentinel
  • Plug the SSD into my Ubuntu PC and run fsck
  • Try it again

If that fails to fix things I guess I'll reinstall the OS from scratch, then restore my various files from the last nightly backup.

 

 

 

Feb 22 20:52:38 pi4server kernel: [   47.037040] br-eace4cfeb730: port 1(veth3249e98) entered blocking state
Feb 22 20:52:38 pi4server kernel: [   47.037065] br-eace4cfeb730: port 1(veth3249e98) entered forwarding state
Feb 22 20:52:38 pi4server kernel: [   47.769010] Bluetooth: hci1: Opcode 0x c03 failed: -4
Feb 22 20:52:44 pi4server kernel: [   53.100876] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.100908] sd 0:0:0:0: [sda] tag#13 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.100916] sd 0:0:0:0: [sda] tag#13 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.100926] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 23 4c 40 e8 00 02 80 00
Feb 22 20:52:44 pi4server kernel: [   53.100933] critical target error, dev sda, sector 592199912 op 0x0:(READ) flags 0x80700 phys_seg 80 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101151] sd 0:0:0:0: [sda] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101156] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101162] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101166] sd 0:0:0:0: [sda] tag#14 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101167] sd 0:0:0:0: [sda] tag#15 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101170] sd 0:0:0:0: [sda] tag#16 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101175] sd 0:0:0:0: [sda] tag#15 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101175] sd 0:0:0:0: [sda] tag#16 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101176] sd 0:0:0:0: [sda] tag#14 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101181] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 02 e5 ae b0 00 00 18 00
Feb 22 20:52:44 pi4server kernel: [   53.101184] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 02 8e 56 c0 00 00 38 00
Feb 22 20:52:44 pi4server kernel: [   53.101184] sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x28 28 00 00 c5 e8 70 00 00 40 00
Feb 22 20:52:44 pi4server kernel: [   53.101186] critical target error, dev sda, sector 48606896 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101190] critical target error, dev sda, sector 12970096 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101191] critical target error, dev sda, sector 42882752 op 0x0:(READ) flags 0x80700 phys_seg 7 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101218] sd 0:0:0:0: [sda] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101227] sd 0:0:0:0: [sda] tag#12 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101229] sd 0:0:0:0: [sda] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101234] sd 0:0:0:0: [sda] tag#12 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101235] sd 0:0:0:0: [sda] tag#17 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101241] sd 0:0:0:0: [sda] tag#17 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101242] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0x28 28 00 04 0c 2b 00 00 00 20 00
Feb 22 20:52:44 pi4server kernel: [   53.101248] sd 0:0:0:0: [sda] tag#17 CDB: opcode=0x28 28 00 00 09 8c 48 00 01 a8 00
Feb 22 20:52:44 pi4server kernel: [   53.101249] critical target error, dev sda, sector 67906304 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101252] critical target error, dev sda, sector 625736 op 0x0:(READ) flags 0x80700 phys_seg 53 prio class 2
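
For anyone reading the log above, the numbers can be decoded by hand. Opcode 0x28 is the SCSI READ(10) command, whose CDB carries a 4-byte big-endian logical block address in bytes 2-5 and a 2-byte transfer length in bytes 7-8. Sense key 0x4 is HARDWARE ERROR, and ASC 0x44 / ASCQ 0x00 is INTERNAL TARGET FAILURE, which points at the drive or its USB bridge reporting an internal fault rather than at the file system. Below is a minimal Python sketch that decodes the CDB bytes so they can be compared with the sector the kernel printed. The helper name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (logical_block_address, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting block
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, length


# CDB copied from the tag#13 line of the log above.
lba, blocks = decode_read10("28 00 23 4c 40 e8 00 02 80 00")
print(lba, blocks)  # prints: 592199912 640
```

The decoded address matches the "sector 592199912" in the corresponding "critical target error" line, so the failing reads are spread across the disk rather than clustered in one region.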



This is a filtered page: currently showing replies marked as answers. Click here to see full discussion.

timmmay


  #3233110 21-May-2024 15:52

I think I worked out why the SSD failed: excessive heat. I checked the SMART data for the new SSD and it reported 85°C. The ADATA is spec'd for operation up to 70°C, so it did pretty well lasting as long as it did. The Transcend is rated for that temperature, but running that hot will still have affected its lifespan.
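
The SMART temperature can also be checked from a script. This is a minimal Python sketch, assuming smartmontools 7.0 or later is installed (for `smartctl --json`), that the drive is `/dev/sda` as in the kernel log, and that the USB adapter passes SMART through (some bridges need an extra `-d sat` option). The function names and the 70°C default threshold are illustrative, not part of my actual setup.

```python
import json
import subprocess
from typing import Optional


def read_smart_json(device: str = "/dev/sda") -> dict:
    """Run smartctl and return its JSON report (usually needs root)."""
    # check=False: smartctl sets non-zero exit bits for warnings too.
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(proc.stdout)


def current_temperature(report: dict) -> Optional[int]:
    """Return the drive temperature in degrees C, or None if not reported."""
    return report.get("temperature", {}).get("current")


def too_hot(report: dict, limit_c: int = 70) -> bool:
    """True if the drive reports a temperature above limit_c."""
    temp = current_temperature(report)
    return temp is not None and temp > limit_c


# Example with a hand-written report fragment, so no drive is needed.
sample = {"temperature": {"current": 85}}
print(current_temperature(sample), too_hot(sample))  # prints: 85 True
```

On the Pi itself, something like `too_hot(read_smart_json())` could be run from cron to send a warning before the drive cooks again.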

 

I tried a few things to reduce the temperature:

 

  • 85°C - m.2 inside case, no fan
  • 71°C - m.2 inside case, constant 50% fan
  • 66°C - m.2 with case open, using an extra USB cable
  • 64°C - m.2 with case open with m.2 label removed
  • 45°C - m.2 case open, passive cooler fitted (Jeyi Battleship, from memory)
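
To put those numbers side by side, here is a small Python sketch that computes each configuration's drop from the 85°C baseline and whether it sits at or under the 70°C figure quoted for the ADATA. The readings are the ones listed above; the short labels and variable names are just for illustration.

```python
ADATA_LIMIT_C = 70  # operating limit quoted for the ADATA

# (configuration, measured SSD temperature in degrees C)
readings = [
    ("inside case, no fan", 85),
    ("inside case, constant 50% fan", 71),
    ("case open, extra USB cable", 66),
    ("case open, label removed", 64),
    ("case open, passive cooler", 45),
]

baseline = readings[0][1]

# (name, temperature, drop from baseline, within the 70 C limit?)
summary = [
    (name, temp, baseline - temp, temp <= ADATA_LIMIT_C)
    for name, temp in readings
]

for name, temp, drop, within in summary:
    status = "within" if within else "over"
    print(f"{temp:>3} C  drop {drop:>2} C  {status:<6} {name}")
```

Only the open-case configurations get under 70°C, and the passive cooler gives the largest single step (a further 19°C below the label-removed reading).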

Currently I have the two parts of the case separate (SSD enclosure part, Pi part) with a USB cable between them, with the Pi part down so the heat rises. I'll probably need to put the SSD with its USB adapter into a new case and put the base back on the Pi so it's physically protected. That's ugly and annoying.

