Geekzone: technology news, blogs, forums


timmmay

20578 posts

Uber Geek

Trusted
Lifetime subscriber

#311878 23-Feb-2024 06:56

Update: this is an old, solved question. I've added what I recently discovered on page 3.

 

I have an odd problem with my Raspberry Pi 4 + M.2 SSD, which runs Raspbian and hosts all kinds of things for me: Pi-hole (including DHCP), Home Assistant, and PostgreSQL are the key ones. When this server goes down my network goes down for lack of DHCP, until I switch over to the router's DHCP, which mucks up IP address allocation. Home Assistant does all kinds of air conditioning automation, including actively changing how the ducted system works, by directly adjusting damper positions, so it doesn't overheat or overcool rooms.

 

Currently everything runs in Docker containers. I have a six-month-old system image backup that includes the OS and the whole file system, and I have nightly backups of key parts of the file system pushed out to S3, including the Docker folder, HA, key OS config files, etc. I turned off syslog to reduce disk usage, which I'm regretting now. At least Pi-hole is working, so the network is up.

 

Last night I modified the hosts file, adding an extra domain to the end of an existing entry. I rebooted to make sure all caches were emptied. When it came up I was getting 10 emails per minute from Home Assistant telling me it couldn't connect to PostgreSQL, and I couldn't connect to the server using SSH. The server is in a cupboard in my son's room so I couldn't get to it, but I power-cycled it by turning power to the cupboard off then on.

 

At that point it came up, the emails stopped, and I could SSH in. I stopped all the Docker containers except Pi-hole, started the PostgreSQL container, and then started Home Assistant. The machine felt very slow: typing was instant, but running commands took much longer than usual. top said CPU usage was minimal. Home Assistant came up. Around ten minutes later the server stopped responding to SSH again, but Pi-hole is still working, and I got an email overnight from apt regarding updates. The backups (restic pushing to S3) didn't run last night, though. In the ten minutes I could access the server I got hold of the kernel log, which suggests there's some kind of disk issue (log below).

 

Question: how would you address this? Any suggestions for how to restore the OS so that I don't have to reinstall from scratch?

 

My plan is to approach things somewhat like this, obviously stopping if any step works:

 

  • Get the Pi into my office and reboot it to see if it's magically started working
  • Plug the SSD into my main Windows PC to check disk health with the manufacturer tool and Hard Disk Sentinel
  • Plug the SSD into my Ubuntu PC and run fsck
  • Try it again

If that fails to fix things I guess I'll reinstall the OS from scratch, then restore my various files from the last nightly backup.

 

 

 

Feb 22 20:52:38 pi4server kernel: [   47.037040] br-eace4cfeb730: port 1(veth3249e98) entered blocking state
Feb 22 20:52:38 pi4server kernel: [   47.037065] br-eace4cfeb730: port 1(veth3249e98) entered forwarding state
Feb 22 20:52:38 pi4server kernel: [   47.769010] Bluetooth: hci1: Opcode 0x c03 failed: -4
Feb 22 20:52:44 pi4server kernel: [   53.100876] sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.100908] sd 0:0:0:0: [sda] tag#13 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.100916] sd 0:0:0:0: [sda] tag#13 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.100926] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 23 4c 40 e8 00 02 80 00
Feb 22 20:52:44 pi4server kernel: [   53.100933] critical target error, dev sda, sector 592199912 op 0x0:(READ) flags 0x80700 phys_seg 80 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101151] sd 0:0:0:0: [sda] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101156] sd 0:0:0:0: [sda] tag#15 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101162] sd 0:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101166] sd 0:0:0:0: [sda] tag#14 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101167] sd 0:0:0:0: [sda] tag#15 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101170] sd 0:0:0:0: [sda] tag#16 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101175] sd 0:0:0:0: [sda] tag#15 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101175] sd 0:0:0:0: [sda] tag#16 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101176] sd 0:0:0:0: [sda] tag#14 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101181] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 02 e5 ae b0 00 00 18 00
Feb 22 20:52:44 pi4server kernel: [   53.101184] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x28 28 00 02 8e 56 c0 00 00 38 00
Feb 22 20:52:44 pi4server kernel: [   53.101184] sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x28 28 00 00 c5 e8 70 00 00 40 00
Feb 22 20:52:44 pi4server kernel: [   53.101186] critical target error, dev sda, sector 48606896 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101190] critical target error, dev sda, sector 12970096 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101191] critical target error, dev sda, sector 42882752 op 0x0:(READ) flags 0x80700 phys_seg 7 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101218] sd 0:0:0:0: [sda] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101227] sd 0:0:0:0: [sda] tag#12 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101229] sd 0:0:0:0: [sda] tag#17 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Feb 22 20:52:44 pi4server kernel: [   53.101234] sd 0:0:0:0: [sda] tag#12 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101235] sd 0:0:0:0: [sda] tag#17 Sense Key : 0x4 [current]
Feb 22 20:52:44 pi4server kernel: [   53.101241] sd 0:0:0:0: [sda] tag#17 ASC=0x44 ASCQ=0x0
Feb 22 20:52:44 pi4server kernel: [   53.101242] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0x28 28 00 04 0c 2b 00 00 00 20 00
Feb 22 20:52:44 pi4server kernel: [   53.101248] sd 0:0:0:0: [sda] tag#17 CDB: opcode=0x28 28 00 00 09 8c 48 00 01 a8 00
Feb 22 20:52:44 pi4server kernel: [   53.101249] critical target error, dev sda, sector 67906304 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
Feb 22 20:52:44 pi4server kernel: [   53.101252] critical target error, dev sda, sector 625736 op 0x0:(READ) flags 0x80700 phys_seg 53 prio class 2
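
For anyone reading the log above, here is a minimal Python sketch of how the raw SCSI fields can be decoded. It assumes the standard meanings from the SCSI command set: opcode 0x28 is READ(10), sense key 0x4 is "hardware error", and ASC 0x44 / ASCQ 0x00 is "internal target failure". The function names are made up for illustration, and the lookup tables only cover the codes that actually appear in this log.

```python
# Lookup tables limited to the codes seen in the log above
# (meanings assumed from the SCSI standard, not from the original post).
SENSE_KEYS = {0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x44, 0x00): "INTERNAL TARGET FAILURE"}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Turn a sense key and ASC/ASCQ pair into a readable string."""
    key_text = SENSE_KEYS.get(key, f"unknown sense key 0x{key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{key_text}: {detail}"


def decode_read10(cdb_hex: str) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Layout: byte 0 opcode, bytes 2-5 big-endian LBA,
    bytes 7-8 big-endian transfer length in blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command descriptor block")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


if __name__ == "__main__":
    # First failing command in the log (tag#13).
    print(decode_read10("28 00 23 4c 40 e8 00 02 80 00"))
    print(describe_sense(0x4, 0x44, 0x0))
```

The LBA decoded from the tag#13 CDB is 592199912, which matches the sector in the "critical target error" line that follows it, so the drive itself is reporting failed reads rather than the kernel timing out.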



This is a filtered page: currently showing replies marked as answers.

timmmay

  #3233110 21-May-2024 15:52

I think I worked out why the SSD failed: excessive heat. I checked the SMART data for the new SSD and it was reporting 85°C. The ADATA is spec'd for operation up to 70°C, so it did pretty well to last as long as it did. The Transcend is rated for 85°C, but running at that temperature will have affected its lifespan.

 

I tried a few things to reduce the temperature:

 

  • 85°C - M.2 inside case, no fan
  • 71°C - M.2 inside case, constant 50% fan
  • 66°C - M.2 with case open, using an extra USB cable
  • 64°C - M.2 with case open, M.2 label removed
  • 45°C - M.2 with case open, passive cooler fitted (Jeyi Battleship, from memory)

Currently I have the two parts of the case separated (the SSD enclosure part and the Pi part) with a USB cable between them, and the Pi part at the bottom so the heat rises. I'll probably need to put the SSD with its USB adapter into a new case and put the base back on the Pi so it's physically protected. That's ugly and annoying.
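
To put the temperature readings listed above side by side, here is a small Python sketch that works out how much each change cooled the SSD relative to the first reading, and how much margin each setup leaves under a rated limit. The 70°C limit is the ADATA figure quoted in the post, used here only as an assumed conservative reference. The names are illustrative.

```python
# Rated limit in °C: the ADATA figure from the post, used as an assumed reference.
RATED_LIMIT_C = 70

# (setup, temperature in °C), taken from the list in the post.
READINGS = [
    ("inside case, no fan", 85),
    ("inside case, constant 50% fan", 71),
    ("case open, extra USB cable", 66),
    ("case open, label removed", 64),
    ("case open, passive cooler", 45),
]


def summarise(readings, limit_c):
    """Return (setup, temp, drop from first reading, margin under limit) rows."""
    baseline = readings[0][1]
    return [
        (setup, temp, baseline - temp, limit_c - temp)
        for setup, temp in readings
    ]


if __name__ == "__main__":
    for setup, temp, drop, margin in summarise(READINGS, RATED_LIMIT_C):
        status = "within limit" if margin >= 0 else "over limit"
        print(f"{setup:32} {temp:3d} C  drop {drop:2d}  margin {margin:+3d}  {status}")
```

On these numbers, the fan alone does not bring the drive under 70°C. Opening the case gets it just under, and the passive cooler is the only change that leaves a comfortable margin.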

