Geekzone: technology news, blogs, forums




14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

Topic # 169758 24-Mar-2015 22:07

I thought I'd fixed my data integrity issues by replacing the motherboard, but maybe not. Tonight I'm getting errors when I copy large files from one disk to another. Memtest86 ran for 12 hours with no errors.

(Update - I'm not sure this is really helpful. I'm pretty sure it's my PC, though not 100% sure which part: CPU, cable or PSU.) Can a few people please try this for me? Download Teracopy and change the preferences to "always verify". Copy 100GB of files (e.g. movies) from one disk to another. If you can try between two spinning disks and between a spinning disk and an SSD, that'd be interesting. If you hit "more" during the copy, the tool will report any CRC failures. I'd like to know if anyone else gets any CRC errors at all. Today I got 5 errors copying 100GB of movies around, and copying one 100GB file just won't succeed without errors.
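For anyone without Teracopy, the copy-then-verify test above can be approximated with a few lines of Python. This is only a rough sketch, not how Teracopy works internally: the function names `crc32_of` and `copy_and_verify` are made up for illustration, and it simply re-reads both files after the copy and compares their CRC32 values.

```python
import os
import tempfile
import zlib

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files do not fill RAM


def crc32_of(path):
    """Return the CRC32 of a file, computed chunk by chunk."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both files and compare their CRC32s."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(CHUNK)
            if not block:
                break
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to push the data to the device
    return crc32_of(src) == crc32_of(dst)


if __name__ == "__main__":
    # Small self-test on throwaway files; a real test would use paths on two disks.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * CHUNK + 17))
        print("copy verified:", copy_and_verify(src, dst))
```

One caveat: on small files the verification read may be served from the operating system's cache rather than from the disk, so the test is more meaningful on files much larger than the machine's RAM.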

Motherboard is new, memtest passes every time with never an error, the Intel processor test passes every time, Prime95 passes every time, and it happens with each of my 5 disks. I've changed some cables since I last tested too.

The main thing I'm wondering is whether these CRC errors happen to people regularly and they just don't notice because it's never checked, or whether there's something wrong with my PC. I hope it's just my PC, which I can replace quite easily. I just don't want to spend $1500 replacing it and then find the new one does the same thing.

Update - SOLVED! It was the RAM. See the information on page 3 for more details.




AWS Certified Solution Architect Professional, Sysop Administrator Associate, and Developer Associate
TOGAF certified enterprise architect
Professional photographer


773 posts

Ultimate Geek
+1 received by user: 176


  Reply # 1267190 24-Mar-2015 23:05

I use teracopy all the time. I always use it with crc verification turned on.

I use it on windows xp and windows 7.

I use it to copy or move files between windows xp and 7 and freebsd/zfs, fedora ext4 and some cheapie nas gadget off aliX.

It works.

It has never let me down.

I trust it.

If teracopy has a problem with an operation then you have a fault somewhere, it is not teracopy.



14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267238 25-Mar-2015 05:58

First post clarified - Teracopy will be fine, I'm just trying to work out if these likely single byte errors are happening all the time on many PCs and no-one notices or if my PC is faulty. I hope it's just my PC, that can be replaced easily.






528 posts

Ultimate Geek
+1 received by user: 197


  Reply # 1267239 25-Mar-2015 06:04

timmmay: First post clarified - Teracopy will be fine, I'm just trying to work out if these likely single byte errors are happening all the time on many PCs and no-one notices or if my PC is faulty. I hope it's just my PC, that can be replaced easily.


Your PC is faulty (in some way, which may include software). I keep checksums for every file added to my RAID. When I replaced the RAID last year, I copied everything to the new system, and ran checksums over the whole lot (about 8TB). Only one error was found.

Not relevant in your case, since it's on a single computer, but I've had issues in the past with checksum offloading on network cards causing data corruption. These things, unfortunately, can take a very long time to track down.
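Keeping a checksum for every file and re-checking the whole lot after a migration, as described above, could look roughly like this. It is a minimal sketch rather than the poster's actual tooling: the use of SHA-256 and the names `build_manifest` and `verify_manifest` are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    bad = []
    for rel, digest in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != digest:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    # Tiny demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.bin"), "wb") as f:
            f.write(os.urandom(4096))
        manifest = build_manifest(root)
        print("files with problems:", verify_manifest(root, manifest))
```

The manifest is built once on the old system and then checked against the copy on the new one, so any file that changed in transit shows up by name.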



14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267241 25-Mar-2015 06:16

I first noticed my data corruption around a year ago. I spent three very frustrating months tracking it down, and I thought I'd solved it with the motherboard replacement - all tests right after I installed it were fine. Many hundreds of GB copied to my new RAID array with no errors, but now it's started again. Gutted.

What I want to avoid is buying a whole new PC and finding the same problems. I mean it's a new motherboard, RAM is testing fine, processor is testing fine, and while I can't really test disks easily I have five disks, four different brands, and they're all showing the same error, so it's not the disks. PSU couldn't do this, could it? I'd probably go for a 100% replacement as this is so confusing and difficult to track down, case, cables and all. The only thing I haven't replaced is every cable - I did replace a couple of SATA cables.

I had my eye on a SuperMicro motherboard, Xeon CPU, and ECC RAM. I might have to think more about getting it.

Still interested if anyone can be bothered running a couple of tests for me... :(








14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267244 25-Mar-2015 06:36

Here are the reports from H2testw

Samsung 840 SSD
Warning: Only 60000 of 122101 MByte tested.
The media is likely to be defective.
58.5 GByte OK (122879997 sectors)
1.5 KByte DATA LOST (3 sectors)
Details:0 KByte overwritten (0 sectors)
1.5 KByte slightly changed (< 8 bit/sector, 3 sectors)
0 KByte corrupted (0 sectors)
0 KByte aliased memory (0 sectors)
First error at offset: 0x000000005ac1c620
Expected: 0x0285dcfd89753620
Found: 0x02859cfd89753620
H2testw version 1.3
Writing speed: 320 MByte/s
Reading speed: 340 MByte/s
H2testw v1.4

Seagate Barracuda 1TB
Warning: Only 50000 of 953867 MByte tested.
The media is likely to be defective.
48.8 GByte OK (102399988 sectors)
6 KByte DATA LOST (12 sectors)
Details:0 KByte overwritten (0 sectors)
6 KByte slightly changed (< 8 bit/sector, 12 sectors)
0 KByte corrupted (0 sectors)
0 KByte aliased memory (0 sectors)
First error at offset: 0x000000034015f6a0
Expected: 0x5386ad804f6e56a0
Found: 0x53c6ad804f6e56a0
H2testw version 1.3
Writing speed: 93.5 MByte/s
Reading speed: 122 MByte/s
H2testw v1.4


2xHGST 4TB, ReFS with Mirror Storage space, aka RAID mirror (this one worked)
Warning: Only 80000 of 3806272 MByte tested.
Test finished without errors.
You can now delete the test files *.h2w or verify them again.
Writing speed: 107 MByte/s
Reading speed: 138 MByte/s
H2testw v1.4




And an event log entry or two, talking about the RAID mirror (I have 5 like this from this morning)
The file system detected a checksum error and was able to correct it. The name of the file or folder is "R:\Backups\SystemImages\Windows7-fresh-9-3-2015\144798B7048BCB19-00-00.mrimg".
The file system detected a checksum error and was able to correct it. The name of the file or folder is "R:\Backups\SystemImages\W10-preview-24-2-2015\Win10-Preview-Feb24-2015-00-00.mrimg".
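The Expected/Found pairs in the two failing H2testw reports above can be compared bit by bit. The short Python sketch below treats each value as a 64-bit integer exactly as printed (the byte order H2testw uses on disk is not stated in the reports, so the absolute bit numbers are only indicative) and lists which bits differ. The helper name `flipped_bits` is made up for illustration.

```python
def flipped_bits(expected, found):
    """Return the bit positions (0 = least significant) where two values differ."""
    diff = expected ^ found
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Expected/Found pairs copied from the two failing H2testw reports.
pairs = {
    "Samsung 840 SSD": (0x0285DCFD89753620, 0x02859CFD89753620),
    "Seagate Barracuda": (0x5386AD804F6E56A0, 0x53C6AD804F6E56A0),
}

for name, (expected, found) in pairs.items():
    print(name, flipped_bits(expected, found))
# Samsung 840 SSD [46]
# Seagate Barracuda [54]
```

In both cases exactly one bit differs, and it is the same bit position within its byte (0x40). The same kind of single-bit flip turning up on two unrelated drives points away from the drives themselves and towards something they share, which fits the eventual finding noted in the first post that the RAM was at fault.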






4717 posts

Uber Geek
+1 received by user: 2201

Trusted
Subscriber

  Reply # 1267250 25-Mar-2015 07:47

A few PCs ago I had one that would randomly, every few months, completely crap all over the disk and require a clean reinstall of Windows. It was similar to you - memtest86 and all the other tests were 100% stable, and the machine itself never crashed. It would just sometimes try to boot to a disk full of garbage. I eventually decided it was the south bridge IDE controller on the motherboard, because I had two drives, which would both exhibit the issue when connected to the south bridge IDE controller, but never when connected to a secondary IDE controller (also on the motherboard).

Why didn't I just connect both drives to the other controller, you ask? Lazy, I guess. That would have required faffing around in Windows to make it boot from a 'different' disk, whereas reinstalling from an image was just a case of shoving in the DVD and coming back in a couple of hours. All the important stuff was on the other drive.




iPad Air + iPhone SE + 2degrees 4tw!

These comments are my own and do not represent the opinions of 2degrees.


773 posts

Ultimate Geek
+1 received by user: 176


  Reply # 1267258 25-Mar-2015 07:50

timmmay: First post clarified - Teracopy will be fine, I'm just trying to work out if these likely single byte errors are happening all the time on many PCs and no-one notices or if my PC is faulty. I hope it's just my PC, that can be replaced easily.


They happen all the time on windows PCs if you write large files to usb drives. There is a fault in the usb stack that will copy a whole file without reporting errors. The copied file will look fine and the size will be good but there will be a chunk of zeros somewhere in the middle of the file.

I have seen that on w2k, xp and 7 with a variety of usb devices on a variety of computers.

Folks don't tend to notice it because it will most likely be in a movie file and when you playback all that you see is a short glitch in the video.

Teracopy will spot those errors but not fix them.

For copying onto usb drives with windows, I use a program that I wrote which opens the destination file and writes a few MB. Then it closes it and opens it for appending the next few MB and then closes the file again etc etc.

It is slow but it works.
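The write-a-bit, close, reopen-for-append approach described above might look something like this in Python. This is only a guess at the idea, not the poster's actual program: the function name, the 4 MiB chunk size and the `os.fsync` call are all assumptions added for illustration.

```python
import os
import tempfile


def copy_in_reopened_chunks(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst, closing and reopening dst for append after every chunk."""
    # Start with an empty destination file.
    with open(dst, "wb"):
        pass
    with open(src, "rb") as fin:
        while True:
            block = fin.read(chunk_size)
            if not block:
                break
            # Reopen in append mode for each chunk; leaving the "with" closes it.
            with open(dst, "ab") as fout:
                fout.write(block)
                fout.flush()
                # fsync is an addition here, not something the post mentions.
                os.fsync(fout.fileno())


if __name__ == "__main__":
    # Demonstration on throwaway files with a deliberately small chunk size.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(100_000))
        copy_in_reopened_chunks(src, dst, chunk_size=16_384)
        print("sizes match:", os.path.getsize(src) == os.path.getsize(dst))
```

As the post says, this trades speed for caution, since every chunk pays the cost of an open, a flush and a close.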




14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267260 25-Mar-2015 07:55

SaltyNZ: A few PCs ago I had one that would randomly, every few months, completely crap all over the disk and require a clean reinstall of Windows. It was similar to you - memtest86 and all the other tests were 100% stable, and the machine itself never crashed. It would just sometimes try to boot to a disk full of garbage. I eventually decided it was the south bridge IDE controller on the motherboard, because I had two drives, which would both exhibit the issue when connected to the south bridge IDE controller, but never when connected to a secondary IDE controller (also on the motherboard).

Why didn't I just connect both drives to the other controller, you ask? Lazy, I guess. That would have required faffing around in Windows to make it boot from a 'different' disk, whereas reinstalling from an image was just a case of shoving in the DVD and coming back in a couple of hours. All the important stuff was on the other drive.


Annoying. I replaced the motherboard!

jpoc: They happen all the time on windows PCs if you write large files to usb drives. There is a fault in the usb stack that will copy a whole file without reporting errors. The copied file will look fine and the size will be good but there will be a chunk of zeros somewhere in the middle of the file.

I have seen that on w2k, xp and 7 with a variety of usb devices on a variety of computers.

Folks don't tend to notice it because it will most likely be in a movie file and when you playback all that you see is a short glitch in the video.

Teracopy will spot those errors but not fix them.

For copying onto usb drives with windows, I use a program that I wrote which opens the destination file and writes a few MB. Then it closes it and opens it for appending the next few MB and then closes the file again etc etc.

It is slow but it works.


Different issue from me, I'm all SATA, but interesting.






3344 posts

Uber Geek
+1 received by user: 1089

Trusted
Vocus

  Reply # 1267305 25-Mar-2015 08:58

Have you tried replacing the power supply? Maybe you only get memory faults when your power is under load (i.e. disk activity).

Or, less likely but possible, your CPU itself is faulty...



14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267321 25-Mar-2015 09:24

ubergeeknz: Have you tried replacing the power supply? Maybe you only get memory faults when your power is under load (i.e. disk activity).

Or, less likely but possible, your CPU itself is faulty...


I haven't tried changing the PSU yet. I personally think it's unlikely, but possible. I'd have thought the CPU was unlikely too, but given the motherboard and RAM are fine it's likely one of those. Current PSU is "Antec High Current Gamer HCG-520, 520W ATX PSU".

Plan of attack:
 - I'll try new SATA cables tonight - though I expect no luck as I'm having problems between multiple drives and having more than one faulty cable is really, really unlikely
 - I'll put my drives into another computer and check everything works
 - I'll try another PSU (does anyone in Wellington have one spare I can borrow for a weekend?)
 - At this point it must be the CPU. I'm not replacing the CPU. If I get this far I'll build a new PC without reusing ANY parts, including case and PSU.


To all - if you were building a replacement PC would you replace the lot including PSU and case, or would you keep them and see what happens with the old ones first? Those components cost $400 and I tend to keep them between builds, but I'm getting really, really, really sick of this.






dan

986 posts

Ultimate Geek
+1 received by user: 89


  Reply # 1267346 25-Mar-2015 10:05
One person supports this post

You don't need people to try this for you. If you are getting CRC errors copying files, then you have a fault somewhere on your system, likely hardware, but it could be firmware/drivers.

I've used Teracopy 100s of times with verify, on probably 50+ computers in the last 5+ years. You only get CRC errors if something is wrong.

No need to be throwing around the term bit rot.

If you disconnect all the hard drives apart from the Samsung SSD, do you still get the CRC errors copying a large file onto itself? rar up some files, around 20GB, into a single file, then copy it, delete and repeat several times. Do you get a CRC fail? If not, I probably wouldn't replace your machine until you have done a lot of trial and error with your other drives.










14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267349 25-Mar-2015 10:12

dan: You don't need people to try this for you. If you are getting CRC errors copying files, then you have a fault somewhere on your system, likely hardware, but it could be firmware/drivers.

I've used Teracopy 100s of times with verify, on probably 50+ computers in the last 5+ years. You only get CRC errors if something is wrong.

No need to be throwing around the term bit rot.

If you disconnect all the hard drives apart from the Samsung SSD, do you still get the CRC errors copying a large file onto itself? rar up some files, around 20GB, into a single file, then copy it, delete and repeat several times. Do you get a CRC fail? If not, I probably wouldn't replace your machine until you have done a lot of trial and error with your other drives.


Fair call, 99% sure it's just my machine. I'm not saying 100% sure as I think that would be arrogance from me, I don't know everything.

I haven't tried the copy from and to the SSD, but I have tried a program that writes a known set of data to the drive and reads it, that consistently fails on both the SSD and other drives (details/logs are further up this thread). I'm pretty sure I would get CRC failures copying from/to the drive. I'm not sure I understand your conclusion about the test though - copying from the drive to another drive uses the drive, the cables, the motherboard, and the CPU, it's not like it goes direct. I don't think it would give me any additional information compared with testing programs or copying with CRC validation.

Maybe you can help me understand why you think copying from the SSD to the SSD and having a failure would suggest I shouldn't replace the computer?
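For reference, the kind of test mentioned above (write a known set of data to the drive, then read it back and compare) can be sketched in a few lines of Python. This is not H2testw's actual algorithm, just a minimal illustration: the SHA-256-based pattern generator and the names `pattern_block`, `write_pattern` and `verify_pattern` are assumptions.

```python
import hashlib
import os
import tempfile


def pattern_block(seed, index, size):
    """Deterministically generate block number `index` of the test pattern."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_pattern(path, seed, blocks, block_size=1024 * 1024):
    """Write `blocks` pattern blocks to path and flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, seed, blocks, block_size=1024 * 1024):
    """Return the indexes of blocks whose contents differ from the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != pattern_block(seed, i, block_size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Demonstration on a throwaway file with small blocks.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.bin")
        write_pattern(path, "demo", blocks=8, block_size=4096)
        print("bad blocks:", verify_pattern(path, "demo", blocks=8, block_size=4096))
```

Because the expected data is regenerated from the seed rather than read from a second file, a mismatch cannot be blamed on the source drive. It can still come from anywhere else on the path, including RAM, the CPU, the controller and the cables.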






3344 posts

Uber Geek
+1 received by user: 1089

Trusted
Vocus

  Reply # 1267351 25-Mar-2015 10:14

What you have is something pretty weird.  Keep at it, you will fix it.

As has been stated, CRC errors should never happen on local copy functions.  I have copied terabytes from one drive to another without any errors (even using dd | gzip | nc pipes across a LAN for that matter)

Also - start eliminating drives.  One faulty drive can impact others on the same SATA controller.

773 posts

Ultimate Geek
+1 received by user: 176


  Reply # 1267354 25-Mar-2015 10:16

Are your power supply cables plugged directly into the drives or do you have power splitter or molex to sata power adaptors?

Can you check the connectors on the ends of the sata power cables? I have seen sata power connectors which have been cracked. They look just fine but when they are plugged into the drives, the cracks open out and there is insufficient pressure to ensure good connections between the connector and the drive. The cracks are usually in the corners of the connectors.

Are the power and data connectors on the drives OK? Sometimes, the end lugs get broken off and the connectors will still go on and feel firm but they may not be correctly aligned.



14269 posts

Uber Geek
+1 received by user: 2590

Trusted
Subscriber

  Reply # 1267358 25-Mar-2015 10:23

ubergeeknz: What you have is something pretty weird.  Keep at it, you will fix it.

As has been stated, CRC errors should never happen on local copy functions.  I have copied terabytes from one drive to another without any errors (even using dd | gzip | nc pipes across a LAN for that matter)

Also - start eliminating drives.  One faulty drive can impact others on the same SATA controller.


It's a really frustrating problem! Who knows where the actual corruption is occurring.

Good call on eliminating drives. I can try single drive tests easily enough - though the OS SSD will probably need to be plugged in for all. I don't think I can run the testing program on a live boot.

jpoc: Are your power supply cables plugged directly into the drives or do you have power splitter or molex to sata power adaptors?

Can you check the connectors on the ends of the sata power cables? I have seen sata power connectors which have been cracked. They look just fine but when they are plugged into the drives, the cracks open out and there is insufficient pressure to ensure good connections between the connector and the drive. The cracks are usually in the corners of the connectors.

Are the power and data connectors on the drives OK? Sometimes, the end lugs get broken off and the connectors will still go on and feel firm but they may not be correctly aligned.


The drives are running directly off the cables that come from the PSU. I can check them, but visual inspection probably won't help. Given this is happening on multiple drives I doubt it's the problem but it's worth considering, they are the same brand and have had similar usage.

Again, with power and data connectors, since this happens to multiple disks it seems unlikely that they're the cause - I suspect a central point rather than problems with cables.





