Hi All,
I am in the process of migrating from 4x1TB ZFS RAIDZ1 to 5x3TB ZFS RAIDZ2. It was nice to be able to have ALL the drives hooked up at the same time to do the setup and migration. However, when I decided to test the new drives, I ran into some problems. This is what I found.
I am using Ubuntu 12.04.4 with the 3.5 kernel and ZFS on Linux. My test plan was to use dd to write to each drive and then compare the SMART data to see how many sectors had been remapped. One of my drives kept dropping out, but everything else seemed OK. I was ready to send that drive back to NewEgg when I decided to continue testing. I switched from dd to badblocks because it writes just as fast, does not cause kswapd0 to eat a lot of CPU the way dd did, and also verifies the writes (which takes a looonnnnggg time with 3TB drives).
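For anyone who wants to run the same kind of burn-in, the workflow looks roughly like this. The device name is just an example, and badblocks -w is DESTRUCTIVE, so only run it on drives with no data you care about:

```shell
#!/bin/sh
# Burn-in sketch for a new drive: snapshot SMART remap counters,
# write and verify every sector, then compare the counters.
DEV=/dev/sdg   # example device name; substitute your own drive

# Record reallocated/pending sector counts before the test
smartctl -A "$DEV" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector' > before.txt

# Destructive four-pass write + read-back test (wipes ALL data on DEV).
# -b 4096 matches the 4K physical sectors of typical 3TB drives,
# -w write-mode test, -s show progress, -v verbose.
badblocks -b 4096 -wsv "$DEV"

# Compare SMART counters afterwards; any increase means sectors were remapped
smartctl -A "$DEV" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector' > after.txt
diff before.txt after.txt && echo "no change in remap counters"
```

You can also watch kern.log in another terminal while this runs; on my setup the drives on the 9230 would start throwing the errors below partway through the badblocks pass.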
In summary, what I found is that I could not keep more than 2 drives stably connected to the Marvell 9230 (it provides 4 of the SATA ports) under heavy load. These 4 ports are the first 4 white ones counting from the edge of the board. The top 2 are connected to the Marvell 9172 and appear fine. My current setup that works has:
4x1TB old drives, connected to 4 blue ports which are the SATA2 ports provided by the Avoton.
1x250GB SSD connected to one of the white SATA3 ports (next to the blue ports) provided by the Avoton
1x3TB connected to the other white SATA3 port from the Avoton
2x3TB connected to Marvell 9172 ports
2x3TB connected to Marvell 9230 ports
Every test I tried with more than 2 drives connected to the 9230 failed with entries in the kernel log, and eventually the drive would get disconnected. I even shuffled drives around to make sure it was not a specific drive. It did not make a difference.
Has anyone else experienced this? I am on BIOS 1.8.
I am currently in communication with ASRock, have provided them all my test configurations, and am very impressed with their responsiveness so far. Hopefully it is something they can fix with a BIOS update, or, who knows, it could also be a Linux kernel issue. This is why I am putting this out there: to see if others have had similar issues and could provide more info to ASRock.
This is what my kern.log entries looked like:
Jan 25 21:42:15 zfs kernel: [ 615.011152] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 25 21:42:15 zfs kernel: [ 615.011329] ata9.00: failed command: SMART
Jan 25 21:42:15 zfs kernel: [ 615.011462] ata9.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Jan 25 21:42:15 zfs kernel: [ 615.011462] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 25 21:42:15 zfs kernel: [ 615.011908] ata9.00: status: { DRDY }
Jan 25 21:42:15 zfs kernel: [ 615.012024] ata9: hard resetting link
Jan 25 21:42:16 zfs kernel: [ 615.338638] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 25 21:42:21 zfs kernel: [ 620.330962] ata9.00: qc timeout (cmd 0xec)
Jan 25 21:42:21 zfs kernel: [ 620.338942] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 25 21:42:21 zfs kernel: [ 620.338948] ata9.00: revalidation failed (errno=-5)
Jan 25 21:42:21 zfs kernel: [ 620.339051] ata9: hard resetting link
Jan 25 21:42:21 zfs kernel: [ 620.666451] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 25 21:42:31 zfs kernel: [ 630.651103] ata9.00: qc timeout (cmd 0xec)
Jan 25 21:42:31 zfs kernel: [ 630.659078] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 25 21:42:31 zfs kernel: [ 630.659084] ata9.00: revalidation failed (errno=-5)
Jan 25 21:42:31 zfs kernel: [ 630.659185] ata9: limiting SATA link speed to 3.0 Gbps
Jan 25 21:42:31 zfs kernel: [ 630.659193] ata9: hard resetting link
Jan 25 21:42:31 zfs kernel: [ 630.986591] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 25 21:43:01 zfs kernel: [ 660.940566] ata9.00: qc timeout (cmd 0xec)
Jan 25 21:43:01 zfs kernel: [ 660.948541] ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 25 21:43:01 zfs kernel: [ 660.948546] ata9.00: revalidation failed (errno=-5)
Jan 25 21:43:01 zfs kernel: [ 660.948642] ata9.00: disabled
Jan 25 21:43:01 zfs kernel: [ 660.956535] ata9: hard resetting link
Jan 25 21:43:02 zfs kernel: [ 661.276053] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 25 21:43:02 zfs kernel: [ 661.284034] ata9: EH complete
Jan 25 21:43:02 zfs kernel: [ 661.284215] sd 8:0:0:0: [sdg] Unhandled error code
Jan 25 21:43:02 zfs kernel: [ 661.284222] sd 8:0:0:0: [sdg]
Jan 25 21:43:02 zfs kernel: [ 661.284225] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 25 21:43:02 zfs kernel: [ 661.284231] sd 8:0:0:0: [sdg] CDB:
Jan 25 21:43:02 zfs kernel: [ 661.284234] Write(10): 2a 00 00 1a 00 00 00 04 00 00
Jan 25 21:43:02 zfs kernel: [ 661.284249] end_request: I/O error, dev sdg, sector 1703936
Jan 25 21:43:02 zfs kernel: [ 661.284252] sd 8:0:0:0: [sdg] Unhandled error code
Jan 25 21:43:02 zfs kernel: [ 661.284256] sd 8:0:0:0: [sdg]
Jan 25 21:43:02 zfs kernel: [ 661.284257] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 25 21:43:02 zfs kernel: [ 661.284259] sd 8:0:0:0: [sdg] CDB:
Jan 25 21:43:02 zfs kernel: [ 661.284260] Write(10): 2a 00 00 1a 7c 00 00 04 00 00
Jan 25 21:43:02 zfs kernel: [ 661.284269] end_request: I/O error, dev sdg, sector 1735680
Jan 25 21:43:02 zfs kernel: [ 661.284288] sd 8:0:0:0: [sdg] Unhandled error code
Jan 25 21:43:02 zfs kernel: [ 661.284291] sd 8:0:0:0: [sdg]
Jan 25 21:43:02 zfs kernel: [ 661.284292] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 25 21:43:02 zfs kernel: [ 661.284295] sd 8:0:0:0: [sdg] CDB:
Jan 25 21:43:02 zfs kernel: [ 661.284296] Write(10): 2a 00 00 1a 80 00 00 04 00 00
Jan 25 21:43:02 zfs kernel: [ 661.284306] end_request: I/O error, dev sdg, sector 1736704