* writing zeros to bad sector results in persistent read error
@ 2014-06-07  0:11 Chris Murphy
  2014-06-07  1:26 ` Roger Heflin
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-07  0:11 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List

This is a bit off topic as it doesn't involve md raid. But bad sectors are common sources of md raid problems, so I figured I'd post this here.

Summary: Hitachi/HGST Travelstar 5K750. smartctl will not complete an extended offline test; it stops with 60% remaining, reporting the LBA of the first error. Whether I use dd to read that LBA, write zeros to it, or write to a 1MB block surrounding it, I always get back a read error. Not a write error. I can't get rid of this bad sector. I have used the ATA secure erase command via hdparm and get the same results. Very weird; I'd expect a write error to occur.



### This is the entry from smartctl:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      1206         430197584

### Link to the full smartctl -x output
https://docs.google.com/file/d/0B_2Asp8DGjJ9VmdIZVo4UzdGaEE/edit


###  This is the command I used to try to write zeros over it, and the result:
# dd if=/dev/zero of=/dev/sda seek=430197584 count=1
dd: writing to ‘/dev/sda’: Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 3.6149 s, 0.0 kB/s

### And this is the kernel message that appears as a result:

[15110.142071] ata1.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
[15110.142079] ata1.00: irq_stat 0x40000008
[15110.142084] ata1.00: failed command: READ FPDMA QUEUED
[15110.142092] ata1.00: cmd 60/08:88:50:4b:a4/00:00:19:00:00/40 tag 17 ncq 4096 in
         res 51/40:08:50:4b:a4/00:00:19:00:00/40 Emask 0x409 (media error) <F>
[15110.142096] ata1.00: status: { DRDY ERR }
[15110.142099] ata1.00: error: { UNC }
[15110.144802] ata1.00: configured for UDMA/133
[15110.144826] sd 0:0:0:0: [sda] Unhandled sense code
[15110.144830] sd 0:0:0:0: [sda]  
[15110.144832] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[15110.144835] sd 0:0:0:0: [sda]  
[15110.144837] Sense Key : Medium Error [current] [descriptor]
[15110.144841] Descriptor sense data with sense descriptors (in hex):
[15110.144843]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[15110.144854]         19 a4 4b 50 
[15110.144860] sd 0:0:0:0: [sda]  
[15110.144863] Add. Sense: Unrecovered read error - auto reallocate failed
[15110.144865] sd 0:0:0:0: [sda] CDB: 
[15110.144867] Read(10): 28 00 19 a4 4b 50 00 00 08 00
[15110.144892] end_request: I/O error, dev sda, sector 430197584
[15110.144934] ata1: EH complete

### This is the complete dmesg
https://docs.google.com/file/d/0B_2Asp8DGjJ9c3hfelQyTnNoMU0/edit

At first I thought it was because I'm writing one 512-byte logical sector while this drive has 4096-byte physical sectors. OK, so I write out 8 logical sectors instead, and still get a read error. If I do this, to put the bad sector in the middle of a 1MB write:

# dd if=/dev/zero of=/dev/sda seek=430196560 count=2048
dd: writing to ‘/dev/sda’: Input/output error
1025+0 records in
1024+0 records out

It stops right at LBA 430197584, again with a read error. So even though the drive's SMART health assessment is "pass", and no other SMART values are below threshold (i.e. "works as designed"), this drive has effectively failed: any write operation to this LBA results in unrecoverable failure.
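For reference, the seek= arithmetic above is just the bad LBA backed off by half the write span. A sketch (variable names are mine, the numbers are the ones from this message):

```shell
# Centre a 1 MiB (2048-sector) zero write on the bad 512-byte LBA.
bad_lba=430197584
span=2048                        # 1 MiB expressed in 512-byte sectors
start=$(( bad_lba - span / 2 ))  # back up 1024 sectors (512 KiB)
echo "$start"                    # prints 430196560, the seek= value above
# dd if=/dev/zero of=/dev/sda seek="$start" count="$span"   # destructive!
```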

Anyway I find this confusing and unexpected.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  0:11 writing zeros to bad sector results in persistent read error Chris Murphy
@ 2014-06-07  1:26 ` Roger Heflin
  2014-06-07  1:51 ` Roman Mamedov
  2014-06-10 22:18 ` Eyal Lebedinsky
  2 siblings, 0 replies; 20+ messages in thread
From: Roger Heflin @ 2014-06-07  1:26 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List

hdparm --write-sector <sectornum>

is the only thing I was ever able to use to force a rewrite and/or relocation.

That worked on both Seagate and WD disks to make the offline test go
past that point (at least until the next bad sector).

I did note that bad sectors appear to come in groups.
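A sketch of the hdparm invocation (the device and LBA are just the ones from this thread; hdparm refuses --write-sector unless you also pass its long safety flag):

```shell
# Build the destructive command and print it first, so nothing is
# overwritten by accident; run it via eval once you're sure.
bad=430197584
dev=/dev/sda
cmd="hdparm --yes-i-know-what-i-am-doing --write-sector $bad $dev"
echo "$cmd"     # inspect before running
# eval "$cmd"   # uncomment to actually rewrite (and hopefully remap) the sector
```

hdparm also has --read-sector, handy for confirming the sector reads back afterwards.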

On Fri, Jun 6, 2014 at 7:11 PM, Chris Murphy <lists@colorremedies.com> wrote:
> This is a bit off topic as it doesn't involve md raid. But bad sectors are common sources of md raid problems, so I figured I'd post this here.
>
> [...]


* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  0:11 writing zeros to bad sector results in persistent read error Chris Murphy
  2014-06-07  1:26 ` Roger Heflin
@ 2014-06-07  1:51 ` Roman Mamedov
  2014-06-07 16:42   ` Chris Murphy
                     ` (2 more replies)
  2014-06-10 22:18 ` Eyal Lebedinsky
  2 siblings, 3 replies; 20+ messages in thread
From: Roman Mamedov @ 2014-06-07  1:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List


On Fri, 6 Jun 2014 18:11:03 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> # dd if=/dev/zero of=/dev/sda seek=430196560 count=2048
> dd: writing to ‘/dev/sda’: Input/output error
> 1025+0 records in
> 1024+0 records out
> 
> It stops right at LBA 430197584, again with a read error. So even though the drive SMART health assessment is "pass" and there are no other SMART values below threshold indicating "works as designed" this drive has effectively failed because any write operation to this LBA results in unrecoverable failure.

Hello,

Try again with "oflag=direct";

If that doesn't help, remember this is a 4K-sector drive, maybe you should
retry with bs=4096 (recalculating the offset so it still writes to the proper
place).
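The offset recalculation is a divide by 8, since one 4096-byte block covers eight 512-byte LBAs. A sketch (it only lands on the right spot when the LBA is 8-aligned, as this one happens to be):

```shell
# With bs=4096, dd's seek= counts 4 KiB blocks, not 512-byte sectors,
# so the LBA reported by SMART has to be divided by 8.
lba=430197584
seek4k=$(( lba / 8 ))
echo "$seek4k"          # 53774698; and 430197584 % 8 == 0, so no rounding needed
# dd if=/dev/zero of=/dev/sda bs=4096 seek="$seek4k" count=1 oflag=direct  # destructive!
```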

-- 
With respect,
Roman



* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  1:51 ` Roman Mamedov
@ 2014-06-07 16:42   ` Chris Murphy
  2014-06-07 18:26   ` Chris Murphy
  2014-06-08  0:52   ` Chris Murphy
  2 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-07 16:42 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List; +Cc: Roman Mamedov, Roger Heflin


I "embargoed" the bad sector with partitioning to get the user back up and running. In the course of an OS X install, the installer managed to create a "recovery/boot" partition right on top of the bad sector. It no longer returns a read error for that sector. Clearly it was fixed by whatever write command the installer used, and dd, as I used it, just does something different and fails. There are more pending sectors, so once I find their LBAs with SMART offline testing I'll try the other mentioned techniques.

What I'm still confused about is that an ATA secure erase had been done, and yet the Current_Pending_Sector count was still 48, before and after the secure erase. That tells me that secure erase is just about zeroing, and the drive firmware isn't actually confirming whether the writes were successful. Tragic. I'd have thought it would write such sectors, confirm they're bad, and remove them from use in one whack; this is apparently not the case.


On Jun 6, 2014, at 7:51 PM, Roman Mamedov <rm@romanrm.net> wrote:
> Try again with "oflag=direct";
> 
> If that doesn't help, remember this is a 4K-sector drive, maybe you should
> retry with bs=4096 (recalculating the offset so it still writes to the proper
> place).

# smartctl -t select,430195536-max /dev/sda

The next bad LBA reported by SMART is 430235856.

# dd if=/dev/sda skip=430235856 count=1 | hexdump -C
dd: error reading ‘/dev/sda’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 6.97353 s, 0.0 kB/s

# dd if=/dev/zero of=/dev/sda seek=430235856 count=1
dd: writing to ‘/dev/sda’: Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 3.69641 s, 0.0 kB/s

# dd if=/dev/zero of=/dev/sda seek=430235856 count=8
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 2.50232 s, 1.6 kB/s

# dd if=/dev/sda skip=430235856 count=1 | hexdump -C
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.287629 s, 1.8 kB/s
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200

### This command with count=8 worked. I don't know why it worked this time when it didn't with the earlier LBA. The read command above piped through hexdump, which had failed, now works. Further, when I check SMART attributes, the Current_Pending_Sector count has dropped by 8. That seems conclusive: the bad sector has been remapped.

So I'll keep doing selective offline tests to find bad sectors, and write to them this way and report back.
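The find-and-rewrite loop I have in mind can be sketched like this (the awk is mine and assumes smartctl's usual self-test log layout with the LBA in the last column; the sample line is the log entry from earlier in this thread — in real use, feed `smartctl -l selftest /dev/sda` through the same awk):

```shell
# Pull LBA_of_first_error out of a smartctl self-test log line;
# a "-" in that column means no error was recorded.
line='# 1  Extended offline    Completed: read failure       60%      1206         430197584'
lba=$(printf '%s\n' "$line" | awk '$NF != "-" {print $NF}')
echo "$lba"
# ...then overwrite it, e.g.:
# dd if=/dev/zero of=/dev/sda seek="$lba" count=8 oflag=direct   # destructive!
```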


Chris Murphy




* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  1:51 ` Roman Mamedov
  2014-06-07 16:42   ` Chris Murphy
@ 2014-06-07 18:26   ` Chris Murphy
  2014-06-08  0:52   ` Chris Murphy
  2 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-07 18:26 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List; +Cc: Roman Mamedov, Roger Heflin

OK the selective offline is done, and now this is damn strange.

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed without error       00%      1212         -

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       24

How can there still be pending bad sectors, and yet no error and LBA reported?


Chris Murphy


* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  1:51 ` Roman Mamedov
  2014-06-07 16:42   ` Chris Murphy
  2014-06-07 18:26   ` Chris Murphy
@ 2014-06-08  0:52   ` Chris Murphy
  2014-06-08  1:50     ` Roger Heflin
                       ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-08  0:52 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List; +Cc: Roman Mamedov, Roger Heflin

I wrote:
> How can there still be pending bad sectors, and yet no error and LBA reported?

So I started another -t long test. And it comes up with an LBA not previously reported.

# 1  Extended offline    Completed: read failure       60%      1214         430234064

# dd if=/dev/zero of=/dev/sda seek=430234064 count=8
dd: writing to ‘/dev/sda’: Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 3.63342 s, 0.0 kB/s

On this sector the technique fails.

# dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 3.73824 s, 1.1 kB/s

This technique works.

However, this seems like a contradiction. A complete -t long results in:

# 1  Extended offline    Completed without error       00%      1219         -

and yet

197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       16

How are there 16 pending sectors, with no errors found during the extended offline test? In order to fix this without SMART reporting the affected LBAs, I'd have to write to every sector on the drive. This seems like bad design or implementation.
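Writing every sector is straightforward, if slow. Sketched here against a scratch file rather than /dev/sda so nothing real gets destroyed (substitute the device, and expect hours on a spinning disk):

```shell
# Zero every "sector" of a 1 MiB stand-in image; on the real drive this
# would be
#   dd if=/dev/zero of=/dev/sda bs=1M oflag=direct status=progress
# followed by re-checking Current_Pending_Sector with smartctl -A.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=4096 count=256 status=none
size=$(wc -c < "$img")
echo "$size"            # 1048576: every byte of the stand-in was written
rm -f "$img"
```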

Chris Murphy


* Re: writing zeros to bad sector results in persistent read error
  2014-06-08  0:52   ` Chris Murphy
@ 2014-06-08  1:50     ` Roger Heflin
  2014-06-08 21:50       ` Chris Murphy
  2014-06-08  8:10     ` Wilson Jonathan
  2014-06-09 19:37     ` Wolfgang Denk
  2 siblings, 1 reply; 20+ messages in thread
From: Roger Heflin @ 2014-06-08  1:50 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List, Roman Mamedov

Check the messages file and see if it has, in the last few weeks,
reported bad sectors.

Or do a dd if=/dev/sda of=/dev/null read test until it hits something,
then correct it, then continue on.

Or do repeated long/selective tests to see if you can find them.
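The bookkeeping for the read-until-error approach is simple: dd's "records in" count tells you how far a pass got before the error, which converts back to a 512-byte resume LBA. A sketch (the record count is a made-up figure):

```shell
# Suppose a read pass like
#   dd if=/dev/sda of=/dev/null bs=1M
# dies with "205145+0 records in". The bad spot is inside the next
# 1 MiB, i.e. starting at this 512-byte LBA:
records=205145
resume_lba=$(( records * 2048 ))   # 1 MiB = 2048 sectors of 512 bytes
echo "$resume_lba"
# After fixing that stretch, continue from just past it:
# dd if=/dev/sda of=/dev/null bs=1M skip=$(( records + 1 ))
```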

Though I had a Seagate disk where I was able to get all of the pending
sectors fixed, I had to remove the disk from the raid, as it would
still randomly pause for 7 seconds while reading sectors that were not
yet classified as pending.   I tried a number of things to get the
disk to behave and/or replace those bad sectors, but finally gave up
on that disk and just replaced it (out of warranty), as I could never
get it to behave right.

On Sat, Jun 7, 2014 at 7:52 PM, Chris Murphy <lists@colorremedies.com> wrote:
> [...]


* Re: writing zeros to bad sector results in persistent read error
  2014-06-08  0:52   ` Chris Murphy
  2014-06-08  1:50     ` Roger Heflin
@ 2014-06-08  8:10     ` Wilson Jonathan
  2014-06-10  0:09       ` Chris Murphy
  2014-06-09 19:37     ` Wolfgang Denk
  2 siblings, 1 reply; 20+ messages in thread
From: Wilson Jonathan @ 2014-06-08  8:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List, Roman Mamedov, Roger Heflin

On Sat, 2014-06-07 at 18:52 -0600, Chris Murphy wrote:
> I wrote:
> > How can there still be pending bad sectors, and yet no error and LBA reported?
> 
> So I started another -t long test. And it comes up with an LBA not previously reported.
> 
> # 1  Extended offline    Completed: read failure       60%      1214         430234064
> 
> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8
> dd: writing to ‘/dev/sda’: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 3.63342 s, 0.0 kB/s
> 
> On this sector the technique fails.
> 
> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
> 8+0 records in
> 8+0 records out
> 4096 bytes (4.1 kB) copied, 3.73824 s, 1.1 kB/s

I may be missing something here, but surely after all this faffing about
and errors isn't it about time to replicate the data to a new drive and
then hit this one repeatedly with a very large hammer.

The law of diminishing returns must surely be coming into play by now.





* Re: writing zeros to bad sector results in persistent read error
  2014-06-08  1:50     ` Roger Heflin
@ 2014-06-08 21:50       ` Chris Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-08 21:50 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List


On Jun 7, 2014, at 7:50 PM, Roger Heflin <rogerheflin@gmail.com> wrote:

> Check messages file and see if it has in the last few weeks reporting
> sectors bad.

No errors, except the Current_Pending_Sector count reported by smartd, which dumps into the journal.
> 
> Or do a dd if=/dev/sda of=/dev/null read test until it hits something,
> then correct it, then continue on.

No errors.

> 
> Or do repeated long/selective tests to see if you can find them.

No (additional) errors.

> 
> Though, I had a seagate disk that I was able to get all of the pending
> to be fixed, I had to remove the disk from the raid as it still would
> randomly pause for 7 seconds while reading sectors that were not yet
> classified as pending.   I tried a number of things to try to get the
> disk to behave and/or replace those bad sectors, but finally gave up
> on that disk and just replaced it (out of warranty) as I could not
> ever get it to behave right.

I think this drive isn't behaving correctly: it says there are pending sectors, yet it passes the extended self-test.


Chris Murphy



* Re: writing zeros to bad sector results in persistent read error
  2014-06-08  0:52   ` Chris Murphy
  2014-06-08  1:50     ` Roger Heflin
  2014-06-08  8:10     ` Wilson Jonathan
@ 2014-06-09 19:37     ` Wolfgang Denk
  2014-06-10  2:48       ` Chris Murphy
  2 siblings, 1 reply; 20+ messages in thread
From: Wolfgang Denk @ 2014-06-09 19:37 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List, Roman Mamedov, Roger Heflin

Dear Chris,

In message <0E76B97E-96DF-43A3-B8EC-4867964BF8E9@colorremedies.com> you wrote:
>
> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
> 8+0 records in
> 8+0 records out
> 4096 bytes (4.1 kB) copied, 3.73824 s, 1.1 kB/s

This has been pointed out before - if this is a 4k sector drive, then
you should really write in units of 4 k, not 8 x 512 bytes as you do
here.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"Security is mostly a superstition. It does not  exist  in  nature...
Life is either a daring adventure or nothing."         - Helen Keller


* Re: writing zeros to bad sector results in persistent read error
  2014-06-08  8:10     ` Wilson Jonathan
@ 2014-06-10  0:09       ` Chris Murphy
  2014-06-10  6:52         ` Wilson Jonathan
  2014-10-08 17:56         ` Phillip Susi
  0 siblings, 2 replies; 20+ messages in thread
From: Chris Murphy @ 2014-06-10  0:09 UTC (permalink / raw)
  To: Wilson Jonathan; +Cc: linux-raid@vger.kernel.org List


On Jun 8, 2014, at 2:10 AM, Wilson Jonathan <piercing_male@hotmail.com> wrote:

> On Sat, 2014-06-07 at 18:52 -0600, Chris Murphy wrote:
>> [...]
> 
> I may be missing something here, but surely after all this faffing about
> and errors isn't it about time to replicate the data to a new drive and
> then hit this one repeatedly with a very large hammer.
> 
> The law of diminishing returns must surely be coming into play by now.

No, the question here isn't what's the right course of action from this point. This is an academic question: whether the reported behavior(s) are as designed.

From an enterprise perspective, my understanding is that even one bad sector is disqualifying, and the drive goes back to the manufacturer if it's under warranty, or is otherwise demoted to less important use if it's not.

For consumer drives, which this is, all the manufacturers will say the drive is functioning as designed with bad sectors *if* they're being reallocated. Maybe some of them won't quibble and will send a replacement drive anyway.

But what I'm reporting is an instance where an ATA Secure Erase definitely did not fix up a single one of the bad sectors. Maybe that's consistent with the spec, I don't know, but it's not what I'd expect, seeing as every sector, those with and without LBAs assigned, is overwritten. Yet pending sectors were not remapped. Further, overwriting all sectors by software (not merely via the ATA Secure Erase command) yields no errors, yet SMART reports there are still pending sectors, even though its own extended test says there are none. I think that's bad behavior. But perhaps I don't understand the design and it's actually working as designed.


Chris Murphy


* Re: writing zeros to bad sector results in persistent read error
  2014-06-09 19:37     ` Wolfgang Denk
@ 2014-06-10  2:48       ` Chris Murphy
  2014-06-10 13:40         ` Phil Turmel
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2014-06-10  2:48 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid@vger.kernel.org List


On Jun 9, 2014, at 1:37 PM, Wolfgang Denk <wd@denx.de> wrote:

> Dear Chris,
> 
> In message <0E76B97E-96DF-43A3-B8EC-4867964BF8E9@colorremedies.com> you wrote:
>> 
>> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
>> 8+0 records in
>> 8+0 records out
>> 4096 bytes (4.1 kB) copied, 3.73824 s, 1.1 kB/s
> 
> This has been pointed out before - if this is a 4k sector drive, then
> you should really write in units of 4 k, not 8 x 512 bytes as you do
> here.

It worked, so why? The drive interface only accepts LBAs based on 512-byte sectors, so bs=512 count=8 is the same as bs=4096 count=1; it has to get translated into 512-byte LBAs regardless. If it were a 4096-byte logical-sector drive, I'd agree.

Chris Murphy


* Re: writing zeros to bad sector results in persistent read error
  2014-06-10  0:09       ` Chris Murphy
@ 2014-06-10  6:52         ` Wilson Jonathan
  2014-10-08 17:56         ` Phillip Susi
  1 sibling, 0 replies; 20+ messages in thread
From: Wilson Jonathan @ 2014-06-10  6:52 UTC (permalink / raw)
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org List

On Mon, 2014-06-09 at 18:09 -0600, Chris Murphy wrote:
> [...]
> But what I'm reporting is an instance where an ATA Secure Erase definitely did not fix up a single one of the bad sectors. Maybe that's consistent with the spec, I don't know, but it's not what I'd expect seeing as every sector, those with an without LBA's assigned, are overwritten. Yet pending sectors were not remapped. Further, with all sectors overwritten by software (not merely the ATA Secure Erase command) yields no errors yet SMART reports there are still pending sectors, yet it's own extended test says there are none. I think that's bad behavior. But perhaps I don't understand the design and it's actually working as designed.
> 

Thanks for the clarification over this being an academic question rather
than a live, real world disk is playing up, what should I do type
question.

Your mention of the secure erase not, or seemingly not, re-mapping does
raise an important question. If it does indeed remap, due to the data
change, but does not change the count, then it's an oddity. However, if
secure erase fails to remap and does not change the data at all, then
there is the potential for live data to be recoverable (although it
would be a small amount) in some form, perhaps by directly driving the
disk (as opposed to OS control), or even by moving the platters to
another disk case, or some other method, depending on whether the time
and effort involved are worth it.



* Re: writing zeros to bad sector results in persistent read error
  2014-06-10  2:48       ` Chris Murphy
@ 2014-06-10 13:40         ` Phil Turmel
  2014-06-29  0:05           ` Chris Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Phil Turmel @ 2014-06-10 13:40 UTC (permalink / raw)
  To: Chris Murphy, Wolfgang Denk, Roman Mamedov
  Cc: linux-raid@vger.kernel.org List

On 06/09/2014 10:48 PM, Chris Murphy wrote:
> 
> On Jun 9, 2014, at 1:37 PM, Wolfgang Denk <wd@denx.de> wrote:
> 
>> Dear Chris,
>> 
>> In message
>> <0E76B97E-96DF-43A3-B8EC-4867964BF8E9@colorremedies.com> you
>> wrote:
>>> 
>>> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
>>> 8+0 records in 8+0 records out 4096 bytes (4.1 kB) copied,
>>> 3.73824 s, 1.1 kB/s
>> 
>> This has been pointed out before - if this is a 4k sector drive, 
>> then you should really write in units of 4 k, not 8 x 512 bytes as 
>> you do here.
> 
> It worked, so why?

Because writing 512 bytes into a 4096 byte physical sector requires a
read-modify-write cycle.  That will fail if the physical sector is
unreadable.  If you try to overwrite a bad 4k sector with eight 512-byte
writes, each will trigger an RMW, and the 'R' of the RMW will fail for
all eight logical sectors.  If you tell dd to use a block size of 4k, a
single write will be created and passed to the drive encompassing all
eight logical sectors at once.  So the drive doesn't need an RMW
cycle--a write attempt can be made without the preceding read.  Then the
drive has the opportunity to complete its rewrite or remap logic.
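The effect of that single aligned write can be sketched harmlessly on a scratch image file standing in for /dev/sda (the LBA here is made up; on a real drive you would also add oflag=direct so the page cache cannot merge or split the request):

```shell
# Scratch image standing in for the disk; the "bad" LBA is hypothetical.
img=$(mktemp)
dd if=/dev/urandom of="$img" bs=4096 count=16 2>/dev/null

lba=26                    # hypothetical bad 512-byte logical sector
phys=$(( lba / 8 * 8 ))   # first logical sector of its 4 KiB physical sector

# One aligned 4 KiB write replaces the whole physical sector in a single
# command, so no read-modify-write of the unreadable sector is needed.
dd if=/dev/zero of="$img" bs=4096 seek=$(( phys / 8 )) count=1 conv=notrunc 2>/dev/null

# All eight logical sectors now read back as zeros.
cmp -s <(dd if="$img" bs=512 skip=$phys count=8 2>/dev/null) \
       <(dd if=/dev/zero bs=512 count=8 2>/dev/null) && echo sector-zeroed
```

On the real drive in this thread the equivalent would be `dd if=/dev/zero of=/dev/sda bs=4096 seek=$((430197584 / 8)) count=1 oflag=direct` (note that seek counts in bs-sized units).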

> The drive interface only accepts LBAs based on 512 byte sectors, so 
> bs=512 count=8 is the same as bs=4096 count=1, it has to get
> translated into 512 byte LBAs regardless.

The sector address does have to be translated to 512-byte LBAs.  That
has nothing to do with the *size* of each write.  So *NO*, it is *not*
the same.
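To put numbers on that: the address maps to 512-byte LBAs either way; it is the transfer length that decides whether the drive sees one aligned 4 KiB write or eight sub-sector writes. For the failing LBA in this thread (arithmetic only, nothing is touched):

```shell
lba=430197584                      # failing LBA reported by SMART
logical=512 physical=4096
ratio=$(( physical / logical ))    # 8 logical sectors per physical sector
first=$(( lba / ratio * ratio ))   # first 512-byte LBA of that physical sector

echo "physical sector spans LBAs $first..$(( first + ratio - 1 ))"
# -> physical sector spans LBAs 430197584..430197591 (already 4 KiB aligned)
```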

"dd" is a terrible tool, except when it is perfect.  As a general rule,
if you aren't specifying 'bs=' every time you use it, you've messed up.
And if you specify 'direct', remember that each block-sized read or
write issued by dd will have to *complete* through the whole driver
stack before dd will issue the next one.

> If it were a 4096 byte logical sector drive I'd agree.

You do know that drives are physically incapable of writing partial
sectors?  It has to be emulated, either in drive firmware or OS driver
stack.  What you've written suggests you've missed that basic reality.
The rest is operator error.  Roman and Wolfgang were too polite when
pointing out the need for bs=4096 -- it isn't 'should', it is 'must'.

As for the secure erase, I too am surprised that it didn't take care of
pending errors.  But I am *not* surprised that new errors were
discovered shortly after, as pending errors are only ever discovered
when *reading*.

HTH,

Phil


* Re: writing zeros to bad sector results in persistent read error
  2014-06-07  0:11 writing zeros to bad sector results in persistent read error Chris Murphy
  2014-06-07  1:26 ` Roger Heflin
  2014-06-07  1:51 ` Roman Mamedov
@ 2014-06-10 22:18 ` Eyal Lebedinsky
  2 siblings, 0 replies; 20+ messages in thread
From: Eyal Lebedinsky @ 2014-06-10 22:18 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List

Related, though not exactly on-topic: is there a way to list all the pending sectors (rather
than just the first one failing during the extended test)? And the list of bad sectors?

I am asking about the lists kept by the disk, not the logical list kept by software raid.


TIA

On 06/07/14 10:11, Chris Murphy wrote:
> This is a bit off topic as it doesn't involve md raid. But bad sectors are common sources of md raid problems, so I figured I'd post this here.
>
> Summary: Hitachi/HGST Travelstar 5K750. smartctl will not complete an extended offline test; it stops with 60% remaining, reporting the LBA of the first error. Whether I use dd to read that LBA, or write zeros to it, or to a 1MB block surrounding it, I always get back a read error. Not a write error. I can't get rid of this bad sector. I have used the ATA secure erase command via hdparm and get the same results. Very weird; I'd expect a write error to occur.
>
>
>
> ### This is the entry from smartctl:
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       60%      1206         430197584
>
> ### Link to the full smartctl -x output
> https://docs.google.com/file/d/0B_2Asp8DGjJ9VmdIZVo4UzdGaEE/edit
>
>
> ###  This is the command I used to try to write zeros over it, and the result:
> # dd if=/dev/zero of=/dev/sda seek=430197584 count=1
> dd: writing to ‘/dev/sda’: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 3.6149 s, 0.0 kB/s
>
> ### And this is the kernel message that appears as a result:
>
> [15110.142071] ata1.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
> [15110.142079] ata1.00: irq_stat 0x40000008
> [15110.142084] ata1.00: failed command: READ FPDMA QUEUED
> [15110.142092] ata1.00: cmd 60/08:88:50:4b:a4/00:00:19:00:00/40 tag 17 ncq 4096 in
>           res 51/40:08:50:4b:a4/00:00:19:00:00/40 Emask 0x409 (media error) <F>
> [15110.142096] ata1.00: status: { DRDY ERR }
> [15110.142099] ata1.00: error: { UNC }
> [15110.144802] ata1.00: configured for UDMA/133
> [15110.144826] sd 0:0:0:0: [sda] Unhandled sense code
> [15110.144830] sd 0:0:0:0: [sda]
> [15110.144832] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [15110.144835] sd 0:0:0:0: [sda]
> [15110.144837] Sense Key : Medium Error [current] [descriptor]
> [15110.144841] Descriptor sense data with sense descriptors (in hex):
> [15110.144843]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [15110.144854]         19 a4 4b 50
> [15110.144860] sd 0:0:0:0: [sda]
> [15110.144863] Add. Sense: Unrecovered read error - auto reallocate failed
> [15110.144865] sd 0:0:0:0: [sda] CDB:
> [15110.144867] Read(10): 28 00 19 a4 4b 50 00 00 08 00
> [15110.144892] end_request: I/O error, dev sda, sector 430197584
> [15110.144934] ata1: EH complete
>
> ### This is the complete dmesg
> https://docs.google.com/file/d/0B_2Asp8DGjJ9c3hfelQyTnNoMU0/edit
>
> At first I thought it was because I'm writing one 512-byte logical sector, but this drive has 4096-byte physical sectors. OK, so I write out 8 logical sectors instead, and still get a read error. If I do this, to put the bad sector in the middle of a 1MB write:
>
> # dd if=/dev/zero of=/dev/sda seek=430196560 count=2048
> dd: writing to ‘/dev/sda’: Input/output error
> 1025+0 records in
> 1024+0 records out
>
> It stops right at LBA 430197584, again with a read error. So even though the drive's SMART health assessment is "pass" and there are no other SMART values below threshold, indicating "works as designed", this drive has effectively failed, because any write operation to this LBA results in unrecoverable failure.
>
> Anyway I find this confusing and unexpected.
>
>
> Chris Murphy

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)


* Re: writing zeros to bad sector results in persistent read error
  2014-06-10 13:40         ` Phil Turmel
@ 2014-06-29  0:05           ` Chris Murphy
  2014-06-29 23:50             ` Martin K. Petersen
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2014-06-29  0:05 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Wolfgang Denk, Roman Mamedov, linux-raid@vger.kernel.org List


On Jun 10, 2014, at 7:40 AM, Phil Turmel <philip@turmel.org> wrote:

> On 06/09/2014 10:48 PM, Chris Murphy wrote:
>> 
>> On Jun 9, 2014, at 1:37 PM, Wolfgang Denk <wd@denx.de> wrote:
>> 
>>> Dear Chris,
>>> 
>>> In message
>>> <0E76B97E-96DF-43A3-B8EC-4867964BF8E9@colorremedies.com> you
>>> wrote:
>>>> 
>>>> # dd if=/dev/zero of=/dev/sda seek=430234064 count=8 oflag=direct
>>>> 8+0 records in 8+0 records out 4096 bytes (4.1 kB) copied,
>>>> 3.73824 s, 1.1 kB/s
>>> 
>>> This has been pointed out before - if this is a 4k sector drive, 
>>> then you should really write in units of 4 k, not 8 x 512 bytes as 
>>> you do here.
>> 
>> It worked, so why?
> 
> Because writing 512 bytes into a 4096 byte physical sector requires a
> read-modify-write cycle.  That will fail if the physical sector is
> unreadable.  If you try to overwrite a bad 4k sector with eight 512-byte
> writes, each will trigger an RMW, and the 'R' of the RMW will fail for
> all eight logical sectors.  If you tell dd to use a block size of 4k, a
> single write will be created and passed to the drive encompassing all
> eight logical sectors at once.  So the drive doesn't need an RMW
> cycle--a write attempt can be made without the preceding read.  Then the
> drive has the opportunity to complete its rewrite or remap logic.

By doing some SCSI command tracing with the kernel, I've learned some things about this. Whether the drive has 512-byte or 4096-byte physical sectors has no bearing on the actual command issued to the drive. But the use of oflag=direct does change the behavior at the SCSI layer (for both drive types).

http://www.fpaste.org/114087/
[1]

The following commands all produce the same single write command to both types of drives:

# dd if=/dev/zero of=/dev/sdb bs=512 count=8
# dd if=/dev/zero of=/dev/sdb bs=4096 count=1
# dd if=/dev/zero of=/dev/sdb bs=4096 count=1 oflag=direct

The SCSI layer is clearly combining the bs=512 count=8 into a single write command. This is inhibited with oflag=direct.

I also found intermittent issuance of READ_10 to the drive before WRITE_10, but wasn't able to figure out why it's intermittent. Maybe dd issues READ_10 the first time it's going to write to a sector, and it was the READ_10 command triggering the read failure from the drive, preventing the WRITE_10 from even being issued. I can't test this because the drive no longer reports LBAs for any bad sectors.
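For reference, a sketch of the tracing setup that captures dispatches like these. This is a root-only tracefs configuration (the mount point varies: /sys/kernel/tracing on current kernels, /sys/kernel/debug/tracing on older ones), so treat it as a config fragment rather than something to paste blindly:

```shell
cd /sys/kernel/tracing            # or /sys/kernel/debug/tracing
echo 1 > events/scsi/scsi_dispatch_cmd_start/enable
echo > trace                      # clear the ring buffer

dd if=/dev/zero of=/dev/sdb bs=4096 count=1 oflag=direct

grep WRITE_10 trace               # expect a single WRITE_10 with txlen=8
echo 0 > events/scsi/scsi_dispatch_cmd_start/enable
```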

> 
>> The drive interface only accepts LBAs based on 512 byte sectors, so 
>> bs=512 count=8 is the same as bs=4096 count=1, it has to get
>> translated into 512 byte LBAs regardless.
> 
> The sector address does have to be translated to 512-byte LBAs.  That
> has nothing to do with the *size* of each write.  So *NO*, it is *not*
> the same.

These two dd commands definitely result in the SCSI layer issuing the same write command, of the same size (txlen=8), to the drive:
# dd if=/dev/zero of=/dev/sdb bs=512 count=8
# dd if=/dev/zero of=/dev/sdb bs=4096 count=1


> "dd" is a terrible tool, except when it is perfect.  As a general rule,
> if you aren't specifying 'bs=' every time you use it, you've messed up.

I get the same WRITE_10 command for these two commands:

# dd if=/dev/zero of=/dev/sdb count=8
# dd if=/dev/zero of=/dev/sdb bs=4096 count=1

> And if you specify 'direct', remember that each block sized read or
> write issued by dd will have to *complete* through the whole driver
> stack before dd will issue the next one.

That's consistent with the tracing results.


> 
>> If it were a 4096 byte logical sector drive I'd agree.
> 
> You do know that drives are physically incapable of writing partial
> sectors?  It has to be emulated, either in drive firmware or OS driver
> stack.  What you've written suggests you've missed that basic reality.
> The rest is operator error.  Roman and Wolfgang were too polite when
> pointing out the need for bs=4096 -- it isn't 'should', it is 'must'.

That's true for oflag=direct, it's not true without it.

Also included for interest is the result of issuing an hdparm write command. It works without a size specification, so I don't actually know what happens on the drive itself; plus, the command that gets issued to the drive isn't "WRITE_10" but "ATA_16".


> As for the secure erase, I too am surprised that it didn't take care of
> pending errors.  But I am *not* surprised that new errors were
> discovered shortly after, as pending errors are only ever discovered
> when *reading*.

SMART read the whole drive and said no errors found, even though current pending still reports a non-zero value. I think that is surprising.


Chris Murphy




[1]
This formats better in fpaste after clicking Wrap, but I'll post the raw data here in case someone looks at this more than a month from now.
512/512

# dd if=/dev/zero of=/dev/sdb bs=512 count=8
              dd-891   [000] ....   550.352639: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=0 txlen=8 protect=0 raw=2a 00 00 00 00 00 00 00 08 00)
              
# dd if=/dev/zero of=/dev/sdb bs=4096 count=1
              dd-894   [000] ....   566.506562: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=0 txlen=8 protect=0 raw=2a 00 00 00 00 00 00 00 08 00)
            
# dd if=/dev/zero of=/dev/sdb bs=512 count=8 oflag=direct
              dd-1042  [000] .... 10261.418019: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=0 txlen=1 protect=0 raw=2a 00 00 00 00 00 00 00 01 00)
              dd-1042  [000] .... 10261.418294: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=1 txlen=1 protect=0 raw=2a 00 00 00 00 01 00 00 01 00)
              dd-1042  [000] .... 10261.418650: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=2 txlen=1 protect=0 raw=2a 00 00 00 00 02 00 00 01 00)
              dd-1042  [000] .... 10261.419006: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=3 txlen=1 protect=0 raw=2a 00 00 00 00 03 00 00 01 00)
              dd-1042  [000] .... 10261.419203: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=4 txlen=1 protect=0 raw=2a 00 00 00 00 04 00 00 01 00)
              dd-1042  [000] .... 10261.419365: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=5 txlen=1 protect=0 raw=2a 00 00 00 00 05 00 00 01 00)
              dd-1042  [000] .... 10261.419527: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=6 txlen=1 protect=0 raw=2a 00 00 00 00 06 00 00 01 00)
              dd-1042  [000] .... 10261.419766: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=7 txlen=1 protect=0 raw=2a 00 00 00 00 07 00 00 01 00)

# dd if=/dev/zero of=/dev/sdb bs=4096 count=1 oflag=direct
              dd-1045  [001] .... 10337.899923: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=0 txlen=8 protect=0 raw=2a 00 00 00 00 00 00 00 08 00)


512/4096

# dd if=/dev/zero of=/dev/sdb bs=512 count=8

              dd-1814  [002] ...1   530.285126: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467112 txlen=8 protect=0 raw=2a 00 19 b7 aa 68 00 00 08 00)

# dd if=/dev/zero of=/dev/sdb bs=4096 count=1

              dd-1881  [002] ...1  1094.707870: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467112 txlen=8 protect=0 raw=2a 00 19 b7 aa 68 00 00 08 00)


# dd if=/dev/zero of=/dev/sdb bs=512 count=8 oflag=direct

              dd-1890  [003] ...1  1255.136864: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467112 txlen=1 protect=0 raw=2a 00 19 b7 aa 68 00 00 01 00)
              dd-1890  [002] ...1  1255.422802: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467113 txlen=1 protect=0 raw=2a 00 19 b7 aa 69 00 00 01 00)
              dd-1890  [002] ...1  1255.423167: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467114 txlen=1 protect=0 raw=2a 00 19 b7 aa 6a 00 00 01 00)
              dd-1890  [002] ...1  1255.423386: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467115 txlen=1 protect=0 raw=2a 00 19 b7 aa 6b 00 00 01 00)
              dd-1890  [000] ...1  1255.423625: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467116 txlen=1 protect=0 raw=2a 00 19 b7 aa 6c 00 00 01 00)
              dd-1890  [002] ...1  1255.423921: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467117 txlen=1 protect=0 raw=2a 00 19 b7 aa 6d 00 00 01 00)
              dd-1890  [002] ...1  1255.424110: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467118 txlen=1 protect=0 raw=2a 00 19 b7 aa 6e 00 00 01 00)
              dd-1890  [002] ...1  1255.424309: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467119 txlen=1 protect=0 raw=2a 00 19 b7 aa 6f 00 00 01 00)

# dd if=/dev/zero of=/dev/sdb bs=4096 count=1 oflag=direct

              dd-1895  [002] ...1  1388.656777: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=SCSI_PROT_NORMAL cmnd=(WRITE_10 lba=431467112 txlen=8 protect=0 raw=2a 00 19 b7 aa 68 00 00 08 00)


* Re: writing zeros to bad sector results in persistent read error
  2014-06-29  0:05           ` Chris Murphy
@ 2014-06-29 23:50             ` Martin K. Petersen
  2014-06-30  0:51               ` Roger Heflin
  0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2014-06-29 23:50 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Phil Turmel, Wolfgang Denk, Roman Mamedov,
	linux-raid@vger.kernel.org List

>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:

Chris,

Chris> The SCSI layer is clearly combining the bs=512 count=8 into a
Chris> single write command. This is inhibited with oflag=direct.

It's not really the SCSI layer that does any of this but the VM and/or
the I/O scheduler (depending on how things were submitted).

Chris> I also found intermittent issuance of READ_10 to the drive,
Chris> before WRITE_10, but wasn't able to figure out why it's
Chris> intermittent.

It's either the page cache doing readahead or you doing partial writes
to uncached pages.

You can flush the page cache like this:

	echo 3 > /proc/sys/vm/drop_caches

>> You do know that drives are physically incapable of writing partial
>> sectors?  It has to be emulated, either in drive firmware or OS
>> driver stack.  What you've written suggests you've missed that basic
>> reality.  The rest is operator error.  Roman and Wolfgang were too
>> polite when pointing out the need for bs=4096 -- it isn't 'should',
>> it is 'must'.

Chris> That's true for oflag=direct, it's not true without it.

Correct.

In general, a buffered write() call in dd or any other userland app does
not have a 1:1 mapping with a SCSI WRITE command at the bottom of the
stack. The pages in question will simply be marked dirty and eventually
flushed to disk.

You can force a more block-centric behavior by using synchronous/direct
I/O.

Chris> Also included for interest is the result of issuing an hdparm write
Chris> command. It works without a size specification, so I don't
Chris> actually know what happens on the drive itself, plus the command
Chris> that gets issued to the drive isn't "WRITE_10" but "ATA_16".

That's because the ATA command gets encapsulated in a SCSI command so it
can pass through the SCSI layer.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: writing zeros to bad sector results in persistent read error
  2014-06-29 23:50             ` Martin K. Petersen
@ 2014-06-30  0:51               ` Roger Heflin
  2014-10-08 17:51                 ` Phillip Susi
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Heflin @ 2014-06-30  0:51 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Chris Murphy, Phil Turmel, Wolfgang Denk, Roman Mamedov,
	linux-raid@vger.kernel.org List

All of this is probably the reason that this command exists:

hdparm --write-sector <sectornum>

I believe it sends the commands directly through the SCSI/ATA layers.

On Sun, Jun 29, 2014 at 6:50 PM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Chris" == Chris Murphy <lists@colorremedies.com> writes:
>
> Chris,
>
> Chris> The SCSI layer is clearly combining the bs=512 count=8 into a
> Chris> single write command. This is inhibited with oflag=direct.
>
> It's not really the SCSI layer that does any of this but the VM and/or
> the I/O scheduler (depending on how things were submitted).
>
> Chris> I also found intermittent issuance of READ_10 to the drive,
> Chris> before WRITE_10, but wasn't able to figure out why it's
> Chris> intermittent.
>
> It's either the page cache doing readahead or you doing partial writes
> to uncached pages.
>
> You can flush the page cache like this:
>
>         echo 3 > /proc/sys/vm/drop_caches
>
>>> You do know that drives are physically incapable of writing partial
>>> sectors?  It has to be emulated, either in drive firmware or OS
>>> driver stack.  What you've written suggests you've missed that basic
>>> reality.  The rest is operator error.  Roman and Wolfgang were too
>>> polite when pointing out the need for bs=4096 -- it isn't 'should',
>>> it is 'must'.
>
> Chris> That's true for oflag=direct, it's not true without it.
>
> Correct.
>
> In general, a buffered write() call in dd or any other userland app does
> not have a 1:1 mapping with a SCSI WRITE command at the bottom of the
> stack. The pages in question will simply be marked dirty and eventually
> flushed to disk.
>
> You can force a more block-centric behavior by using synchronous/direct
> I/O.
>
> Chris> Also included for interest is the result of issuing an hdparm write
> Chris> command. It works without a size specification, so I don't
> Chris> actually know what happens on the drive itself, plus the command
> Chris> that gets issued to the drive isn't "WRITE_10" but "ATA_16".
>
> That's because the ATA command gets encapsulated in a SCSI command so it
> can pass through the SCSI layer.
>
> --
> Martin K. Petersen      Oracle Linux Engineering


* Re: writing zeros to bad sector results in persistent read error
  2014-06-30  0:51               ` Roger Heflin
@ 2014-10-08 17:51                 ` Phillip Susi
  0 siblings, 0 replies; 20+ messages in thread
From: Phillip Susi @ 2014-10-08 17:51 UTC (permalink / raw)
  To: Roger Heflin, Martin K. Petersen
  Cc: Chris Murphy, Phil Turmel, Wolfgang Denk, Roman Mamedov,
	linux-raid@vger.kernel.org List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/29/2014 8:51 PM, Roger Heflin wrote:
> All of this is probably the reason that this command exists:
> 
> hdparm --write-sector <sectornum>
> 
> I believe it directly sends the scsi/ata layer commands.

You end up with the same results as using dd (with oflag=direct); it is
just a matter of the path it takes to get there.

With dd, it calls write() to pass the data to the block layer, which
hands it to the scsi layer, which translates it into a scsi
WRITE_10/16 command, which hands it to libata which translates it into
an ata taskfile to be handed to the drive.

With hdparm --write-sector, it builds the ATA taskfile itself and uses
the SG_IO ioctl to hand it to the block layer, which hands it down
through the SCSI and libata layers; they see that it needs no
translation, and it goes to the drive unmodified.

The resulting taskfile the drive actually sees should be the same.
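As a sketch of what that unmodified taskfile looks like on the wire, here is the 16-byte ATA PASS-THROUGH CDB for a one-sector 48-bit write of the LBA from this thread, built with shell arithmetic. The field layout follows the SCSI/ATA Translation (SAT) ATA_16 format; that hdparm would pick WRITE SECTORS EXT (opcode 0x34) with PIO data-out here is my assumption, not something verified against its source:

```shell
lba=430197584; count=1

# ATA_16: opcode 0x85; byte 1 = protocol 5 (PIO data-out) shifted left,
# with the extend bit set for 48-bit; byte 2 = t_dir=0/byt_blok=1/t_length=2;
# bytes 5-6 = sector count; bytes 7-12 = the 48-bit LBA in SAT's split order;
# byte 13 = device (LBA mode); byte 14 = the ATA command itself (0x34).
cdb=$(printf '85 %02x 06 00 00 %02x %02x %02x %02x %02x %02x %02x %02x 40 34 00' \
    $(( (5 << 1) | 1 )) \
    $(( (count >>  8) & 0xff )) $((  count        & 0xff )) \
    $(( (lba   >> 24) & 0xff )) $((  lba          & 0xff )) \
    $(( (lba   >> 32) & 0xff )) $(( (lba   >>  8) & 0xff )) \
    $(( (lba   >> 40) & 0xff )) $(( (lba   >> 16) & 0xff )))
echo "$cdb"
# -> 85 0b 06 00 00 00 01 19 50 00 4b 00 a4 40 34 00
```

hdparm hands a CDB like this to the kernel via the SG_IO ioctl, and libata forwards the embedded taskfile to the drive untranslated.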


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUNXmlAAoJEI5FoCIzSKrwMYwIAJj1c0pBxdOcCQAk4i26802S
/lbPHhY5Xu7wR5KbZSXeEazE/vgTT7mDgjWHoe6Vl9e+Ci90KJxSFgQXNNwcYtuK
V+UFrTyqiKAzfk8VbRj0kwxk1JuXRQesDlwCGUsBkjSO26pdhUVfxwP8I3JcOBQW
uKRmh8PE48iq7kDWQdtxve6IPnAj/VY8AubwRAaVAvZ3xsEUBlf7UAkvA4n3WvWN
mfO1VVWwv4zyZ6bEBoWfjj6//5C0R+q2TrnBDFD9pN/wY4TdAx0gtufiUWx0v5WG
NNzJ9tm5z2rNo/HNi4w1gHm0JLDhSky21sNX7KyY8/1tFjqa3KQT7iQ6vxk4UJM=
=Xv01
-----END PGP SIGNATURE-----


* Re: writing zeros to bad sector results in persistent read error
  2014-06-10  0:09       ` Chris Murphy
  2014-06-10  6:52         ` Wilson Jonathan
@ 2014-10-08 17:56         ` Phillip Susi
  1 sibling, 0 replies; 20+ messages in thread
From: Phillip Susi @ 2014-10-08 17:56 UTC (permalink / raw)
  To: Chris Murphy, Wilson Jonathan; +Cc: linux-raid@vger.kernel.org List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/9/2014 8:09 PM, Chris Murphy wrote:
> But what I'm reporting is an instance where an ATA Secure Erase
> definitely did not fix up a single one of the bad sectors. Maybe
> that's consistent with the spec, I don't know, but it's not what
> I'd expect, seeing as every sector, those with and without LBAs
> assigned, is overwritten. Yet pending sectors were not remapped.
> Further, overwriting all sectors with software (not merely the
> ATA Secure Erase command) yields no errors, yet SMART reports
> there are still pending sectors, while its own extended test says
> there are none. I think that's bad behavior. But perhaps I don't
> understand the design and it's actually working as designed.

It sounds like what happened is the secure erase successfully rewrote
the sectors that were already flagged as pending, but did not
decrement the pending count.

FYI, rather than continuing to run a SMART self-test to find one
sector, then using dd to fix it, and repeating, it would be much faster
to use the badblocks utility to read and rewrite the whole drive.  You
will want to make sure to use the correct sector size, and a
sufficiently large batch size for good performance.
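A minimal sketch of that whole-drive pass, run here against a scratch image file so nothing real is written. badblocks-style batching (-b 4096 -c 1024), error handling, and direct I/O are all simplified away; the rewrite-on-failure branch is what gives drive firmware the chance to remap a pending sector:

```shell
scan_and_rewrite() {    # usage: scan_and_rewrite <image-or-device>
    local dev=$1 bs=4096 size blocks i fixed=0
    size=$(stat -c %s "$dev")   # GNU stat; for a block device use blockdev --getsize64
    blocks=$(( size / bs ))
    for (( i = 0; i < blocks; i++ )); do
        if ! dd if="$dev" of=/dev/null bs=$bs skip=$i count=1 2>/dev/null; then
            # Unreadable block: overwrite it whole, letting the firmware
            # rewrite in place or remap the physical sector.
            dd if=/dev/zero of="$dev" bs=$bs seek=$i count=1 conv=notrunc 2>/dev/null
            fixed=$(( fixed + 1 ))
        fi
    done
    echo "$fixed blocks rewritten"
}

img=$(mktemp)
dd if=/dev/urandom of="$img" bs=4096 count=8 2>/dev/null
scan_and_rewrite "$img"    # a healthy image: prints "0 blocks rewritten"
```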

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUNXrjAAoJEI5FoCIzSKrw//cH/jgbli22/MmRpRXLOc0YJg8O
npSEI3fusBspMWhWS+a5SGRQQQjrfiK8mK8NkAC1VrX80zI8UcLkrBNVX1NQQ7eP
tgjJJJLN0BeQIk7RtAhO0rxajnZp19bBv7r8oRgWg9PRXrrxZHrXJNxHqUlANNsq
70blruORy3MbTqUk8QU4qXw/y5XduhRyJEX0SDogrQwI0xJqaUWPn5CQPQnKWydr
0q6evfdRVfLC2rg0AbQ1ksj+nRhTRkrUctXuNc/8GL4S6wR77bQwTXlyBn8E8Uec
T6lsCs5J43e2yyRtj3c0ZWcmyuZuwKbO4LHPAA4kYf9faHV/OEWPwlAHHVC1Ggo=
=2OuK
-----END PGP SIGNATURE-----


end of thread, other threads:[~2014-10-08 17:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-07  0:11 writing zeros to bad sector results in persistent read error Chris Murphy
2014-06-07  1:26 ` Roger Heflin
2014-06-07  1:51 ` Roman Mamedov
2014-06-07 16:42   ` Chris Murphy
2014-06-07 18:26   ` Chris Murphy
2014-06-08  0:52   ` Chris Murphy
2014-06-08  1:50     ` Roger Heflin
2014-06-08 21:50       ` Chris Murphy
2014-06-08  8:10     ` Wilson Jonathan
2014-06-10  0:09       ` Chris Murphy
2014-06-10  6:52         ` Wilson Jonathan
2014-10-08 17:56         ` Phillip Susi
2014-06-09 19:37     ` Wolfgang Denk
2014-06-10  2:48       ` Chris Murphy
2014-06-10 13:40         ` Phil Turmel
2014-06-29  0:05           ` Chris Murphy
2014-06-29 23:50             ` Martin K. Petersen
2014-06-30  0:51               ` Roger Heflin
2014-10-08 17:51                 ` Phillip Susi
2014-06-10 22:18 ` Eyal Lebedinsky
