All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linda Knippers <linda.knippers@hpe.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax
Date: Fri, 5 May 2017 16:23:23 -0400	[thread overview]
Message-ID: <590CDF3B.8020303@hpe.com> (raw)
In-Reply-To: <CAPcyv4h9NfpF_n376u69FK5ddfJELBNhSaGFDrD5ku0LcMaxvw@mail.gmail.com>

On 05/05/2017 04:01 PM, Dan Williams wrote:
> On Fri, May 5, 2017 at 12:41 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
>> On 05/05/2017 03:14 PM, Dan Williams wrote:
>>> On Thu, May 4, 2017 at 2:49 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>>>> Adding ndctl support that will allow clearing of bad blocks for a device.
>>>> Initial implementation will only support device dax devices. The ndctl
>>>> takes a device path and parameters of the starting bad block, and the number
>>>> of bad blocks to clear.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>>  Documentation/ndctl-clear-error.txt |   38 ++++++
>>>>  builtin.h                           |    1
>>>>  ndctl/Makefile.am                   |    1
>>>>  ndctl/clear-error.c                 |  239 +++++++++++++++++++++++++++++++++++
>>>>  ndctl/lib/libndctl.c                |   72 +++++++++++
>>>>  ndctl/lib/libndctl.sym              |    2
>>>>  ndctl/libndctl.h.in                 |   10 +
>>>>  ndctl/ndctl.c                       |    1
>>>>  8 files changed, 364 insertions(+)
>>>>  create mode 100644 Documentation/ndctl-clear-error.txt
>>>>  create mode 100644 ndctl/clear-error.c
>>>>
>>>> diff --git a/Documentation/ndctl-clear-error.txt b/Documentation/ndctl-clear-error.txt
>>>> new file mode 100644
>>>> index 0000000..b14521a
>>>> --- /dev/null
>>>> +++ b/Documentation/ndctl-clear-error.txt
>>>> @@ -0,0 +1,38 @@
>>>> +ndctl-clear-error(1)
>>>> +====================
>>>> +
>>>> +NAME
>>>> +----
>>>> +ndctl-clear-error - clear badblocks for a device
>>>
>>> I think "clear-error" is too ambiguous of a name, lets call the
>>> commands repair-media-errors and list-media-errors.
>>>
>>> I'd also like to see the list-media-errors patch first in the series
>>> since repairing is just an incremental step on top of listing. I don't
>>> think the word "badblocks" should appear anywhere in this document.
>>> That's a kernel implementation detail.
>>
>> Is this just for device dax?  If so, that should be more clear in the text,
>> rather than just used in the example.  If it's for other types of pmem devices,
>> then badblocks would make sense to keep, assuming it relates to the
>> badblocks information exposed in /sys.
> 
> It's not necessarily just for device-dax, that's just a starting
> point. The tool could gather errors some other way, so I don't think
> we want to leak the "badblocks" name to this interface.

If the current implementation is for device-dax, then the doc should say that.
I can change later (assuming the comment option survives).

> 
>>> The other benefit of "repair" vs "clear" is communicating that the
>>> data may be indeterminate after repair.
>>
>> For me, repair means you fixed it. If you want to communicate that the
>> data is indeterminate, I think you should say that in the doc rather
>> than assuming people interpret the word the same way you do.
> 
> True, and I think this aligns with another discomfort that we're
> trying to define an explicit error clearing interface when it really
> should an interface that writes data with error clearing as a side
> effect. Just like the block device error clearing case. I think we
> should abandon this command and jsut go create a command that just
> knows how to clear errors while writing, then a lot of these
> interpretation problems go away.

I'd be happy if the documentation just said what it actually does
or doesn't do.

> 
> 
>>> Hopefully in the future we'll
>>> get a standard mechanism to determine if the platform supports atomic
>>> error clearing with a determinate value.
>>>
>>>> +
>>>> +SYNOPSIS
>>>> +--------
>>>> +[verse]
>>>> +'ndctl clear-error' [<options>]
>>>> +
>>>> +EXAMPLES
>>>> +--------
>>>> +
>>>> +Clear poison (bad blocks) for the provided device
>>>> +[verse]
>>>> +ndctl clear-error -f /dev/dax0.0 -s 0 -l 8
>>>> +
>>>> +Clear poison (bad blocks) at block offset 0 for 8 blocks on device /dev/dax0.0
>>>
>>> The "poison" term is x86 specific. Lets just use "media error" everywhere.
>>
>> If you really mean uncorrectable error reported by an Address Range Scrub, you
>> could say that too.  Or are there other types of media errors that this could
>> work with?
> 
> I'm thinking about other architectures outside of x86 + ACPI that may
> have different terminology. So, the intent was to have something
> generic, but this concern also goes away if we just move to a model
> where the tool writes new data to clear errors.

I can definitely see going outside of the x86 part but can you really
get away without ACPI for this?

> 
>>>> +
>>>> +-l::
>>>> +--len::
>>>> +       The number of badblocks to clear in size of 512 bytes increments. The
>>>> +       length must fit within the badblocks range. If the length exceeds the
>>>> +       badblock range or is 0, the command will fail. Not providing a len
>>>> +       will result in a single block being cleared.
>>>
>>> s/badblocks/media error/
>>
>> If it's really not a badblock, does the 512 byte increment still make sense?
>> There's a DSM to get the unit size for what can be cleared which may not
>> be 512.  If the clearable unit is smaller, we're clearing too much.  If the unit
>> is larger, then we can't clear less than that.
> 
> Hopefully the units never get larger than 512. Since we can't address
> smaller than 512 bytes for block devices, I'd rather not expose the
> smaller than 512-byte error clearing support with this interface.

In any case, some argument checking would be helpful for better error messages
in whatever tool ends up doing this.

-- ljk
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

  reply	other threads:[~2017-05-05 20:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-04 21:48 [PATCH v6 0/3] ndctl: add error clearing support for dev dax Dave Jiang
2017-05-04 21:49 ` [PATCH v6 1/3] ndctl: add clear-error to ndctl for device dax Dave Jiang
2017-05-05 19:14   ` Dan Williams
2017-05-05 19:33     ` Dan Williams
2017-05-05 19:41     ` Linda Knippers
2017-05-05 20:01       ` Dan Williams
2017-05-05 20:23         ` Linda Knippers [this message]
2017-05-04 21:49 ` [PATCH v6 2/3] ndctl: add list-errors support Dave Jiang
2017-05-05 17:07   ` Kani, Toshimitsu
2017-05-05 17:14     ` Dave Jiang
2017-05-05 17:21       ` Kani, Toshimitsu
2017-05-05 17:27         ` Dave Jiang
2017-05-05 17:35           ` Kani, Toshimitsu
2017-05-05 17:49             ` Kani, Toshimitsu
2017-05-04 21:49 ` [PATCH v6 3/3] ndctl: add test for clear-error Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=590CDF3B.8020303@hpe.com \
    --to=linda.knippers@hpe.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-nvdimm@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.