linux-nvdimm.lists.01.org archive mirror
From: Jeff Moyer <jmoyer@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	hch@infradead.org, dm-devel@redhat.com
Subject: Re: [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len
Date: Fri, 21 Feb 2020 13:32:48 -0500	[thread overview]
Message-ID: <x498skv3i5r.fsf@segfault.boston.devel.redhat.com> (raw)
In-Reply-To: 20200220215707.GC10816@redhat.com

Vivek Goyal <vgoyal@redhat.com> writes:

> On Thu, Feb 20, 2020 at 04:35:17PM -0500, Jeff Moyer wrote:
>> Vivek Goyal <vgoyal@redhat.com> writes:
>> 
>> > Currently pmem_clear_poison() expects offset and len to be sector aligned.
>> > At least, that seems to be the assumption the code was written with.
>> > It is called only from pmem_do_bvec(), which is called only from pmem_rw_page()
>> > and pmem_make_request(), both of which pass only sector-aligned offset and len.
>> >
>> > Soon we want to use this function from the dax_zero_page_range() code path,
>> > which can try to zero an arbitrary range of memory within a page. So update
>> > this function to assume that offset and length can be arbitrary and do the
>> > necessary alignment as needed.
>> 
>> What caller will try to zero a range that is smaller than a sector?
>
> Hi Jeff,
>
> The new dax zeroing interface (dax_zero_page_range()) can technically pass
> a range which is less than a sector, or which is bigger than a sector
> but whose start and end are not aligned on sector boundaries.

Sure, but who will call it with misaligned ranges?

> At this point in time, all I care about is that the case of an arbitrary
> range is handled well. So if a caller passes a range in, we figure
> out the subrange which is sector aligned in terms of start and end,
> clear poison on those sectors, and ignore the rest of the range. This
> by itself is an improvement over the current behavior, where
> nothing is cleared if the I/O is not sector aligned.

I don't think this makes sense.  The caller needs to know about the
blast radius of errors.  This is why I asked for a concrete example.
It might make more sense, for example, to return an error if not all of
the errors could be cleared.

>> > nvdimm_clear_poison() seems to assume offset and len to be aligned to
>> > clear_err_unit boundary. But this is currently internal detail and is
>> > not exported for others to use. So for now, continue to align offset and
>> > length to SECTOR_SIZE boundary. Improving it further and to align it
>> > to clear_err_unit boundary is a TODO item for future.
>> 
>> When there is a poisoned range of persistent memory, it is recorded by
>> the badblocks infrastructure, which currently operates on sectors.  So,
>> no matter what the error unit is for the hardware, we currently can't
>> record/report to userspace anything smaller than a sector, and so that
>> is what we expect when clearing errors.
>> 
>> Continuing on for completeness, we will currently not map a page with
>> badblocks into a process' address space.  So, let's say you have 256
>> bytes of bad pmem, we will tell you we've lost 512 bytes, and even if
>> you access a valid mmap()d address in the same page as the poisoned
>> memory, you will get a segfault.
>> 
>> Userspace can fix up the error by calling write(2) and friends to
>> provide new data, or by punching a hole and writing new data to the hole
>> (which may result in getting a new block, or reallocating the old block
>> and zeroing it, which will clear the error).
>
> Fair enough. I do not need poison clearing at finer granularity. It might
> be needed once dev_dax path wants to clear poison. Not sure how exactly
> that works.

It doesn't.  :)

>> > +	/*
>> > +	 * Callers can pass arbitrary offset and len. But nvdimm_clear_poison()
>> > +	 * expects memory offset and length to meet certain alignment
>> > +	 * restriction (clear_err_unit). Currently nvdimm does not export
>>                                                    ^^^^^^^^^^^^^^^^^^^^^^
>> > +	 * required alignment. So align offset and length to sector boundary
>> 
>> What is "nvdimm" in that sentence?  Because the nvdimm most certainly
>> does export the required alignment.  Perhaps you meant libnvdimm?
>
> I meant the nvdimm_clear_poison() function in drivers/nvdimm/bus.c, whatever
> it is called. It first queries the alignment required (clear_err_unit) and
> then makes sure the range passed in meets that alignment requirement.

My point was your comment is misleading.

>> We could potentially support clearing less than a sector, but I'd have
>> to understand the use cases better before offering implementation
>> suggestions.
>
> I don't need clearing less than a sector. Once somebody needs it, they
> can implement it. All I am doing is making sure the current logic is not
> broken when dax_zero_page_range() starts using this logic and passes
> an arbitrary range. We need to make sure we internally align the I/O

An arbitrary range is the same thing as less than a sector.  :)  Do you
know of an instance where the range will not be sector-aligned and sized?

> and carve out an aligned sub-range and pass that subrange to
> nvdimm_clear_poison().

And what happens to the rest?  The caller is left to trip over the
errors?  That sounds pretty terrible.  I really think there needs to be
an explicit contract here.

> So if you can make sure I am not breaking things and that the new interface
> will continue to clear poison on sector boundaries, that will be great.

I think allowing arbitrary ranges /could/ break things.  How it breaks
things depends on what the caller is doing.

If there are no callers using the interface in this way, then I see no
need to relax the restriction.  I do think we could document it better.

-Jeff
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

Thread overview: 52+ messages
2020-02-18 21:48 [PATCH v5 0/8] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 1/8] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len Vivek Goyal
2020-02-20 16:17   ` Christoph Hellwig
2020-02-20 21:35   ` Jeff Moyer
2020-02-20 21:57     ` Vivek Goyal
2020-02-21 18:32       ` Jeff Moyer [this message]
2020-02-21 20:17         ` Vivek Goyal
2020-02-21 21:00           ` Dan Williams
2020-02-21 21:24             ` Vivek Goyal
2020-02-21 21:30               ` Dan Williams
2020-02-21 21:33                 ` Jeff Moyer
2020-02-23 23:03           ` Dave Chinner
2020-02-24  0:40             ` Dan Williams
2020-02-24 13:50               ` Jeff Moyer
2020-02-24 20:48                 ` Dan Williams
2020-02-24 21:53                   ` Jeff Moyer
2020-02-25  0:26                     ` Dan Williams
2020-02-25 20:32                       ` Jeff Moyer
2020-02-25 21:52                         ` Dan Williams
2020-02-25 23:26                       ` Jane Chu
2020-02-24 15:38             ` Vivek Goyal
2020-02-27  3:02               ` Dave Chinner
2020-02-27  4:19                 ` Dan Williams
2020-02-28  1:30                   ` Dave Chinner
2020-02-28  3:28                     ` Dan Williams
2020-02-28 14:05                       ` Christoph Hellwig
2020-02-28 16:26                         ` Dan Williams
2020-02-24 20:13             ` Vivek Goyal
2020-02-24 20:52               ` Dan Williams
2020-02-24 21:15                 ` Vivek Goyal
2020-02-24 21:32                   ` Dan Williams
2020-02-25 13:36                     ` Vivek Goyal
2020-02-25 16:25                       ` Dan Williams
2020-02-25 20:08                         ` Vivek Goyal
2020-02-25 22:49                           ` Dan Williams
2020-02-26 13:51                             ` Vivek Goyal
2020-02-26 16:57                             ` Vivek Goyal
2020-02-27  3:11                               ` Dave Chinner
2020-02-27 15:25                                 ` Vivek Goyal
2020-02-28  1:50                                   ` Dave Chinner
2020-02-18 21:48 ` [PATCH v5 3/8] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
2020-02-20 16:17   ` Christoph Hellwig
2020-02-18 21:48 ` [PATCH v5 4/8] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
2020-03-31 19:38   ` Dan Williams
2020-04-01 13:15     ` Vivek Goyal
2020-04-01 16:14     ` Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 5/8] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 6/8] dm,dax: Add dax zero_page_range operation Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 7/8] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
2020-02-18 21:48 ` [PATCH v5 8/8] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
2020-04-25 11:31   ` [PATCH v5 8/8] dax, iomap: " neolift9
