From: Jeff Moyer <jmoyer@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org,
	hch@infradead.org, dan.j.williams@intel.com, dm-devel@redhat.com
Subject: Re: [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len
Date: Fri, 21 Feb 2020 13:32:48 -0500
Message-ID: <x498skv3i5r.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: 20200220215707.GC10816@redhat.com

Vivek Goyal <vgoyal@redhat.com> writes:

> On Thu, Feb 20, 2020 at 04:35:17PM -0500, Jeff Moyer wrote:
>> Vivek Goyal <vgoyal@redhat.com> writes:
>> 
>> > Currently pmem_clear_poison() expects offset and len to be sector aligned.
>> > At least that seems to be the assumption with which code has been written.
>> > It is called only from pmem_do_bvec() which is called only from pmem_rw_page()
>> > and pmem_make_request() which will only pass sector aligned offset and len.
>> >
>> > Soon we want to use this function from dax_zero_page_range() code path which
>> > can try to zero arbitrary range of memory within a page. So update this
>> > function to assume that offset and length can be arbitrary and do the
>> > necessary alignments as needed.
>> 
>> What caller will try to zero a range that is smaller than a sector?
>
> Hi Jeff,
>
> New dax zeroing interface (dax_zero_page_range()) can technically pass
> a range which is less than a sector. Or which is bigger than a sector
> but start and end are not aligned on sector boundaries.

Sure, but who will call it with misaligned ranges?

> At this point of time, all I care about is that the case of an arbitrary
> range is handled well. So if a caller passes a range in, we figure
> out subrange which is sector aligned in terms of start and end, and
> clear poison on those sectors and ignore rest of the range. And
> this itself will be an improvement over current behavior where
> nothing is cleared if I/O is not sector aligned.

I don't think this makes sense.  The caller needs to know about the
blast radius of errors.  This is why I asked for a concrete example.
It might make more sense, for example, to return an error if not all of
the errors could be cleared.
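
To make the two behaviours concrete, here is a minimal userspace
sketch, not the kernel code. The names carve_aligned(),
clear_range_strict() and struct aligned_range are hypothetical and
exist only for this illustration. The first helper carves out the
sector-aligned subrange the way the patch description proposes; the
second shows one possible stricter contract that returns an error when
part of the request would be left uncleared.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical result type for illustration; not a kernel structure. */
struct aligned_range {
	uint64_t start;   /* first byte of the sector-aligned subrange */
	uint64_t len;     /* bytes, a multiple of SECTOR_SIZE (may be 0) */
	bool     partial; /* true if some requested bytes fall outside it */
};

/*
 * Carve the largest sector-aligned subrange out of [offset, offset + len).
 * The start is rounded up and the end is rounded down.
 */
static struct aligned_range carve_aligned(uint64_t offset, uint64_t len)
{
	struct aligned_range r = { 0, 0, false };
	uint64_t end = offset + len;
	uint64_t start_up = (offset + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
	uint64_t end_down = end / SECTOR_SIZE * SECTOR_SIZE;

	if (end_down > start_up) {
		r.start = start_up;
		r.len = end_down - start_up;
	}
	/* The subrange lies inside the request, so equal length means equal range. */
	r.partial = (r.len != len);
	return r;
}

/* One possible contract (an assumption): refuse ranges that cannot be cleared in full. */
static int clear_range_strict(uint64_t offset, uint64_t len)
{
	struct aligned_range r = carve_aligned(offset, len);

	if (r.partial)
		return -EINVAL;
	/* A real implementation would clear poison on [r.start, r.start + r.len) here. */
	return 0;
}
```

With this sketch, a request for offset 100 and len 1000 yields the
subrange starting at byte 512 with length 512 and partial set, so the
silent variant would leave bytes 100-511 and 1024-1099 untouched while
the strict variant reports -EINVAL to the caller.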

>> > nvdimm_clear_poison() seems to assume offset and len to be aligned to
>> > clear_err_unit boundary. But this is currently internal detail and is
>> > not exported for others to use. So for now, continue to align offset and
>> > length to SECTOR_SIZE boundary. Improving it further and to align it
>> > to clear_err_unit boundary is a TODO item for future.
>> 
>> When there is a poisoned range of persistent memory, it is recorded by
>> the badblocks infrastructure, which currently operates on sectors.  So,
>> no matter what the error unit is for the hardware, we currently can't
>> record/report to userspace anything smaller than a sector, and so that
>> is what we expect when clearing errors.
>> 
>> Continuing on for completeness, we will currently not map a page with
>> badblocks into a process' address space.  So, let's say you have 256
>> bytes of bad pmem, we will tell you we've lost 512 bytes, and even if
>> you access a valid mmap()d address in the same page as the poisoned
>> memory, you will get a segfault.
>> 
>> Userspace can fix up the error by calling write(2) and friends to
>> provide new data, or by punching a hole and writing new data to the hole
>> (which may result in getting a new block, or reallocating the old block
>> and zeroing it, which will clear the error).
>
> Fair enough. I do not need poison clearing at finer granularity. It might
> be needed once dev_dax path wants to clear poison. Not sure how exactly
> that works.

It doesn't.  :)
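
For the badblocks accounting quoted above, a small userspace sketch of
the outward rounding may help. It is an illustration under stated
assumptions, not the badblocks implementation: bytes_to_sectors() and
struct sector_span are hypothetical names, and the only fact taken
from the discussion is that errors are tracked in 512-byte sectors.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9                    /* 512-byte sectors */
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/* Hypothetical type for illustration: a run of whole sectors. */
struct sector_span {
	uint64_t first; /* index of the first affected sector */
	uint64_t count; /* number of affected sectors */
};

/* Round a poisoned byte range outward to the sectors that contain it. */
static struct sector_span bytes_to_sectors(uint64_t offset, uint64_t len)
{
	struct sector_span s = { offset >> SECTOR_SHIFT, 0 };
	uint64_t end = offset + len;

	if (len == 0)
		return s; /* nothing poisoned, nothing to report */

	/* Round the end up to the next sector boundary, then subtract the start. */
	s.count = ((end + SECTOR_SIZE - 1) >> SECTOR_SHIFT) - s.first;
	return s;
}
```

Here 256 bad bytes at offset 0 come out as one sector, i.e. 512 bytes
reported lost, and 256 bad bytes at offset 384 straddle a boundary and
come out as two sectors.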

>> > +	/*
>> > +	 * Callers can pass arbitrary offset and len. But nvdimm_clear_poison()
>> > +	 * expects memory offset and length to meet certain alignment
>> > +	 * restriction (clear_err_unit). Currently nvdimm does not export
>>                                                    ^^^^^^^^^^^^^^^^^^^^^^
>> > +	 * required alignment. So align offset and length to sector boundary
>> 
>> What is "nvdimm" in that sentence?  Because the nvdimm most certainly
>> does export the required alignment.  Perhaps you meant libnvdimm?
>
> I meant nvdimm_clear_poison() function in drivers/nvdimm/bus.c. Whatever
> it is called. It first queries alignment required (clear_err_unit) and
> then makes sure range passed in meets that alignment requirement.

My point was your comment is misleading.

>> We could potentially support clearing less than a sector, but I'd have
>> to understand the use cases better before offering implementation
>> suggestions.
>
> I don't need clearing less than a sector. Once somebody needs it they
> can implement it. All I am doing is making sure current logic is not
> broken when dax_zero_page_range() starts using this logic and passes
> an arbitrary range. We need to make sure we internally align I/O 

An arbitrary range is the same thing as less than a sector.  :)  Do you
know of an instance where the range will not be sector-aligned and sized?

> and carve out an aligned sub-range and pass that subrange to
> nvdimm_clear_poison().

And what happens to the rest?  The caller is left to trip over the
errors?  That sounds pretty terrible.  I really think there needs to be
an explicit contract here.

> So if you can make sure I am not breaking things and new interface
> will continue to clear poison on sector boundary, that will be great.

I think allowing arbitrary ranges /could/ break things.  How it breaks
things depends on what the caller is doing.

If there are no callers using the interface in this way, then I see no
need to relax the restriction.  I do think we could document it better.

-Jeff