linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
@ 2018-05-11 17:10 Dan Williams
  2018-05-14  7:26 ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2018-05-11 17:10 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Luck, Tony, Andrew Morton, Mike Snitzer, Peter Zijlstra, X86 ML,
	Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar,
	Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro

Ingo, Thomas, Al, any concerns with this series?

On Thu, May 3, 2018 at 5:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Changes since v2 [1]:
>
> * Fix source address increment in mcsafe_handle_tail() (Mika)
>
> * Extend the unit test to inject simulated write faults and validate
>   that data is properly transferred.
>
> * Rename MCSAFE_DEBUG to MCSAFE_TEST
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015583.html
>
> ---
>
> Currently memcpy_mcsafe() is only deployed in the pmem driver when
> reading through a /dev/pmemX block device. However, a filesystem in dax
> mode mounted on a /dev/pmemX block device will bypass the block layer
> and the driver for reads. The filesystem-dax (fsdax) read case uses
> dax_direct_access() and copy_to_iter() to bypass the block layer.
>
> The result of the bypass is that the kernel treats machine checks during
> read as system fatal (reboot) when they could simply be flagged as an
> I/O error, similar to performing reads through the pmem driver. Prevent
> this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> path.
>
> The main differences between this copy_to_user_mcsafe() and
> copy_user_generic_unrolled() are:
>
> * Typical tail/residue handling after a fault retries the copy
>   byte-by-byte until the fault happens again. Re-triggering machine
>   checks is potentially fatal so the implementation uses source alignment
>   and poison alignment assumptions to avoid re-triggering machine
>   checks.
>
> * SMAP coordination is handled external to the assembly with
>   __uaccess_begin() and __uaccess_end().
>
> * ITER_KVEC and ITER_BVEC can now end prematurely with an error.
>
> The new MCSAFE_TEST facility is proposed as a way to unit test the
> exception handling without requiring an ACPI EINJ capable platform.
>
> ---
>
> Dan Williams (9):
>       x86, memcpy_mcsafe: remove loop unrolling
>       x86, memcpy_mcsafe: add labels for write fault handling
>       x86, memcpy_mcsafe: return bytes remaining
>       x86, memcpy_mcsafe: add write-protection-fault handling
>       x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
>       dax: introduce a ->copy_to_iter dax operation
>       dax: report bytes remaining in dax_iomap_actor()
>       pmem: switch to copy_to_iter_mcsafe()
>       x86, nfit_test: unit test for memcpy_mcsafe()
>
>
>  arch/x86/Kconfig                   |    1
>  arch/x86/Kconfig.debug             |    3 +
>  arch/x86/include/asm/mcsafe_test.h |   75 ++++++++++++++++++++++++
>  arch/x86/include/asm/string_64.h   |   10 ++-
>  arch/x86/include/asm/uaccess_64.h  |   14 +++++
>  arch/x86/lib/memcpy_64.S           |  112 +++++++++++++++++-------------------
>  arch/x86/lib/usercopy_64.c         |   21 +++++++
>  drivers/dax/super.c                |   10 +++
>  drivers/md/dm-linear.c             |   16 +++++
>  drivers/md/dm-log-writes.c         |   15 +++++
>  drivers/md/dm-stripe.c             |   21 +++++++
>  drivers/md/dm.c                    |   25 ++++++++
>  drivers/nvdimm/claim.c             |    3 +
>  drivers/nvdimm/pmem.c              |   13 +++-
>  drivers/s390/block/dcssblk.c       |    7 ++
>  fs/dax.c                           |   21 ++++---
>  include/linux/dax.h                |    5 ++
>  include/linux/device-mapper.h      |    5 +-
>  include/linux/string.h             |    4 +
>  include/linux/uio.h                |   15 +++++
>  lib/iov_iter.c                     |   61 ++++++++++++++++++++
>  tools/testing/nvdimm/test/nfit.c   |  104 +++++++++++++++++++++++++++++++++
>  22 files changed, 482 insertions(+), 79 deletions(-)
>  create mode 100644 arch/x86/include/asm/mcsafe_test.h
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-11 17:10 use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description) Dan Williams
@ 2018-05-14  7:26 ` Ingo Molnar
  2018-05-14 15:52   ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2018-05-14  7:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Luck, Tony, Andrew Morton, Mike Snitzer,
	Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro, Peter Zijlstra


* Dan Williams <dan.j.williams@intel.com> wrote:

> Ingo, Thomas, Al, any concerns with this series?

Yeah, so:

   "[PATCH v3 0/9] Series short description"

... isn't the catchiest of titles to capture my [all too easily distracted] 
attention! ;-)

I have marked it now for -tip processing. Linus was happy with this and acked the 
approach, right?

Thanks,

	Ingo

> 
> On Thu, May 3, 2018 at 5:06 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > Changes since v2 [1]:
> >
> > * Fix source address increment in mcsafe_handle_tail() (Mika)
> >
> > * Extend the unit test to inject simulated write faults and validate
> >   that data is properly transferred.
> >
> > * Rename MCSAFE_DEBUG to MCSAFE_TEST
> >
> > [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015583.html
> >
> > ---
> >
> > Currently memcpy_mcsafe() is only deployed in the pmem driver when
> > reading through a /dev/pmemX block device. However, a filesystem in dax
> > mode mounted on a /dev/pmemX block device will bypass the block layer
> > and the driver for reads. The filesystem-dax (fsdax) read case uses
> > dax_direct_access() and copy_to_iter() to bypass the block layer.
> >
> > The result of the bypass is that the kernel treats machine checks during
> > read as system fatal (reboot) when they could simply be flagged as an
> > I/O error, similar to performing reads through the pmem driver. Prevent
> > this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> > path.
> >
> > The main differences between this copy_to_user_mcsafe() and
> > copy_user_generic_unrolled() are:
> >
> > * Typical tail/residue handling after a fault retries the copy
> >   byte-by-byte until the fault happens again. Re-triggering machine
> >   checks is potentially fatal so the implementation uses source alignment
> >   and poison alignment assumptions to avoid re-triggering machine
> >   checks.
> >
> > * SMAP coordination is handled external to the assembly with
> >   __uaccess_begin() and __uaccess_end().
> >
> > * ITER_KVEC and ITER_BVEC can now end prematurely with an error.
> >
> > The new MCSAFE_TEST facility is proposed as a way to unit test the
> > exception handling without requiring an ACPI EINJ capable platform.
> >
> > ---
> >
> > Dan Williams (9):
> >       x86, memcpy_mcsafe: remove loop unrolling
> >       x86, memcpy_mcsafe: add labels for write fault handling
> >       x86, memcpy_mcsafe: return bytes remaining
> >       x86, memcpy_mcsafe: add write-protection-fault handling
> >       x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
> >       dax: introduce a ->copy_to_iter dax operation
> >       dax: report bytes remaining in dax_iomap_actor()
> >       pmem: switch to copy_to_iter_mcsafe()
> >       x86, nfit_test: unit test for memcpy_mcsafe()
> >
> >
> >  arch/x86/Kconfig                   |    1
> >  arch/x86/Kconfig.debug             |    3 +
> >  arch/x86/include/asm/mcsafe_test.h |   75 ++++++++++++++++++++++++
> >  arch/x86/include/asm/string_64.h   |   10 ++-
> >  arch/x86/include/asm/uaccess_64.h  |   14 +++++
> >  arch/x86/lib/memcpy_64.S           |  112 +++++++++++++++++-------------------
> >  arch/x86/lib/usercopy_64.c         |   21 +++++++
> >  drivers/dax/super.c                |   10 +++
> >  drivers/md/dm-linear.c             |   16 +++++
> >  drivers/md/dm-log-writes.c         |   15 +++++
> >  drivers/md/dm-stripe.c             |   21 +++++++
> >  drivers/md/dm.c                    |   25 ++++++++
> >  drivers/nvdimm/claim.c             |    3 +
> >  drivers/nvdimm/pmem.c              |   13 +++-
> >  drivers/s390/block/dcssblk.c       |    7 ++
> >  fs/dax.c                           |   21 ++++---
> >  include/linux/dax.h                |    5 ++
> >  include/linux/device-mapper.h      |    5 +-
> >  include/linux/string.h             |    4 +
> >  include/linux/uio.h                |   15 +++++
> >  lib/iov_iter.c                     |   61 ++++++++++++++++++++
> >  tools/testing/nvdimm/test/nfit.c   |  104 +++++++++++++++++++++++++++++++++
> >  22 files changed, 482 insertions(+), 79 deletions(-)
> >  create mode 100644 arch/x86/include/asm/mcsafe_test.h
> > _______________________________________________
> > Linux-nvdimm mailing list
> > Linux-nvdimm@lists.01.org
> > https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-14  7:26 ` Ingo Molnar
@ 2018-05-14 15:52   ` Dan Williams
  2018-05-14 17:53     ` Andy Lutomirski
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dan Williams @ 2018-05-14 15:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-nvdimm, Luck, Tony, Andrew Morton, Mike Snitzer,
	Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro, Peter Zijlstra

On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Ingo, Thomas, Al, any concerns with this series?
>
> Yeah, so:
>
>    "[PATCH v3 0/9] Series short description"
>
> ... isn't the catchiest of titles to capture my [all too easily distracted]
> attention! ;-)

My bad! After that mistake it became a toss-up between more spam and
hoping the distraction would not throw you off.

> I have marked it now for -tip processing. Linus was happy with this and acked the
> approach, right?

I think "happy" is a strong word when it comes to x86 machine check
handling. My interpretation is that he and Andy acquiesced that this
is about the best we can do with dax+mce as things stand today.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-14 15:52   ` Dan Williams
@ 2018-05-14 17:53     ` Andy Lutomirski
  2018-05-14 19:20     ` Linus Torvalds
  2018-05-15  6:49     ` Ingo Molnar
  2 siblings, 0 replies; 7+ messages in thread
From: Andy Lutomirski @ 2018-05-14 17:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, linux-nvdimm, Luck, Tony, Andrew Morton,
	Mike Snitzer, Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
	Ingo Molnar, Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro, Peter Zijlstra



> On May 14, 2018, at 8:52 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> 

> 
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.


Agreed. I think it’s plausible that we could do slightly better for MCEs related to bad memory accessed through the direct map, but the code would be complicated and miserable to test, and it’s not even clear that it would materially decrease the chance that we survive as compared to these patches.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-14 15:52   ` Dan Williams
  2018-05-14 17:53     ` Andy Lutomirski
@ 2018-05-14 19:20     ` Linus Torvalds
  2018-05-15  6:49     ` Ingo Molnar
  2 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2018-05-14 19:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, linux-nvdimm, Tony Luck, Andrew Morton,
	Mike Snitzer, Peter Zijlstra, the arch/x86 maintainers,
	Linux Kernel Mailing List, Andy Lutomirski, Ingo Molnar,
	Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Christoph Hellwig, Al Viro,
	Peter Zijlstra

On Mon, May 14, 2018 at 8:52 AM Dan Williams <dan.j.williams@intel.com>
wrote:

> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.

Yeah. I definitely wouldn't say "happy", but perhaps "resigned". There's
nothing horrible going on, except for just plain nasty interfaces and
apparently disgusting hw problems wrt "rep movs/stos".

                  Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-14 15:52   ` Dan Williams
  2018-05-14 17:53     ` Andy Lutomirski
  2018-05-14 19:20     ` Linus Torvalds
@ 2018-05-15  6:49     ` Ingo Molnar
  2018-05-15 17:20       ` Dan Williams
  2 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2018-05-15  6:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Luck, Tony, Andrew Morton, Mike Snitzer,
	Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro, Peter Zijlstra, Andy Lutomirski


* Dan Williams <dan.j.williams@intel.com> wrote:

> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Dan Williams <dan.j.williams@intel.com> wrote:
> >
> >> Ingo, Thomas, Al, any concerns with this series?
> >
> > Yeah, so:
> >
> >    "[PATCH v3 0/9] Series short description"
> >
> > ... isn't the catchiest of titles to capture my [all too easily distracted]
> > attention! ;-)
> 
> My bad! After that mistake it became a toss-up between more spam and
> hoping the distraction would not throw you off.
> 
> > I have marked it now for -tip processing. Linus was happy with this and acked the
> > approach, right?
> 
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.

So, how would you like to go about this series?

To help move it forward I applied the first 5 commits to tip:x86/dax, on a
vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
superficially (I don't have DAX so this essentially means build tests) and
pushed out the result.

Barring some later generic-x86 regression (unlikely) this looks good to me - feel 
free to cross-pull that branch into your DAX/nvdimm tree.

Or we could apply the remaining changes to -tip too - your call.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)
  2018-05-15  6:49     ` Ingo Molnar
@ 2018-05-15 17:20       ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2018-05-15 17:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-nvdimm, Luck, Tony, Andrew Morton, Mike Snitzer,
	Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Mika Penttilä,
	linux-fsdevel, Thomas Gleixner, Linus Torvalds,
	Christoph Hellwig, Al Viro, Peter Zijlstra, Andy Lutomirski

On Mon, May 14, 2018 at 11:49 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dan Williams <dan.j.williams@intel.com> wrote:
>
>> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > * Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> >> Ingo, Thomas, Al, any concerns with this series?
>> >
>> > Yeah, so:
>> >
>> >    "[PATCH v3 0/9] Series short description"
>> >
>> > ... isn't the catchiest of titles to capture my [all too easily distracted]
>> > attention! ;-)
>>
>> My bad! After that mistake it became a toss-up between more spam and
>> hoping the distraction would not throw you off.
>>
>> > I have marked it now for -tip processing. Linus was happy with this and acked the
>> > approach, right?
>>
>> I think "happy" is a strong word when it comes to x86 machine check
>> handling. My interpretation is that he and Andy acquiesced that this
>> is about the best we can do with dax+mce as things stand today.
>
> So, how would you like to go about this series?
>
> To help move it forward I applied the first 5 commits to tip:x86/dax, on a
> vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
> superficially (I don't have DAX so this essentially means build tests) and
> pushed out the result.

Thanks for that. Technically speaking, you do have dax, but setting up
our unit tests is currently not friction free, so I would not expect
you to go through that effort. Hopefully we can revive 0day running
our unit tests one of these days.

> Barring some later generic-x86 regression (unlikely) this looks good to me - feel
> free to cross-pull that branch into your DAX/nvdimm tree.
>
> Or we could apply the remaining changes to -tip too - your call.

The remainder patches have developed a conflict with another topic
branch in the nvdimm tree, in particular "dax: introduce a
->copy_to_iter dax operation". I think the best course is for me to
rebase the remaining 4 on top of tip/x86/dax and carry the merge
conflict through the nvdimm tree.

> Thanks,
>
>         Ingo

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-15 17:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-11 17:10 use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description) Dan Williams
2018-05-14  7:26 ` Ingo Molnar
2018-05-14 15:52   ` Dan Williams
2018-05-14 17:53     ` Andy Lutomirski
2018-05-14 19:20     ` Linus Torvalds
2018-05-15  6:49     ` Ingo Molnar
2018-05-15 17:20       ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).