From: Andy Lutomirski <luto@amacapital.net>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Juergen Gross <jgross@suse.com>, X86 ML <x86@kernel.org>,
	Toshi Kani <toshi.kani@hp.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Luis Rodriguez <mcgrof@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Stefan Bader <stefan.bader@canonical.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Ralf Baechle <ralf@linux-mips.org>,
	Henrique de Moraes Holschuh <hmh@hmh.eng.br>,
	Michael Ellerman <mpe@ellerman.id.au>, Tejun Heo <tj@kernel.org>,
	Paul Mackerras <paulus@samba.org>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v4 6/6] arch, x86: pmem api for ensuring durability of persistent memory updates
Date: Wed, 17 Jun 2015 08:07:38 -0700
Message-ID: <CALCETrXXYyjKHi1ajR6aescmjSo5eds=5g_byWpzBRbBNdsgRQ@mail.gmail.com>
In-Reply-To: <20150611211947.10271.80768.stgit@dwillia2-desk3.amr.corp.intel.com>

On Thu, Jun 11, 2015 at 2:19 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> From: Ross Zwisler <ross.zwisler@linux.intel.com>
>
> Based on an original patch by Ross Zwisler [1].
>
> Writes to persistent memory have the potential to be posted to cpu
> cache, cpu write buffers, and platform write buffers (memory controller)
> before being committed to persistent media.  Provide apis,
> memcpy_to_pmem(), sync_pmem(), and memremap_pmem(), to write data to
> pmem and assert that it is durable in PMEM (a persistent linear address
> range).  A '__pmem' attribute is added so sparse can track proper usage
> of pointers to pmem.
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> [djbw: various reworks]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/Kconfig                  |    1
>  arch/x86/include/asm/cacheflush.h |   36 +++++++++++++
>  arch/x86/include/asm/io.h         |    6 ++
>  drivers/block/pmem.c              |   75 +++++++++++++++++++++++++--
>  include/linux/compiler.h          |    2 +
>  include/linux/pmem.h              |  102 +++++++++++++++++++++++++++++++++++++
>  lib/Kconfig                       |    3 +
>  7 files changed, 218 insertions(+), 7 deletions(-)
>  create mode 100644 include/linux/pmem.h
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index f16caf7eac27..5dfb8f31ac48 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -28,6 +28,7 @@ config X86
>         select ARCH_HAS_FAST_MULTIPLIER
>         select ARCH_HAS_GCOV_PROFILE_ALL
>         select ARCH_HAS_MEMREMAP
> +       select ARCH_HAS_PMEM_API
>         select ARCH_HAS_SG_CHAIN
>         select ARCH_HAVE_NMI_SAFE_CMPXCHG
>         select ARCH_MIGHT_HAVE_ACPI_PDC         if ACPI
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index b6f7457d12e4..4d896487382c 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -4,6 +4,7 @@
>  /* Caches aren't brain-dead on the intel. */
>  #include <asm-generic/cacheflush.h>
>  #include <asm/special_insns.h>
> +#include <asm/uaccess.h>
>
>  /*
>   * The set_memory_* API can be used to change various attributes of a virtual
> @@ -108,4 +109,39 @@ static inline int rodata_test(void)
>  }
>  #endif
>
> +#ifdef ARCH_HAS_NOCACHE_UACCESS
> +static inline void arch_memcpy_to_pmem(void __pmem *dst, const void *src, size_t n)
> +{
> +       /*
> +        * We are copying between two kernel buffers, if
> +        * __copy_from_user_inatomic_nocache() returns an error (page
> +        * fault) we would have already taken an unhandled fault before
> +        * the BUG_ON.  The BUG_ON is simply here to satisfy
> +        * __must_check and allow reuse of the common non-temporal store
> +        * implementation for memcpy_to_pmem().
> +        */
> +       BUG_ON(__copy_from_user_inatomic_nocache((void __force *) dst,
> +                               (void __user *) src, n));

Ick.  If we take a fault, we will lose the debugging information we
would otherwise have gotten unless we get lucky and get a usable CR2
value in the oops.

> +}
> +
> +static inline void arch_sync_pmem(void)
> +{
> +       wmb();
> +       pcommit_sfence();
> +}

This function is non-intuitive to me.  It's really "arch-specific sync
pmem after one or more copies using arch_memcpy_to_pmem".  If normal
stores or memcpy to non-WC memory are used instead, then it's
insufficient if the memory is WB (nothing writes the dirty cache lines
back) and it's unnecessarily slow if the memory is WT or UC (the first
sfence isn't needed).

I would change the name and add documentation.  I'd also add a comment
about the wmb() being an SFENCE to flush pending non-temporal writes.
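
Something along these lines, as a rough userspace sketch rather than a
proposed patch.  The name arch_wmb_pmem_after_memcpy() and the helper
pmem_demo() are made up for illustration, and wmb(), pcommit_sfence(),
and arch_memcpy_to_pmem() are replaced by stand-ins that only count
calls (I'm assuming pcommit_sfence() is PCOMMIT followed by SFENCE, as
the name suggests):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-ins (assumptions) so the sketch builds outside the
 * kernel.  They only count calls; the real primitives emit instructions.
 */
static int nr_sfence;
static int nr_pcommit;

/* The kernel's wmb() is an SFENCE on x86. */
static void wmb(void)
{
	nr_sfence++;
}

/* Assumed to be PCOMMIT followed by SFENCE. */
static void pcommit_sfence(void)
{
	nr_pcommit++;
	nr_sfence++;
}

/* Stand-in for the non-temporal copy; a plain memcpy() here. */
static void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/*
 * arch_wmb_pmem_after_memcpy - hypothetical rename of arch_sync_pmem()
 *
 * Make the results of one or more preceding arch_memcpy_to_pmem() calls
 * durable.  This is not a general-purpose pmem sync: after ordinary
 * stores to WB memory it is insufficient (dirty cache lines are not
 * written back), and for WT or UC memory the first fence is unneeded.
 */
static void arch_wmb_pmem_after_memcpy(void)
{
	/* SFENCE: flush pending non-temporal writes. */
	wmb();
	/* Commit the writes to persistent media, then fence again. */
	pcommit_sfence();
}

/* Two copies, one sync covering both; returns 0 if the data landed. */
int pmem_demo(void)
{
	char pmem[32] = { 0 };	/* stand-in for a memremap_pmem() mapping */

	arch_memcpy_to_pmem(pmem, "hello ", 6);
	arch_memcpy_to_pmem(pmem + 6, "pmem", 5);
	arch_wmb_pmem_after_memcpy();

	/* The sync must have issued at least one commit. */
	assert(nr_pcommit >= 1);
	return strcmp(pmem, "hello pmem");
}
```

The point of the sketch is only the naming and the comments: the
function's contract is tied to the non-temporal copy that precedes it,
and the wmb() is explained where it is issued.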

--Andy
