All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	alistair@popple.id.au
Subject: Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
Date: Thu, 21 May 2020 00:13:42 +0530	[thread overview]
Message-ID: <87v9kqqwf5.fsf@linux.ibm.com> (raw)
In-Reply-To: <CAPcyv4jZhYXEmYGzqGPjPtq9ZWJNtQyszN0V0Xcv0qtByK_KCw@mail.gmail.com>

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
>> > <aneesh.kumar@linux.ibm.com> wrote:
>>
>> ...
>>
>> >> Applications using new instructions will behave as expected when running
>> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
>> >> 'dcbfps'
>> >
>> > Right, this is the problem. Applications using new instructions behave
>> > as expected, the kernel has been shipping of_pmem and papr_scm for
>> > several cycles now, you're saying that the DAX applications written
>> > against those platforms are going to be broken on P8 and P9?
>>
>> The expecation is that both kernel and userspace would get upgraded to
>> use the new instruction before actual persistent memory devices are
>> made available.
>>
>> >
>> >> > I'm thinking the kernel
>> >> > should go as far as to disable DAX operation by default on new
>> >> > hardware until userspace asserts that it is prepared to switch to the
>> >> > new implementation. Is there any other way to ensure the forward
>> >> > compatibility of deployed ppc64 DAX applications?
>> >>
>> >> AFAIU there is no released persistent memory hardware on ppc64 platform
>> >> and we need to make sure before applications get enabled to use these
>> >> persistent memory devices, they should switch to use the new
>> >> instruction?
>> >
>> > Right, I want the kernel to offer some level of safety here because
>> > everything you are describing sounds like a flag day conversion. Am I
>> > misreading? Is there some other gate that prevents existing users of
>> > of_pmem and papr_scm from having their expectations violated when
>> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control
>> > that I'm just not familiar with, I'm only going off the fact that the
>> > kernel has shipped a non-zero number of NVDIMM drivers that build with
>> > ARCH=ppc64 for several cycles.
>>
>> If we are looking at adding changes to kernel that will prevent a kernel
>> from running on newer hardware in a specific case, we could as well take
>> the changes to get the kernel use the newer instructions right?
>
> Oh, no, I'm not talking about stopping the kernel from running. I'm
> simply recommending that support for MAP_SYNC mappings (userspace
> managed flushing) be disabled by default on PPC with either a
> compile-time or run-time default to assert that userspace has been
> audited for legacy applications or that the platform owner is
> otherwise willing to take the risk.
>
>> But I agree with your concern that if we have older kernel/applications
>> that continue to use `dcbf` on future hardware we will end up
>> having issues w.r.t powerfail consistency. The plan is what you outlined
>> above as tighter ecosystem control. Considering we don't have a pmem
>> device generally available, we get both kernel and userspace upgraded
>> to use these new instructions before such a device is made available.
>
> Ok, I think a compile time kernel option with a runtime override
> satisfies my concern. Does that work for you?

something like below? But this still won't handle devdax mmap right?

diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/arm64/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..da479200bfb8
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+extern bool arch_disable_sync_nvdimm(void);
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..3ce4fb4f167b 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,6 +9,8 @@
 
 #include <asm/cacheflush.h>
 
+
+static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT);
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
@@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
 	memcpy_flushcache(to, page_to_virt(page) + offset, len);
 }
 EXPORT_SYMBOL(memcpy_page_flushcache);
+
+bool arch_disable_sync_nvdimm(void)
+{
+	return !sync_fault;
+}
+
+static int __init parse_sync_fault(char *p)
+{
+	sync_fault = true;
+	return 0;
+}
+early_param("enable_sync_fault", parse_sync_fault);
+
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..dde11d75a746 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
 	  If you're unsure, say Y.
 
+config PPC_NVDIMM_SYNC_FAULT
+	bool "Synchronous fault support (MAP_SYNC)"
+	default n
+	help
+	  Enable support for synchronous fault with nvdimm namespaces.
+
+	  If you're unsure, say N.
+
+
 config PPC_HAVE_KUAP
 	bool
 
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..74a0809491af 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1278,6 +1278,13 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
 	if (is_nd_volatile(&nd_region->dev))
 		return true;
 
+	/*
+	 * If arch is forcing a synchronous fault
+	 * disable.
+	 */
+	if (arch_disable_sync_nvdimm())
+		return false;
+
 	return is_nd_pmem(&nd_region->dev) &&
 		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
 }
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..891449aebe91 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -13,6 +13,8 @@
 #include <linux/spinlock.h>
 #include <linux/bio.h>
 
+#include <asm/libnvdimm.h>
+
 struct badrange_entry {
 	u64 start;
 	u64 length;
@@ -286,4 +288,12 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_disable_sync_nvdimm
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+static inline bool arch_disable_sync_nvdimm()
+{
+	return false;
+}
+#endif
+
 #endif /* __LIBNVDIMM_H__ */
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

WARNING: multiple messages have this Message-ID (diff)
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: alistair@popple.id.au,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
Date: Thu, 21 May 2020 00:13:42 +0530	[thread overview]
Message-ID: <87v9kqqwf5.fsf@linux.ibm.com> (raw)
In-Reply-To: <CAPcyv4jZhYXEmYGzqGPjPtq9ZWJNtQyszN0V0Xcv0qtByK_KCw@mail.gmail.com>

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V
>> > <aneesh.kumar@linux.ibm.com> wrote:
>>
>> ...
>>
>> >> Applications using new instructions will behave as expected when running
>> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and
>> >> 'dcbfps'
>> >
>> > Right, this is the problem. Applications using new instructions behave
>> > as expected, the kernel has been shipping of_pmem and papr_scm for
>> > several cycles now, you're saying that the DAX applications written
>> > against those platforms are going to be broken on P8 and P9?
>>
>> The expecation is that both kernel and userspace would get upgraded to
>> use the new instruction before actual persistent memory devices are
>> made available.
>>
>> >
>> >> > I'm thinking the kernel
>> >> > should go as far as to disable DAX operation by default on new
>> >> > hardware until userspace asserts that it is prepared to switch to the
>> >> > new implementation. Is there any other way to ensure the forward
>> >> > compatibility of deployed ppc64 DAX applications?
>> >>
>> >> AFAIU there is no released persistent memory hardware on ppc64 platform
>> >> and we need to make sure before applications get enabled to use these
>> >> persistent memory devices, they should switch to use the new
>> >> instruction?
>> >
>> > Right, I want the kernel to offer some level of safety here because
>> > everything you are describing sounds like a flag day conversion. Am I
>> > misreading? Is there some other gate that prevents existing users of
>> > of_pmem and papr_scm from having their expectations violated when
>> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control
>> > that I'm just not familiar with, I'm only going off the fact that the
>> > kernel has shipped a non-zero number of NVDIMM drivers that build with
>> > ARCH=ppc64 for several cycles.
>>
>> If we are looking at adding changes to kernel that will prevent a kernel
>> from running on newer hardware in a specific case, we could as well take
>> the changes to get the kernel use the newer instructions right?
>
> Oh, no, I'm not talking about stopping the kernel from running. I'm
> simply recommending that support for MAP_SYNC mappings (userspace
> managed flushing) be disabled by default on PPC with either a
> compile-time or run-time default to assert that userspace has been
> audited for legacy applications or that the platform owner is
> otherwise willing to take the risk.
>
>> But I agree with your concern that if we have older kernel/applications
>> that continue to use `dcbf` on future hardware we will end up
>> having issues w.r.t powerfail consistency. The plan is what you outlined
>> above as tighter ecosystem control. Considering we don't have a pmem
>> device generally available, we get both kernel and userspace upgraded
>> to use these new instructions before such a device is made available.
>
> Ok, I think a compile time kernel option with a runtime override
> satisfies my concern. Does that work for you?

something like below? But this still won't handle devdax mmap right?

diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/arm64/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..da479200bfb8
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+extern bool arch_disable_sync_nvdimm(void);
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 0666a8d29596..3ce4fb4f167b 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,6 +9,8 @@
 
 #include <asm/cacheflush.h>
 
+
+static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT);
 /*
  * CONFIG_ARCH_HAS_PMEM_API symbols
  */
@@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset,
 	memcpy_flushcache(to, page_to_virt(page) + offset, len);
 }
 EXPORT_SYMBOL(memcpy_page_flushcache);
+
+bool arch_disable_sync_nvdimm(void)
+{
+	return !sync_fault;
+}
+
+static int __init parse_sync_fault(char *p)
+{
+	sync_fault = true;
+	return 0;
+}
+early_param("enable_sync_fault", parse_sync_fault);
+
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..dde11d75a746 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
 	  If you're unsure, say Y.
 
+config PPC_NVDIMM_SYNC_FAULT
+	bool "Synchronous fault support (MAP_SYNC)"
+	default n
+	help
+	  Enable support for synchronous fault with nvdimm namespaces.
+
+	  If you're unsure, say N.
+
+
 config PPC_HAVE_KUAP
 	bool
 
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index 000000000000..aee697a72537
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARCH_LIBNVDIMM_H__
+#define __ARCH_LIBNVDIMM_H__
+
+#endif /* __ARCH_LIBNVDIMM_H__  */
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..74a0809491af 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1278,6 +1278,13 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
 	if (is_nd_volatile(&nd_region->dev))
 		return true;
 
+	/*
+	 * If arch is forcing a synchronous fault
+	 * disable.
+	 */
+	if (arch_disable_sync_nvdimm())
+		return false;
+
 	return is_nd_pmem(&nd_region->dev) &&
 		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
 }
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 18da4059be09..891449aebe91 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -13,6 +13,8 @@
 #include <linux/spinlock.h>
 #include <linux/bio.h>
 
+#include <asm/libnvdimm.h>
+
 struct badrange_entry {
 	u64 start;
 	u64 length;
@@ -286,4 +288,12 @@ static inline void arch_invalidate_pmem(void *addr, size_t size)
 }
 #endif
 
+#ifndef arch_disable_sync_nvdimm
+#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm
+static inline bool arch_disable_sync_nvdimm()
+{
+	return false;
+}
+#endif
+
 #endif /* __LIBNVDIMM_H__ */

  reply	other threads:[~2020-05-20 18:43 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13  3:47 [PATCH v2 1/5] powerpc/pmem: Add new instructions for persistent storage and sync Aneesh Kumar K.V
2020-05-13  3:47 ` Aneesh Kumar K.V
2020-05-13  3:47 ` [PATCH v2 2/5] powerpc/pmem: Add flush routines using new pmem store and sync instruction Aneesh Kumar K.V
2020-05-13  3:47   ` Aneesh Kumar K.V
2020-05-13  3:47 ` [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier Aneesh Kumar K.V
2020-05-13  3:47   ` Aneesh Kumar K.V
2020-05-13 16:14   ` Dan Williams
2020-05-13 16:14     ` Dan Williams
2020-05-19  5:30     ` Aneesh Kumar K.V
2020-05-19  5:30       ` Aneesh Kumar K.V
2020-05-19  7:09       ` Dan Williams
2020-05-19  7:09         ` Dan Williams
2020-05-19 13:52         ` Aneesh Kumar K.V
2020-05-19 13:52           ` Aneesh Kumar K.V
2020-05-19 18:59           ` Dan Williams
2020-05-19 18:59             ` Dan Williams
2020-05-20 18:43             ` Aneesh Kumar K.V [this message]
2020-05-20 18:43               ` Aneesh Kumar K.V
2020-05-21 14:38             ` Jeff Moyer
2020-05-21 14:38               ` Jeff Moyer
2020-05-21 17:02               ` Aneesh Kumar K.V
2020-05-21 17:02                 ` Aneesh Kumar K.V
2020-05-21 18:25                 ` Dan Williams
2020-05-21 18:25                   ` Dan Williams
2020-05-21 18:52                   ` Mikulas Patocka
2020-05-21 18:52                     ` Mikulas Patocka
2020-05-22  9:31                     ` Michal Suchánek
2020-05-22  9:31                       ` Michal Suchánek
2020-05-22 10:08                       ` Aneesh Kumar K.V
2020-05-22 10:08                         ` Aneesh Kumar K.V
2020-05-22 13:01                         ` Mikulas Patocka
2020-05-22 13:01                           ` Mikulas Patocka
2020-06-26 10:20                           ` Michal Suchánek
2020-06-26 10:20                             ` Michal Suchánek
2020-05-21 18:34               ` Dan Williams
2020-05-21 18:34                 ` Dan Williams
2020-05-13  3:47 ` [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction Aneesh Kumar K.V
2020-05-13  3:47   ` Aneesh Kumar K.V
2020-05-13  6:44   ` kbuild test robot
2020-05-13  6:44     ` kbuild test robot
2020-05-13  6:44     ` kbuild test robot
2020-05-13  3:47 ` [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines Aneesh Kumar K.V
2020-05-13  3:47   ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9kqqwf5.fsf@linux.ibm.com \
    --to=aneesh.kumar@linux.ibm.com \
    --cc=alistair@popple.id.au \
    --cc=dan.j.williams@intel.com \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.