nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
@ 2022-06-29  8:31 Dennis.Wu
  2022-06-29 15:27 ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Dennis.Wu @ 2022-06-29  8:31 UTC (permalink / raw)
  To: nvdimm; +Cc: dan.j.williams, vishal.l.verma, dave.jiang, ira.weiny, Dennis.Wu

reason: in the current BTT implimentation deepflush is always
used and deepflush is very expensive. Since customer already
know the ADR can protect the WPQ data in memory controller and
no need to call deepflush to get better performance. BTT w/o
deepflush, performance can improve 300%~600% with diff FIO jobs.

How: Add one param "no_deepflush" in the nfit module parameter.
if "modprob nfit no_deepflush=1", customer can get the higher
performance but not strict data security. Before modprob nfit,
you may need to "ndctl disable-region".

Next: In the BTT, use flag to control the data w/o deepflush
in the case "no_deepflush=0".

Signed-off-by: Dennis.Wu <dennis.wu@intel.com>
---
 drivers/acpi/nfit/core.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index e5d7f2bda13f..ec0ad48b0283 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -52,6 +52,10 @@ static bool force_labels;
 module_param(force_labels, bool, 0444);
 MODULE_PARM_DESC(force_labels, "Opt-in to labels despite missing methods");
 
+static bool no_deepflush;
+module_param(no_deepflush, bool, 0644);
+MODULE_PARM_DESC(no_deepflush, "skip deep flush if ADR or no strict security");
+
 LIST_HEAD(acpi_descs);
 DEFINE_MUTEX(acpi_desc_lock);
 
@@ -981,8 +985,10 @@ static void *add_table(struct acpi_nfit_desc *acpi_desc,
 			return err;
 		break;
 	case ACPI_NFIT_TYPE_FLUSH_ADDRESS:
-		if (!add_flush(acpi_desc, prev, table))
-			return err;
+		if (!no_deepflush) {
+			if (!add_flush(acpi_desc, prev, table))
+				return err;
+		}
 		break;
 	case ACPI_NFIT_TYPE_SMBIOS:
 		dev_dbg(dev, "smbios\n");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-06-29  8:31 [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation Dennis.Wu
@ 2022-06-29 15:27 ` Christoph Hellwig
  2022-07-13  1:25   ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-06-29 15:27 UTC (permalink / raw)
  To: Dennis.Wu; +Cc: nvdimm, dan.j.williams, vishal.l.verma, dave.jiang, ira.weiny

On Wed, Jun 29, 2022 at 04:31:18PM +0800, Dennis.Wu wrote:
> reason: in the current BTT implimentation deepflush is always
> used and deepflush is very expensive. Since customer already
> know the ADR can protect the WPQ data in memory controller and
> no need to call deepflush to get better performance. BTT w/o
> deepflush, performance can improve 300%~600% with diff FIO jobs.
> 
> How: Add one param "no_deepflush" in the nfit module parameter.
> if "modprob nfit no_deepflush=1", customer can get the higher
> performance but not strict data security. Before modprob nfit,
> you may need to "ndctl disable-region".

This goes back to my question from years ago:  why do we ever
do this deep flush in the Linux nvdimm stack to start with?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-06-29 15:27 ` Christoph Hellwig
@ 2022-07-13  1:25   ` Dan Williams
  2022-07-20  6:24     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2022-07-13  1:25 UTC (permalink / raw)
  To: Christoph Hellwig, Dennis.Wu
  Cc: nvdimm, dan.j.williams, vishal.l.verma, dave.jiang, ira.weiny

Christoph Hellwig wrote:
> On Wed, Jun 29, 2022 at 04:31:18PM +0800, Dennis.Wu wrote:
> > reason: in the current BTT implimentation deepflush is always
> > used and deepflush is very expensive. Since customer already
> > know the ADR can protect the WPQ data in memory controller and
> > no need to call deepflush to get better performance. BTT w/o
> > deepflush, performance can improve 300%~600% with diff FIO jobs.
> > 
> > How: Add one param "no_deepflush" in the nfit module parameter.
> > if "modprob nfit no_deepflush=1", customer can get the higher
> > performance but not strict data security. Before modprob nfit,
> > you may need to "ndctl disable-region".
> 
> This goes back to my question from years ago:  why do we ever
> do this deep flush in the Linux nvdimm stack to start with?

The rationale is to push the data to smaller failure domain. Similar to
flushing disk write-caches. Otherwise, if you trust your memory power
supplies like you trust your disks then just rely on them to take care
of the data.

Otherwise, by default the kernel should default to taking as much care
as possible to push writes to the smallest failure domain possible.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-07-13  1:25   ` Dan Williams
@ 2022-07-20  6:24     ` Christoph Hellwig
  2022-09-20  3:08       ` Wu, Dennis
  2022-09-20 19:30       ` Dan Williams
  0 siblings, 2 replies; 8+ messages in thread
From: Christoph Hellwig @ 2022-07-20  6:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Dennis.Wu, nvdimm, vishal.l.verma, dave.jiang,
	ira.weiny

On Tue, Jul 12, 2022 at 06:25:30PM -0700, Dan Williams wrote:
> > This goes back to my question from years ago:  why do we ever
> > do this deep flush in the Linux nvdimm stack to start with?
> 
> The rationale is to push the data to smaller failure domain. Similar to
> flushing disk write-caches.

Flushing disk caches is not about a smaller failure domain.  Flushing
disk caches is about making data durable _at _all_.

> Otherwise, if you trust your memory power
> supplies like you trust your disks then just rely on them to take care
> of the data.

Well, it seems like all the benchmarketing schemes around pmem seem to
trust it.  Why would kernel block I/O be different from device dax,
MAP_SYNC?

> Otherwise, by default the kernel should default to taking as much care
> as possible to push writes to the smallest failure domain possible.

In which case we need remve the device dax direct map and MAP_SYNC.
Reducing the failure domain is not what fsync or REQ_OP_FLUSH are
about, they are about making changes durable.  How durable is up to
your device implementation.  But if you trust it only a little you
should not offer that half way option to start with.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-07-20  6:24     ` Christoph Hellwig
@ 2022-09-20  3:08       ` Wu, Dennis
  2022-09-20 11:46         ` Christoph Hellwig
  2022-09-20 19:30       ` Dan Williams
  1 sibling, 1 reply; 8+ messages in thread
From: Wu, Dennis @ 2022-09-20  3:08 UTC (permalink / raw)
  To: Christoph Hellwig, Williams, Dan J
  Cc: nvdimm, Verma, Vishal L, Jiang, Dave, Weiny, Ira

Hi, Dan,
Will we add this patch to some new kernel version?

Thank you a lot!
Dennis Wu

-----Original Message-----
From: Christoph Hellwig <hch@infradead.org> 
Sent: Wednesday, July 20, 2022 2:25 PM
To: Williams, Dan J <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>; Wu, Dennis <dennis.wu@intel.com>; nvdimm@lists.linux.dev; Verma, Vishal L <vishal.l.verma@intel.com>; Jiang, Dave <dave.jiang@intel.com>; Weiny, Ira <ira.weiny@intel.com>
Subject: Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation

On Tue, Jul 12, 2022 at 06:25:30PM -0700, Dan Williams wrote:
> > This goes back to my question from years ago:  why do we ever do 
> > this deep flush in the Linux nvdimm stack to start with?
> 
> The rationale is to push the data to smaller failure domain. Similar 
> to flushing disk write-caches.

Flushing disk caches is not about a smaller failure domain.  Flushing disk caches is about making data durable _at _all_.

> Otherwise, if you trust your memory power supplies like you trust your 
> disks then just rely on them to take care of the data.

Well, it seems like all the benchmarketing schemes around pmem seem to trust it.  Why would kernel block I/O be different from device dax, MAP_SYNC?

> Otherwise, by default the kernel should default to taking as much care 
> as possible to push writes to the smallest failure domain possible.

In which case we need remve the device dax direct map and MAP_SYNC.
Reducing the failure domain is not what fsync or REQ_OP_FLUSH are about, they are about making changes durable.  How durable is up to your device implementation.  But if you trust it only a little you should not offer that half way option to start with.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-09-20  3:08       ` Wu, Dennis
@ 2022-09-20 11:46         ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2022-09-20 11:46 UTC (permalink / raw)
  To: Wu, Dennis
  Cc: Christoph Hellwig, Williams, Dan J, nvdimm, Verma, Vishal L,
	Jiang, Dave, Weiny, Ira

On Tue, Sep 20, 2022 at 03:08:03AM +0000, Wu, Dennis wrote:
> Hi, Dan,
> Will we add this patch to some new kernel version?

How about addressing my comments and explaining what the exact
data integrity model is here first?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-07-20  6:24     ` Christoph Hellwig
  2022-09-20  3:08       ` Wu, Dennis
@ 2022-09-20 19:30       ` Dan Williams
  2022-10-20  6:23         ` Wu, Dennis
  1 sibling, 1 reply; 8+ messages in thread
From: Dan Williams @ 2022-09-20 19:30 UTC (permalink / raw)
  To: Christoph Hellwig, Dan Williams
  Cc: Christoph Hellwig, Dennis.Wu, nvdimm, vishal.l.verma, dave.jiang,
	ira.weiny

Christoph Hellwig wrote:
[..]
> > Otherwise, by default the kernel should default to taking as much care
> > as possible to push writes to the smallest failure domain possible.
> 
> In which case we need remve the device dax direct map and MAP_SYNC.
> Reducing the failure domain is not what fsync or REQ_OP_FLUSH are
> about, they are about making changes durable.  How durable is up to
> your device implementation.  But if you trust it only a little you
> should not offer that half way option to start with.

That's a good point, but (with my Linux kernel developer hat on) I would
flip it around and make this the platform vendor's problem. If the
platform vendor has validated ADR* and that platform power supplies
maintain stable power in a powerloss scenario, then 'deepflush' is a
complete nop, why publish a flush mechanism?

In other words, unless the platform vendor has no confidence in the
standard durability model (persistence / durability at global visibility
outside the CPU cache) it should skip publishing these flush hints in
the ACPI NFIT table.

The recourse for an end user whose vendor has published this mechanism
in error is to talk to their BIOS vendor to turn off the flush
capability, or use the ACPI table override mechanism to edit out the
flush capability.

I will also note that CXL has done away with this software flush concept
and defines a standard Global Persistence Flush mechanism in the
protocol that fires at impending power-loss events.

* ADR: Asynchronous DRAM refresh, a platform signal to flush write
  buffers in the device upon detection of power-loss.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation
  2022-09-20 19:30       ` Dan Williams
@ 2022-10-20  6:23         ` Wu, Dennis
  0 siblings, 0 replies; 8+ messages in thread
From: Wu, Dennis @ 2022-10-20  6:23 UTC (permalink / raw)
  To: Williams, Dan J, Christoph Hellwig
  Cc: Christoph Hellwig, nvdimm, Verma, Vishal L, Jiang, Dave, Weiny, Ira

Hi Dan,

ADR is the platform capability which is default support in the Intel platform. (eADR is the extend capability to keep the cache as the persistent domain, not well support from the PRC customers)

So far, the customer use PMem would not consider the deep flush since 100% data security will not only leverage the single server, it will leverage the software (CRC, checksum) and distributed system. 
Deep flush is the smallest persistent domain and can assure that data persistent in any situation that might be useful for some Financial scenarios (didn't see in the PRC market so far).
That's why options can be used for user to pick the data security level. 

GPF, as my understanding, still leverage the ADR from the original design and still the capability of the platform and have the chance to be failure. 

Thanks a lot and best regards,
Dennis Wu 

-----Original Message-----
From: Williams, Dan J <dan.j.williams@intel.com> 
Sent: Wednesday, September 21, 2022 3:31 AM
To: Christoph Hellwig <hch@infradead.org>; Williams, Dan J <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>; Wu, Dennis <dennis.wu@intel.com>; nvdimm@lists.linux.dev; Verma, Vishal L <vishal.l.verma@intel.com>; Jiang, Dave <dave.jiang@intel.com>; Weiny, Ira <ira.weiny@intel.com>
Subject: Re: [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation

Christoph Hellwig wrote:
[..]
> > Otherwise, by default the kernel should default to taking as much 
> > care as possible to push writes to the smallest failure domain possible.
> 
> In which case we need remve the device dax direct map and MAP_SYNC.
> Reducing the failure domain is not what fsync or REQ_OP_FLUSH are 
> about, they are about making changes durable.  How durable is up to 
> your device implementation.  But if you trust it only a little you 
> should not offer that half way option to start with.

That's a good point, but (with my Linux kernel developer hat on) I would flip it around and make this the platform vendor's problem. If the platform vendor has validated ADR* and that platform power supplies maintain stable power in a powerloss scenario, then 'deepflush' is a complete nop, why publish a flush mechanism?

In other words, unless the platform vendor has no confidence in the standard durability model (persistence / durability at global visibility outside the CPU cache) it should skip publishing these flush hints in the ACPI NFIT table.

The recourse for an end user whose vendor has published this mechanism in error is to talk to their BIOS vendor to turn off the flush capability, or use the ACPI table override mechanism to edit out the flush capability.

I will also note that CXL has done away with this software flush concept and defines a standard Global Persistence Flush mechanism in the protocol that fires at impending power-loss events.

* ADR: Asynchronous DRAM refresh, a platform signal to flush write
  buffers in the device upon detection of power-loss.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-20  6:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29  8:31 [PATCH] ACPI/NFIT: Add no_deepflush param to dynamic control flush operation Dennis.Wu
2022-06-29 15:27 ` Christoph Hellwig
2022-07-13  1:25   ` Dan Williams
2022-07-20  6:24     ` Christoph Hellwig
2022-09-20  3:08       ` Wu, Dennis
2022-09-20 11:46         ` Christoph Hellwig
2022-09-20 19:30       ` Dan Williams
2022-10-20  6:23         ` Wu, Dennis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).