* [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-21 23:48 ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2017-04-21 23:48 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-acpi, linux-kernel

The nvdimm_flush() mechanism helps to reduce the impact of an ADR
(asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
platform WPQ (write-pending-queue) buffers when power is removed. The
nvdimm_flush() mechanism performs that same function on-demand.

When a pmem namespace is associated with a block device, an
nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
request. However, when a namespace is in device-dax mode, or namespaces
are disabled, userspace needs another path.

The new 'flush' attribute is visible when it can be determined whether
the interleave-set has DIMMs that expose WPQ-flush addresses,
"flush-hints" in ACPI NFIT terminology. Reading it returns "1" and
flushes the DIMMs, or returns "0" when the flush operation is a
platform nop.
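
The read-side semantics above can be sketched as a small model (plain
Python purely for illustration, not kernel code; `has_flush` stands in
for the tri-state result of nvdimm_has_flush(): 1, 0, or a negative
errno):

```python
ENXIO = 6  # capability could not be determined

def flush_visible(has_flush):
    """Model of region_visible(): a failed probe hides the attribute."""
    return has_flush >= 0

def flush_show(has_flush):
    """Model of reading 'flush': trigger a flush and report "1", or report "0"."""
    if has_flush == 1:
        # nvdimm_flush(nd_region) would drain the WPQ buffers here
        return "1\n"
    return "0\n"
```

So a read is both a query and a trigger: "1" means a flush was just
performed, "0" means flushing is a platform nop on this system.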

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/region_devs.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 8de5a04644a1..3495b4c23941 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t flush_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct nd_region *nd_region = to_nd_region(dev);
+
+	if (nvdimm_has_flush(nd_region)) {
+		nvdimm_flush(nd_region);
+		return sprintf(buf, "1\n");
+	}
+	return sprintf(buf, "0\n");
+}
+static DEVICE_ATTR_RO(flush);
+
 static ssize_t mappings_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
 
 static struct attribute *nd_region_attributes[] = {
 	&dev_attr_size.attr,
+	&dev_attr_flush.attr,
 	&dev_attr_nstype.attr,
 	&dev_attr_mappings.attr,
 	&dev_attr_btt_seed.attr,
@@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
 	if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
 		return 0;
 
+	if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
+		return 0;
+
 	if (a != &dev_attr_set_cookie.attr
 			&& a != &dev_attr_available_size.attr)
 		return a->mode;

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
  2017-04-21 23:48 ` Dan Williams
@ 2017-04-24  5:31   ` Masayoshi Mizuma
  -1 siblings, 0 replies; 29+ messages in thread
From: Masayoshi Mizuma @ 2017-04-24  5:31 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linux-acpi, linux-kernel

On Fri, 21 Apr 2017 16:48:57 -0700 Dan Williams wrote:
> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
> platform WPQ (write-pending-queue) buffers when power is removed. The
> nvdimm_flush() mechanism performs that same function on-demand.
> 
> When a pmem namespace is associated with a block device, an
> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
> request. However, when a namespace is in device-dax mode, or namespaces
> are disabled, userspace needs another path.
> 
> The new 'flush' attribute is visible when it can be determined whether
> the interleave-set has DIMMs that expose WPQ-flush addresses,
> "flush-hints" in ACPI NFIT terminology. Reading it returns "1" and
> flushes the DIMMs, or returns "0" when the flush operation is a
> platform nop.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/nvdimm/region_devs.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 8de5a04644a1..3495b4c23941 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>  
> +static ssize_t flush_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	struct nd_region *nd_region = to_nd_region(dev);
> +
> +	if (nvdimm_has_flush(nd_region)) {

nvdimm_has_flush() can also return -ENXIO, so the check should be:

if (nvdimm_has_flush(nd_region) == 1)

> +		nvdimm_flush(nd_region);
> +		return sprintf(buf, "1\n");
> +	}
> +	return sprintf(buf, "0\n");
> +}
> +static DEVICE_ATTR_RO(flush);
> +

I think separating show and store is better because
users may only want to check whether the device has the flush capability.

The separated code would look like the following (not tested):

static ssize_t flush_show(struct device *dev,
	struct device_attribute *attr, char *buf)
{
	struct nd_region *nd_region = to_nd_region(dev);

	if (nvdimm_has_flush(nd_region) == 1)
		return sprintf(buf, "1\n");
	return sprintf(buf, "0\n");
}

static ssize_t flush_store(struct device *dev,
	struct device_attribute *attr, const char *buf, size_t len)
{
	struct nd_region *nd_region = to_nd_region(dev);
	bool flush;
	int rc = strtobool(buf, &flush);

	if (rc)
		return rc;

	if (flush && nvdimm_has_flush(nd_region) == 1) {
		nvdimm_flush(nd_region);
		/* a successful sysfs store reports the bytes consumed */
		return len;
	}
	return -EINVAL;
}
static DEVICE_ATTR_RW(flush);
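
A loose behavioral model of this split (Python purely for illustration;
`strtobool` below only approximates the kernel helper, and `has_flush`
stands in for nvdimm_has_flush()):

```python
EINVAL = 22

def strtobool(s):
    """Rough model of the kernel's strtobool(): map y/1 and n/0 to a bool."""
    first = s.strip()[:1].lower()
    if first in ("y", "1"):
        return True
    if first in ("n", "0"):
        return False
    raise ValueError(s)

def flush_store(buf, has_flush):
    """Model of the suggested write path: flush on '1', otherwise reject."""
    try:
        flush = strtobool(buf)
    except ValueError:
        return -EINVAL
    if flush and has_flush == 1:
        # nvdimm_flush(nd_region) would run here
        return len(buf)  # success: report the number of bytes consumed
    return -EINVAL
```

One detail the sketch makes explicit: a sysfs store must return the
number of bytes consumed on success, not 0, otherwise userspace write()
would loop retrying.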

Regards,
Masayoshi Mizuma

>  static ssize_t mappings_show(struct device *dev,
>  		struct device_attribute *attr, char *buf)
>  {
> @@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
>  
>  static struct attribute *nd_region_attributes[] = {
>  	&dev_attr_size.attr,
> +	&dev_attr_flush.attr,
>  	&dev_attr_nstype.attr,
>  	&dev_attr_mappings.attr,
>  	&dev_attr_btt_seed.attr,
> @@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
>  	if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
>  		return 0;
>  
> +	if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
> +		return 0;
> +
>  	if (a != &dev_attr_set_cookie.attr
>  			&& a != &dev_attr_available_size.attr)
>  		return a->mode;

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24  7:04     ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2017-04-24  7:04 UTC (permalink / raw)
  To: Masayoshi Mizuma; +Cc: Linux ACPI, Linux Kernel Mailing List, linux-nvdimm

On Sun, Apr 23, 2017 at 10:31 PM, Masayoshi Mizuma
<m.mizuma@jp.fujitsu.com> wrote:
> On Fri, 21 Apr 2017 16:48:57 -0700 Dan Williams wrote:
>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>> platform WPQ (write-pending-queue) buffers when power is removed. The
>> nvdimm_flush() mechanism performs that same function on-demand.
>>
>> When a pmem namespace is associated with a block device, an
>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>> request. However, when a namespace is in device-dax mode, or namespaces
>> are disabled, userspace needs another path.
>>
>> The new 'flush' attribute is visible when it can be determined whether
>> the interleave-set has DIMMs that expose WPQ-flush addresses,
>> "flush-hints" in ACPI NFIT terminology. Reading it returns "1" and
>> flushes the DIMMs, or returns "0" when the flush operation is a
>> platform nop.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/nvdimm/region_devs.c |   17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> index 8de5a04644a1..3495b4c23941 100644
>> --- a/drivers/nvdimm/region_devs.c
>> +++ b/drivers/nvdimm/region_devs.c
>> @@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
>>  }
>>  static DEVICE_ATTR_RO(size);
>>
>> +static ssize_t flush_show(struct device *dev,
>> +             struct device_attribute *attr, char *buf)
>> +{
>> +     struct nd_region *nd_region = to_nd_region(dev);
>> +
>> +     if (nvdimm_has_flush(nd_region)) {
>
> nvdimm_has_flush() can also return -ENXIO, so the check should be:
>
> if (nvdimm_has_flush(nd_region) == 1)

If it returns -ENXIO then region_visible() will hide the attribute.
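
That gating can be sketched as follows (Python purely for illustration;
the values mirror nvdimm_has_flush()'s tri-state of 1, 0, or a negative
errno):

```python
def region_visible(has_flush, mode=0o444):
    """Model of region_visible(): a negative probe result hides 'flush'."""
    return 0 if has_flush < 0 else mode

# Because visibility filters out the error case, the show path only
# ever observes a probe result of 0 or 1.
observable = [s for s in (1, 0, -6) if region_visible(s)]
```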

>
>> +             nvdimm_flush(nd_region);
>> +             return sprintf(buf, "1\n");
>> +     }
>> +     return sprintf(buf, "0\n");
>> +}
>> +static DEVICE_ATTR_RO(flush);
>> +
>
> I think separating show and store is better because
> users may only want to check whether the device has the flush capability.

Makes sense, I'll separate. Thanks for the review.

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 16:26   ` Jeff Moyer
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Moyer @ 2017-04-24 16:26 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-acpi, linux-kernel, linux-nvdimm

Dan Williams <dan.j.williams@intel.com> writes:

> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
> platform WPQ (write-pending-queue) buffers when power is removed. The
> nvdimm_flush() mechanism performs that same function on-demand.
>
> When a pmem namespace is associated with a block device, an
> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
> request. However, when a namespace is in device-dax mode, or namespaces
> are disabled, userspace needs another path.
>
> The new 'flush' attribute is visible when it can be determined whether
> the interleave-set has DIMMs that expose WPQ-flush addresses,
> "flush-hints" in ACPI NFIT terminology. Reading it returns "1" and
> flushes the DIMMs, or returns "0" when the flush operation is a
> platform nop.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

NACK.  This should function the same way it does for a pmem device.
Wire up sync.

-Jeff

> ---
>  drivers/nvdimm/region_devs.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 8de5a04644a1..3495b4c23941 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>  
> +static ssize_t flush_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	struct nd_region *nd_region = to_nd_region(dev);
> +
> +	if (nvdimm_has_flush(nd_region)) {
> +		nvdimm_flush(nd_region);
> +		return sprintf(buf, "1\n");
> +	}
> +	return sprintf(buf, "0\n");
> +}
> +static DEVICE_ATTR_RO(flush);
> +
>  static ssize_t mappings_show(struct device *dev,
>  		struct device_attribute *attr, char *buf)
>  {
> @@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
>  
>  static struct attribute *nd_region_attributes[] = {
>  	&dev_attr_size.attr,
> +	&dev_attr_flush.attr,
>  	&dev_attr_nstype.attr,
>  	&dev_attr_mappings.attr,
>  	&dev_attr_btt_seed.attr,
> @@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
>  	if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
>  		return 0;
>  
> +	if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
> +		return 0;
> +
>  	if (a != &dev_attr_set_cookie.attr
>  			&& a != &dev_attr_available_size.attr)
>  		return a->mode;

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 16:36     ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2017-04-24 16:36 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Linux ACPI, linux-kernel, linux-nvdimm

On Mon, Apr 24, 2017 at 9:26 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>> platform WPQ (write-pending-queue) buffers when power is removed. The
>> nvdimm_flush() mechanism performs that same function on-demand.
>>
>> When a pmem namespace is associated with a block device, an
>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>> request. However, when a namespace is in device-dax mode, or namespaces
>> are disabled, userspace needs another path.
>>
>> The new 'flush' attribute is visible when it can be determined that the
>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>> flushes DIMMs, or returns "0" if the flush operation is a platform nop.
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> NACK.  This should function the same way it does for a pmem device.
> Wire up sync.

We don't have dirty page tracking for device-dax; without that, I don't
think we should wire up the current sync calls. I do think we need a
more sophisticated sync syscall interface eventually that can select
which level of flushing is being performed (page cache vs cpu cache vs
platform-write-buffers). Until then I think this sideband interface
makes sense and sysfs is more usable than an ioctl.
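For illustration, a userspace probe of the proposed read-triggered attribute might look like the sketch below. The region name ("region0") and the /sys/bus/nd path are assumptions for this example; on platforms without flush-hint addresses the attribute would simply be absent.

```shell
#!/bin/sh
# Hedged sketch: probe the proposed per-region 'flush' attribute.
# 'region0' and the sysfs path are assumptions for illustration only.
deep_flush_probe() {
    attr="/sys/bus/nd/devices/$1/flush"
    if [ -r "$attr" ]; then
        # In the proposed patch, the read itself triggers nvdimm_flush()
        # and prints "1"; "0" means the flush is a platform nop.
        cat "$attr"
    else
        echo "no flush attribute"
    fi
}

deep_flush_probe region0
```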

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 16:43       ` Jeff Moyer
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Moyer @ 2017-04-24 16:43 UTC (permalink / raw)
  To: Dan Williams; +Cc: Linux ACPI, linux-kernel, linux-nvdimm

Dan Williams <dan.j.williams@intel.com> writes:

> On Mon, Apr 24, 2017 at 9:26 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>>> platform WPQ (write-pending-queue) buffers when power is removed. The
>>> nvdimm_flush() mechanism performs that same function on-demand.
>>>
>>> When a pmem namespace is associated with a block device, an
>>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>>> request. However, when a namespace is in device-dax mode, or namespaces
>>> are disabled, userspace needs another path.
>>>
>>> The new 'flush' attribute is visible when it can be determined that the
>>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>>> flushes DIMMs, or returns "0" if the flush operation is a platform nop.
>>>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>
>> NACK.  This should function the same way it does for a pmem device.
>> Wire up sync.
>
> We don't have dirty page tracking for device-dax; without that, I don't
> think we should wire up the current sync calls.

Why not?  Device dax is meant for the "flush from userspace" paradigm.
There's enough special casing around device dax that I think you can get
away with implementing *sync as a call to nvdimm_flush().

> I do think we need a more sophisticated sync syscall interface
> eventually that can select which level of flushing is being performed
> (page cache vs cpu cache vs platform-write-buffers).

I don't.  I think this whole notion of flush, and flush harder is
brain-dead.  How do you explain to applications when they should use
each one?

> Until then I think this sideband interface makes sense and sysfs is
> more usable than an ioctl.

Well, if you're totally against wiring up sync, then I say we forget
about the deep flush completely.  What's your use case?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 17:03   ` Linda Knippers
  0 siblings, 0 replies; 29+ messages in thread
From: Linda Knippers @ 2017-04-24 17:03 UTC (permalink / raw)
  To: Dan Williams, linux-nvdimm; +Cc: linux-acpi, linux-kernel

On 04/21/2017 07:48 PM, Dan Williams wrote:
> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
> platform WPQ (write-pending-queue) buffers when power is removed. The
> nvdimm_flush() mechanism performs that same function on-demand.
> 
> When a pmem namespace is associated with a block device, an
> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
> request. However, when a namespace is in device-dax mode, or namespaces
> are disabled, userspace needs another path.

Why would a user need to flush a disabled namespace?

> The new 'flush' attribute is visible when it can be determined that the
> interleave-set either does, or does not have DIMMs that expose WPQ-flush
> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
> flushes DIMMs, or returns "0" if the flush operation is a platform nop.

It seems a little odd to me that reading a read-only attribute both
tells you that the device has flush hints and also triggers a flush.
This means that anyone at any time can cause a flush.  Do we want that?

-- ljk

> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/nvdimm/region_devs.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 8de5a04644a1..3495b4c23941 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -255,6 +255,19 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>  
> +static ssize_t flush_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	struct nd_region *nd_region = to_nd_region(dev);
> +
> +	if (nvdimm_has_flush(nd_region)) {
> +		nvdimm_flush(nd_region);
> +		return sprintf(buf, "1\n");
> +	}
> +	return sprintf(buf, "0\n");
> +}
> +static DEVICE_ATTR_RO(flush);
> +
>  static ssize_t mappings_show(struct device *dev,
>  		struct device_attribute *attr, char *buf)
>  {
> @@ -474,6 +487,7 @@ static DEVICE_ATTR_RO(resource);
>  
>  static struct attribute *nd_region_attributes[] = {
>  	&dev_attr_size.attr,
> +	&dev_attr_flush.attr,
>  	&dev_attr_nstype.attr,
>  	&dev_attr_mappings.attr,
>  	&dev_attr_btt_seed.attr,
> @@ -508,6 +522,9 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
>  	if (!is_nd_pmem(dev) && a == &dev_attr_resource.attr)
>  		return 0;
>  
> +	if (a == &dev_attr_flush.attr && nvdimm_has_flush(nd_region) < 0)
> +		return 0;
> +
>  	if (a != &dev_attr_set_cookie.attr
>  			&& a != &dev_attr_available_size.attr)
>  		return a->mode;
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 17:07     ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2017-04-24 17:07 UTC (permalink / raw)
  To: Linda Knippers; +Cc: Linux ACPI, linux-kernel, linux-nvdimm

On Mon, Apr 24, 2017 at 10:03 AM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 04/21/2017 07:48 PM, Dan Williams wrote:
>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>> platform WPQ (write-pending-queue) buffers when power is removed. The
>> nvdimm_flush() mechanism performs that same function on-demand.
>>
>> When a pmem namespace is associated with a block device, an
>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>> request. However, when a namespace is in device-dax mode, or namespaces
>> are disabled, userspace needs another path.
>
> Why would a user need to flush a disabled namespace?

For an application that wants to shut down and sync. Basically, I wanted
to make it clear that with this interface the buffers can be synced
regardless of any downstream namespace configuration or state.

>
>> The new 'flush' attribute is visible when it can be determined that the
>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>> flushes DIMMs, or returns "0" if the flush operation is a platform nop.
>
> It seems a little odd to me that reading a read-only attribute both
> tells you that the device has flush hints and also triggers a flush.
> This means that anyone at any time can cause a flush.  Do we want that?

No, I'm making the change that Masayoshi-san suggested to move the
flush to a write operation... assuming we move forward given Jeff's
concern.
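As a rough sketch of what the write-triggered variant might look like from userspace (the attribute name "deep_flush", the region name, and the path are assumptions for illustration; the eventual interface may differ):

```shell
#!/bin/sh
# Hedged sketch of a write-triggered deep flush. The attribute name
# 'deep_flush' and the region 'region0' are assumptions, not the
# final interface.
trigger_deep_flush() {
    attr="/sys/bus/nd/devices/$1/deep_flush"
    if [ -w "$attr" ]; then
        # Writing triggers nvdimm_flush() for the region.
        echo 1 > "$attr" && echo "flush triggered"
    else
        echo "deep flush not supported"
    fi
}

trigger_deep_flush region0
```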

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
@ 2017-04-24 17:43         ` Dan Williams
  0 siblings, 0 replies; 29+ messages in thread
From: Dan Williams @ 2017-04-24 17:43 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Linux ACPI, Christoph Hellwig, linux-kernel, linux-nvdimm

[ adding Christoph ]

On Mon, Apr 24, 2017 at 9:43 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
>> On Mon, Apr 24, 2017 at 9:26 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>
>>>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>>>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>>>> platform WPQ (write-pending-queue) buffers when power is removed. The
>>>> nvdimm_flush() mechanism performs that same function on-demand.
>>>>
>>>> When a pmem namespace is associated with a block device, an
>>>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>>>> request. However, when a namespace is in device-dax mode, or namespaces
>>>> are disabled, userspace needs another path.
>>>>
>>>> The new 'flush' attribute is visible when it can be determined that the
>>>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>>>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>>>> flushes DIMMs, or returns "0" if the flush operation is a platform nop.
>>>>
>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>>
>>> NACK.  This should function the same way it does for a pmem device.
>>> Wire up sync.
>>
>> We don't have dirty page tracking for device-dax; without that, I don't
>> think we should wire up the current sync calls.
>
> Why not?  Device dax is meant for the "flush from userspace" paradigm.
> There's enough special casing around device dax that I think you can get
> away with implementing *sync as a call to nvdimm_flush.

I think it's an abuse of fsync() and gets in the way of where we might
take userspace-pmem-flushing with new sync primitives as proposed here
[1].

I'm also conscious of the shade that hch threw the last time I tried
to abuse an existing syscall for device-dax [2].

>> I do think we need a more sophisticated sync syscall interface
>> eventually that can select which level of flushing is being performed
>> (page cache vs cpu cache vs platform-write-buffers).
>
> I don't.  I think this whole notion of flush, and flush harder is
> brain-dead.  How do you explain to applications when they should use
> each one?

You never need to use this mechanism to guarantee persistence, which
is counter to what fsync() is defined to provide. This mechanism is
only there to backstop against potential ADR failures.

>> Until then I think this sideband interface makes sense and sysfs is
>> more usable than an ioctl.
>
> Well, if you're totally against wiring up sync, then I say we forget
> about the deep flush completely.  What's your use case?

The use case is device-dax users that want to reduce the impact of an
ADR failure, which also assumes that the platform has mechanisms to
communicate ADR failure. This is not an interface I expect to be used
by general purpose applications. All of those should depend solely on
ADR semantics.

[1]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg444842.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008299.html
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush()
  2017-04-24 17:43         ` Dan Williams
  (?)
@ 2017-04-24 17:58           ` Jeff Moyer
  -1 siblings, 0 replies; 29+ messages in thread
From: Jeff Moyer @ 2017-04-24 17:58 UTC (permalink / raw)
  To: Dan Williams; +Cc: Linux ACPI, Christoph Hellwig, linux-kernel, linux-nvdimm

Dan Williams <dan.j.williams@intel.com> writes:

> [ adding Christoph ]
>
> On Mon, Apr 24, 2017 at 9:43 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Mon, Apr 24, 2017 at 9:26 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>>
>>>>> The nvdimm_flush() mechanism helps to reduce the impact of an ADR
>>>>> (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
>>>>> platform WPQ (write-pending-queue) buffers when power is removed. The
>>>>> nvdimm_flush() mechanism performs that same function on-demand.
>>>>>
>>>>> When a pmem namespace is associated with a block device, an
>>>>> nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
>>>>> request. However, when a namespace is in device-dax mode, or namespaces
>>>>> are disabled, userspace needs another path.
>>>>>
>>>>> The new 'flush' attribute is visible when it can be determined that the
>>>>> interleave-set either does, or does not have DIMMs that expose WPQ-flush
>>>>> addresses, "flush-hints" in ACPI NFIT terminology. It returns "1" and
>>>>> flushes DIMMs, or returns "0" if the flush operation is a platform nop.
>>>>>
>>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>>>
>>>> NACK.  This should function the same way it does for a pmem device.
>>>> Wire up sync.
>>>
>>> We don't have dirty page tracking for device-dax, without that I don't
>>> think we should wire up the current sync calls.
>>
>> Why not?  Device dax is meant for the "flush from userspace" paradigm.
>> There's enough special casing around device dax that I think you can get
>> away with implementing *sync as a call to nvdimm_flush.
>
> I think it's an abuse of fsync() and gets in the way of where we might
> take userspace-pmem-flushing with new sync primitives as proposed here
> [1].

I agree that it's an abuse, and I'm happy to not go that route.  I am
still against using a sysfs file to do this WPQ flush, however.

> I'm also conscious of the shade that hch threw the last time I tried
> to abuse an existing syscall for device-dax [2].
>
>>> I do think we need a more sophisticated sync syscall interface
>>> eventually that can select which level of flushing is being performed
>>> (page cache vs cpu cache vs platform-write-buffers).
>>
>> I don't.  I think this whole notion of flush, and flush harder is
>> brain-dead.  How do you explain to applications when they should use
>> each one?
>
> You never need to use this mechanism to guarantee persistence, which
> is counter to what fsync() is defined to provide. This mechanism is
> only there to backstop against potential ADR failures.

You haven't answered my question.  Why should applications even need to
consider this?  Do you expect ADR to have a high failure rate?  If so,
shouldn't an application call this deep flush any time it wants to make
its state persistent?

>>> Until then I think this sideband interface makes sense and sysfs is
>>> more usable than an ioctl.
>>
>> Well, if you're totally against wiring up sync, then I say we forget
>> about the deep flush completely.  What's your use case?
>
> The use case is device-dax users that want to reduce the impact of an
> ADR failure, which also assumes that the platform has mechanisms to
> communicate ADR failure. This is not an interface I expect to be used
> by general purpose applications. All of those should depend solely on
> ADR semantics.

What applications?

I remain unconvinced of the utility of the WPQ flush separate from
msync/fsync.  Either you always do the WPQ flush, or you never do it.  I
don't see the use case for doing it sometimes, and no one I've asked has
managed to come up with a concrete use case.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-04-24 17:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-21 23:48 [PATCH] libnvdimm, region: sysfs trigger for nvdimm_flush() Dan Williams
2017-04-24  5:31 ` Masayoshi Mizuma
2017-04-24  7:04   ` Dan Williams
2017-04-24 16:26 ` Jeff Moyer
2017-04-24 16:36   ` Dan Williams
2017-04-24 16:43     ` Jeff Moyer
2017-04-24 17:43       ` Dan Williams
2017-04-24 17:58         ` Jeff Moyer
2017-04-24 17:03 ` Linda Knippers
2017-04-24 17:07   ` Dan Williams
