* [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
@ 2020-02-05  5:20 Aneesh Kumar K.V
  2020-02-09 16:12 ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2020-02-05  5:20 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-nvdimm, linuxppc-dev, Aneesh Kumar K.V

Currently, the kernel shows the below values
	"persistence_domain":"cpu_cache"
	"persistence_domain":"memory_controller"
	"persistence_domain":"unknown"

"cpu_cache" indicates that no extra instructions are needed to ensure the
persistence of data in the pmem media on power failure.

"memory_controller" indicates that platform-provided instructions need to be
issued, as per the documented sequence, to make sure data gets flushed so that
it is guaranteed to be on the pmem media in case of system power loss.

Based on the above, use memory_controller for non-volatile regions on ppc64.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
 drivers/nvdimm/of_pmem.c                  | 4 +++-
 include/linux/libnvdimm.h                 | 1 -
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 7525635a8536..ffcd0d7a867c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-	else
+	else {
+		/*
+		 * We need to flush things correctly to guarantee persistence
+		 */
+		set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
 		p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
+	}
 	if (!p->region) {
 		dev_err(dev, "Error registering region %pR from %pOF\n",
 				ndr_desc.res, p->dn);
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 8224d1431ea9..6826a274a1f1 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
 
 		if (is_volatile)
 			region = nvdimm_volatile_region_create(bus, &ndr_desc);
-		else
+		else {
+			set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
 			region = nvdimm_pmem_region_create(bus, &ndr_desc);
+		}
 
 		if (!region)
 			dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 0f366706b0aa..771d888a5ed7 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -54,7 +54,6 @@ enum {
 	/*
 	 * Platform provides mechanisms to automatically flush outstanding
 	 * write data from memory controler to pmem on system power loss.
-	 * (ADR)
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
-- 
2.24.1
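
For context, the flag set above is what userspace later observes as the
region's persistence_domain attribute (the ndctl JSON values quoted in the
commit message). Below is a minimal, illustrative C sketch of reading it;
"region0" is a placeholder region name, and the interpretation comments follow
the semantics discussed later in this thread rather than a formal
specification.

#include <stdio.h>
#include <string.h>

int main(void)
{
	char buf[64] = "";
	/* Placeholder path; real code would enumerate /sys/bus/nd/devices. */
	FILE *f = fopen("/sys/bus/nd/devices/region0/persistence_domain", "r");

	if (!f) {
		perror("persistence_domain");
		return 1;
	}
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return 1;
	}
	fclose(f);

	if (strstr(buf, "cpu_cache"))
		printf("platform flushes cpu caches on power loss\n");
	else if (strstr(buf, "memory_controller"))
		printf("software must flush cpu caches; platform handles the rest\n");
	else
		printf("persistence domain: %s", buf);

	return 0;
}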

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-05  5:20 [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device Aneesh Kumar K.V
@ 2020-02-09 16:12 ` Dan Williams
  2020-02-10 14:20   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-02-09 16:12 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, linuxppc-dev

On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Currently, kernel shows the below values
>         "persistence_domain":"cpu_cache"
>         "persistence_domain":"memory_controller"
>         "persistence_domain":"unknown"
>
> "cpu_cache" indicates no extra instructions is needed to ensure the persistence
> of data in the pmem media on power failure.
>
> "memory_controller" indicates platform provided instructions need to be issued

No, it does not. The only requirement implied by "memory_controller"
is global visibility outside the cpu cache. If there are special
instructions beyond that then it isn't persistent memory, at least not
pmem that is safe for dax. virtio-pmem is an example of pmem-like
memory that is not enabled for userspace flushing (MAP_SYNC disabled).

> as per documented sequence to make sure data get flushed so that it is
> guaranteed to be on pmem media in case of system power loss.
>
> Based on the above use memory_controller for non volatile regions on ppc64.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
>  drivers/nvdimm/of_pmem.c                  | 4 +++-
>  include/linux/libnvdimm.h                 | 1 -
>  3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 7525635a8536..ffcd0d7a867c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>
>         if (p->is_volatile)
>                 p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
> -       else
> +       else {
> +               /*
> +                * We need to flush things correctly to guarantee persistance
> +                */

There are never guarantees. If you're going to add a comment, state what
software needs to flush, and how.

> +               set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>                 p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
> +       }
>         if (!p->region) {
>                 dev_err(dev, "Error registering region %pR from %pOF\n",
>                                 ndr_desc.res, p->dn);
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 8224d1431ea9..6826a274a1f1 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
>
>                 if (is_volatile)
>                         region = nvdimm_volatile_region_create(bus, &ndr_desc);
> -               else
> +               else {
> +                       set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>                         region = nvdimm_pmem_region_create(bus, &ndr_desc);
> +               }
>
>                 if (!region)
>                         dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 0f366706b0aa..771d888a5ed7 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -54,7 +54,6 @@ enum {
>         /*
>          * Platform provides mechanisms to automatically flush outstanding
>          * write data from memory controler to pmem on system power loss.
> -        * (ADR)

I'd rather not delete critical terminology that a developer / platform
owner needs in order to consult documentation, or their vendor. Can you
instead add the PowerPC equivalent term for this capability? I.e. list
(x86: ADR, PowerPC: foo ...).

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-09 16:12 ` Dan Williams
@ 2020-02-10 14:20   ` Aneesh Kumar K.V
  2020-02-10 18:18     ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2020-02-10 14:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linuxppc-dev

Dan Williams <dan.j.williams@intel.com> writes:

> On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Currently, kernel shows the below values
>>         "persistence_domain":"cpu_cache"
>>         "persistence_domain":"memory_controller"
>>         "persistence_domain":"unknown"
>>
>> "cpu_cache" indicates no extra instructions is needed to ensure the persistence
>> of data in the pmem media on power failure.
>>
>> "memory_controller" indicates platform provided instructions need to be issued
>
> No, it does not. The only requirement implied by "memory_controller"
> is global visibility outside the cpu cache. If there are special
> instructions beyond that then it isn't persistent memory, at least not
> pmem that is safe for dax. virtio-pmem is an example of pmem-like
> memory that is not enabled for userspace flushing (MAP_SYNC disabled).
>

Can you explain this more? The way I was expecting the application to
interpret the value was: with the "memory_controller" value for
persistence_domain, a regular store instruction doesn't guarantee
persistence. Instead, we need to make sure we flush data to the
controller, at which point the platform will take care of persistence in
case of power loss. How we flush data to the controller is also
defined by the platform.


>> as per documented sequence to make sure data get flushed so that it is
>> guaranteed to be on pmem media in case of system power loss.
>>
>> Based on the above use memory_controller for non volatile regions on ppc64.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
>>  drivers/nvdimm/of_pmem.c                  | 4 +++-
>>  include/linux/libnvdimm.h                 | 1 -
>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 7525635a8536..ffcd0d7a867c 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>
>>         if (p->is_volatile)
>>                 p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
>> -       else
>> +       else {
>> +               /*
>> +                * We need to flush things correctly to guarantee persistance
>> +                */
>
> There are never guarantees. If you're going to comment what does
> software need to flush, and how?

Can you explain why you say there are never guarantees? If you follow the platform
recommended instruction sequence to flush data, we can be sure of data
persistence in the pmem media.


>
>> +               set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>>                 p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
>> +       }
>>         if (!p->region) {
>>                 dev_err(dev, "Error registering region %pR from %pOF\n",
>>                                 ndr_desc.res, p->dn);
>> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
>> index 8224d1431ea9..6826a274a1f1 100644
>> --- a/drivers/nvdimm/of_pmem.c
>> +++ b/drivers/nvdimm/of_pmem.c
>> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
>>
>>                 if (is_volatile)
>>                         region = nvdimm_volatile_region_create(bus, &ndr_desc);
>> -               else
>> +               else {
>> +                       set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>>                         region = nvdimm_pmem_region_create(bus, &ndr_desc);
>> +               }
>>
>>                 if (!region)
>>                         dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 0f366706b0aa..771d888a5ed7 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -54,7 +54,6 @@ enum {
>>         /*
>>          * Platform provides mechanisms to automatically flush outstanding
>>          * write data from memory controler to pmem on system power loss.
>> -        * (ADR)
>
> I'd rather not delete critical terminology for a developer / platform
> owner to be able to consult documentation, or their vendor. Can you
> instead add the PowerPC equivalent term for this capability? I.e. list
> (x86: ADR PowerPC: foo ...).

The Power ISA doesn't clearly call out what mechanism will be used to ensure
that a load following power loss will return the previously flushed
data. Hence there is no description of details like Asynchronous DRAM
Refresh. The only detail specified is the flush sequence that ensures
that a load following power loss will return the value stored.

-aneesh

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-10 14:20   ` Aneesh Kumar K.V
@ 2020-02-10 18:18     ` Dan Williams
  2020-02-11 14:55       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-02-10 18:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, linuxppc-dev

On Mon, Feb 10, 2020 at 6:20 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> Currently, kernel shows the below values
> >>         "persistence_domain":"cpu_cache"
> >>         "persistence_domain":"memory_controller"
> >>         "persistence_domain":"unknown"
> >>
> >> "cpu_cache" indicates no extra instructions is needed to ensure the persistence
> >> of data in the pmem media on power failure.
> >>
> >> "memory_controller" indicates platform provided instructions need to be issued
> >
> > No, it does not. The only requirement implied by "memory_controller"
> > is global visibility outside the cpu cache. If there are special
> > instructions beyond that then it isn't persistent memory, at least not
> > pmem that is safe for dax. virtio-pmem is an example of pmem-like
> > memory that is not enabled for userspace flushing (MAP_SYNC disabled).
> >
>
> Can you explain this more? The way I was expecting the application to
> interpret the value was, a regular store instruction doesn't guarantee
> persistence if you find the "memory_controller" value for
> persistence_domain. Instead, we need to make sure we flush data to the
> controller at which point the platform will take care of the persistence in
> case of power loss. How we flush data to the controller will also be
> defined by the platform.

If the platform requires any flush mechanism outside of the base cpu
ISA of cache flushes and memory barriers, then MAP_SYNC needs to be
explicitly disabled to force the application to call fsync()/msync().
Those platform-specific mechanisms then need to be triggered through a
platform-aware driver.
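
As a sketch of that split (not part of the patch under review): an
application can ask for MAP_SYNC, which is only valid together with
MAP_SHARED_VALIDATE, and fall back to msync() when the kernel, filesystem,
or device refuses it. The file path below is a placeholder, the fallback
#defines only cover older libc headers, and the cpu cache flushing that a
"memory_controller" region still needs after the stores is omitted for
brevity.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

int main(void)
{
	const char *path = "/mnt/pmem/file";	/* placeholder, assumed >= 4 KiB */
	size_t len = 4096;
	int need_msync = 0;
	int fd = open(path, O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Prefer MAP_SYNC: stores plus cpu cache flushes are then enough. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED && (errno == EOPNOTSUPP || errno == EINVAL)) {
		/* MAP_SYNC refused: fall back to explicit msync() calls. */
		need_msync = 1;
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	}
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	memcpy(p, "hello", 5);

	if (need_msync && msync(p, len, MS_SYNC))
		perror("msync");

	munmap(p, len);
	close(fd);
	return 0;
}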

>
>
> >> as per documented sequence to make sure data get flushed so that it is
> >> guaranteed to be on pmem media in case of system power loss.
> >>
> >> Based on the above use memory_controller for non volatile regions on ppc64.
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >> ---
> >>  arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
> >>  drivers/nvdimm/of_pmem.c                  | 4 +++-
> >>  include/linux/libnvdimm.h                 | 1 -
> >>  3 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 7525635a8536..ffcd0d7a867c 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>
> >>         if (p->is_volatile)
> >>                 p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
> >> -       else
> >> +       else {
> >> +               /*
> >> +                * We need to flush things correctly to guarantee persistance
> >> +                */
> >
> > There are never guarantees. If you're going to comment what does
> > software need to flush, and how?
>
> Can you explain why you say there are never guarantees? If you follow the platform
> recommended instruction sequence to flush data, we can be sure of data
> persistence in the pmem media.

Because storage can always fail. You can reduce risk, but never
eliminate it. This is similar to SSDs that use latent capacitance to
flush their write caches on drive power loss. Even if the application
successfully flushes its writes to buffers that are protected by that
capacitance, that power source can still (and in practice does) fail.

>
>
> >
> >> +               set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> >>                 p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
> >> +       }
> >>         if (!p->region) {
> >>                 dev_err(dev, "Error registering region %pR from %pOF\n",
> >>                                 ndr_desc.res, p->dn);
> >> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> >> index 8224d1431ea9..6826a274a1f1 100644
> >> --- a/drivers/nvdimm/of_pmem.c
> >> +++ b/drivers/nvdimm/of_pmem.c
> >> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
> >>
> >>                 if (is_volatile)
> >>                         region = nvdimm_volatile_region_create(bus, &ndr_desc);
> >> -               else
> >> +               else {
> >> +                       set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> >>                         region = nvdimm_pmem_region_create(bus, &ndr_desc);
> >> +               }
> >>
> >>                 if (!region)
> >>                         dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >> index 0f366706b0aa..771d888a5ed7 100644
> >> --- a/include/linux/libnvdimm.h
> >> +++ b/include/linux/libnvdimm.h
> >> @@ -54,7 +54,6 @@ enum {
> >>         /*
> >>          * Platform provides mechanisms to automatically flush outstanding
> >>          * write data from memory controler to pmem on system power loss.
> >> -        * (ADR)
> >
> > I'd rather not delete critical terminology for a developer / platform
> > owner to be able to consult documentation, or their vendor. Can you
> > instead add the PowerPC equivalent term for this capability? I.e. list
> > (x86: ADR PowerPC: foo ...).
>
> Power ISA doesn't clearly call out what mechanism will be used to ensure
> that a load following power loss will return the previously flushed
> data. Hence there is no description of details like Asynchronous DRAM
> Refresh. Only details specified is with respect to flush sequence that ensures
> that a load following power loss will return the value stored.

What is this "flush sequence"?

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-10 18:18     ` Dan Williams
@ 2020-02-11 14:55       ` Aneesh Kumar K.V
  2020-02-11 16:38         ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2020-02-11 14:55 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linuxppc-dev

On 2/10/20 11:48 PM, Dan Williams wrote:
> On Mon, Feb 10, 2020 at 6:20 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>
>>>> Currently, kernel shows the below values
>>>>          "persistence_domain":"cpu_cache"
>>>>          "persistence_domain":"memory_controller"
>>>>          "persistence_domain":"unknown"
>>>>
>>>> "cpu_cache" indicates no extra instructions is needed to ensure the persistence
>>>> of data in the pmem media on power failure.
>>>>
>>>> "memory_controller" indicates platform provided instructions need to be issued
>>>
>>> No, it does not. The only requirement implied by "memory_controller"
>>> is global visibility outside the cpu cache. If there are special
>>> instructions beyond that then it isn't persistent memory, at least not
>>> pmem that is safe for dax. virtio-pmem is an example of pmem-like
>>> memory that is not enabled for userspace flushing (MAP_SYNC disabled).
>>>
>>
>> Can you explain this more? The way I was expecting the application to
>> interpret the value was, a regular store instruction doesn't guarantee
>> persistence if you find the "memory_controller" value for
>> persistence_domain. Instead, we need to make sure we flush data to the
>> controller at which point the platform will take care of the persistence in
>> case of power loss. How we flush data to the controller will also be
>> defined by the platform.
> 
> If the platform requires any flush mechanism outside of the base cpu
> ISA of cache flushes and memory barriers then MAP_SYNC needs to be
> explicitly disabled to force the application to call fsync()/msync().
> Then those platform specific mechanisms need to be triggered through a
> platform-aware driver.
> 


Agreed. I was thinking we would mark the persistence_domain "Unknown" in that
case; virtio-pmem marks it that way.


>>
>>
>>>> as per documented sequence to make sure data get flushed so that it is
>>>> guaranteed to be on pmem media in case of system power loss.
>>>>
>>>> Based on the above use memory_controller for non volatile regions on ppc64.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>   arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
>>>>   drivers/nvdimm/of_pmem.c                  | 4 +++-
>>>>   include/linux/libnvdimm.h                 | 1 -
>>>>   3 files changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> index 7525635a8536..ffcd0d7a867c 100644
>>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>>>
>>>>          if (p->is_volatile)
>>>>                  p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
>>>> -       else
>>>> +       else {
>>>> +               /*
>>>> +                * We need to flush things correctly to guarantee persistance
>>>> +                */
>>>
>>> There are never guarantees. If you're going to comment what does
>>> software need to flush, and how?
>>
>> Can you explain why you say there are never guarantees? If you follow the platform
>> recommended instruction sequence to flush data, we can be sure of data
>> persistence in the pmem media.
> 
> Because storage can always fail. You can reduce risk, but never
> eliminate it. This is similar to SSDs that use latent capacitance to
> flush their write caches on driver power loss. Even if the application
> successfully flushes its writes to buffers that are protected by that
> capacitance that power source can still (and in practice does) fail.
> 

OK, "guarantee" is not the right term there. Can we say

/* We need to flush things correctly to ensure persistence */


What I was trying to understand/clarify was the details an application
can infer by looking at the value of persistence_domain?

Do you agree that the below can be inferred from the "memory_controller"
value of persistence_domain?

1) The application needs to use cache flush instructions, and that ensures
data is persistent across power failure.


Or are you suggesting that the application should not infer any of those
details by looking at the persistence_domain value? If so, what is the
purpose of exporting that attribute?


>>
>>
>>>
>>>> +               set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>>>>                  p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
>>>> +       }
>>>>          if (!p->region) {
>>>>                  dev_err(dev, "Error registering region %pR from %pOF\n",
>>>>                                  ndr_desc.res, p->dn);
>>>> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
>>>> index 8224d1431ea9..6826a274a1f1 100644
>>>> --- a/drivers/nvdimm/of_pmem.c
>>>> +++ b/drivers/nvdimm/of_pmem.c
>>>> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
>>>>
>>>>                  if (is_volatile)
>>>>                          region = nvdimm_volatile_region_create(bus, &ndr_desc);
>>>> -               else
>>>> +               else {
>>>> +                       set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
>>>>                          region = nvdimm_pmem_region_create(bus, &ndr_desc);
>>>> +               }
>>>>
>>>>                  if (!region)
>>>>                          dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
>>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>>>> index 0f366706b0aa..771d888a5ed7 100644
>>>> --- a/include/linux/libnvdimm.h
>>>> +++ b/include/linux/libnvdimm.h
>>>> @@ -54,7 +54,6 @@ enum {
>>>>          /*
>>>>           * Platform provides mechanisms to automatically flush outstanding
>>>>           * write data from memory controler to pmem on system power loss.
>>>> -        * (ADR)
>>>
>>> I'd rather not delete critical terminology for a developer / platform
>>> owner to be able to consult documentation, or their vendor. Can you
>>> instead add the PowerPC equivalent term for this capability? I.e. list
>>> (x86: ADR PowerPC: foo ...).
>>
>> Power ISA doesn't clearly call out what mechanism will be used to ensure
>> that a load following power loss will return the previously flushed
>> data. Hence there is no description of details like Asynchronous DRAM
>> Refresh. Only details specified is with respect to flush sequence that ensures
>> that a load following power loss will return the value stored.
> 
> What is this "flush sequence"?
> 

cpu cache flush instructions "dcbf; hwsync"
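
For illustration, a minimal userspace sketch of that sequence: flush every
cache line covering the range with dcbf, then order the flushes with a
heavyweight sync. The 128-byte line size is an assumption (typical of recent
POWER cpus); real code should derive it at runtime.

#include <stddef.h>
#include <stdint.h>

/* Flush a byte range to the pmem persistence domain on ppc64. */
static inline void flush_to_pmem(const void *addr, size_t len)
{
	const uintptr_t line = 128;	/* assumed cache line size */
	uintptr_t p = (uintptr_t)addr & ~(line - 1);
	uintptr_t end = (uintptr_t)addr + len;

	for (; p < end; p += line)
		asm volatile("dcbf 0,%0" : : "r"(p) : "memory");

	/* "sync" is the heavyweight sync, i.e. the hwsync named above. */
	asm volatile("sync" : : : "memory");
}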

-aneesh

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-11 14:55       ` Aneesh Kumar K.V
@ 2020-02-11 16:38         ` Dan Williams
  2020-03-20  9:25           ` Aneesh Kumar K.V
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2020-02-11 16:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, linuxppc-dev

On Tue, Feb 11, 2020 at 6:57 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 2/10/20 11:48 PM, Dan Williams wrote:
> > On Mon, Feb 10, 2020 at 6:20 AM Aneesh Kumar K.V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> Dan Williams <dan.j.williams@intel.com> writes:
> >>
> >>> On Tue, Feb 4, 2020 at 9:21 PM Aneesh Kumar K.V
> >>> <aneesh.kumar@linux.ibm.com> wrote:
> >>>>
> >>>> Currently, kernel shows the below values
> >>>>          "persistence_domain":"cpu_cache"
> >>>>          "persistence_domain":"memory_controller"
> >>>>          "persistence_domain":"unknown"
> >>>>
> >>>> "cpu_cache" indicates no extra instructions is needed to ensure the persistence
> >>>> of data in the pmem media on power failure.
> >>>>
> >>>> "memory_controller" indicates platform provided instructions need to be issued
> >>>
> >>> No, it does not. The only requirement implied by "memory_controller"
> >>> is global visibility outside the cpu cache. If there are special
> >>> instructions beyond that then it isn't persistent memory, at least not
> >>> pmem that is safe for dax. virtio-pmem is an example of pmem-like
> >>> memory that is not enabled for userspace flushing (MAP_SYNC disabled).
> >>>
> >>
> >> Can you explain this more? The way I was expecting the application to
> >> interpret the value was, a regular store instruction doesn't guarantee
> >> persistence if you find the "memory_controller" value for
> >> persistence_domain. Instead, we need to make sure we flush data to the
> >> controller at which point the platform will take care of the persistence in
> >> case of power loss. How we flush data to the controller will also be
> >> defined by the platform.
> >
> > If the platform requires any flush mechanism outside of the base cpu
> > ISA of cache flushes and memory barriers then MAP_SYNC needs to be
> > explicitly disabled to force the application to call fsync()/msync().
> > Then those platform specific mechanisms need to be triggered through a
> > platform-aware driver.
> >
>
>
> Agreed. I was thinking we mark the persistence_domain: "Unknown" in that
> case. virtio-pmem mark it that way.

I would say the driver requirement case is persistence_domain "None",
not "Unknown". I.e. the platform provides no mechanism to flush data
to the persistence domain on power loss; it's back to typical storage
semantics.

>
>
> >>
> >>
> >>>> as per documented sequence to make sure data get flushed so that it is
> >>>> guaranteed to be on pmem media in case of system power loss.
> >>>>
> >>>> Based on the above use memory_controller for non volatile regions on ppc64.
> >>>>
> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> >>>> ---
> >>>>   arch/powerpc/platforms/pseries/papr_scm.c | 7 ++++++-
> >>>>   drivers/nvdimm/of_pmem.c                  | 4 +++-
> >>>>   include/linux/libnvdimm.h                 | 1 -
> >>>>   3 files changed, 9 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >>>> index 7525635a8536..ffcd0d7a867c 100644
> >>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >>>> @@ -359,8 +359,13 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>>>
> >>>>          if (p->is_volatile)
> >>>>                  p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
> >>>> -       else
> >>>> +       else {
> >>>> +               /*
> >>>> +                * We need to flush things correctly to guarantee persistance
> >>>> +                */
> >>>
> >>> There are never guarantees. If you're going to comment what does
> >>> software need to flush, and how?
> >>
> >> Can you explain why you say there are never guarantees? If you follow the platform
> >> recommended instruction sequence to flush data, we can be sure of data
> >> persistence in the pmem media.
> >
> > Because storage can always fail. You can reduce risk, but never
> > eliminate it. This is similar to SSDs that use latent capacitance to
> > flush their write caches on driver power loss. Even if the application
> > successfully flushes its writes to buffers that are protected by that
> > capacitance that power source can still (and in practice does) fail.
> >
>
> ok guarantee is not the right term there. Can we say
>
> /* We need to flush tings correctly to ensure persistence */

The definition of the "memory_controller" persistence domain is: "the
platform takes care to flush writes to media once they are globally
visible outside the cache".

>
>
> What I was trying to understand/clarify was the detail an application
> can infer looking at the value of persistence_domain ?
>
> Do you agree that below can be inferred from the "memory_controller"
> value of persistence_domain
>
> 1) Application needs to use cache flush instructions and that ensures
> data is persistent across power failure.
>
>
> Or are you suggesting that application should not infer any of those
> details looking at persistence_domain value? If so what is the purpose
> of exporting that attribute?

The way the patch was worded I thought it was referring to an explicit
mechanism outside cpu cache flushes, i.e. a mechanism that required a
driver call.

>
>
> >>
> >>
> >>>
> >>>> +               set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> >>>>                  p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
> >>>> +       }
> >>>>          if (!p->region) {
> >>>>                  dev_err(dev, "Error registering region %pR from %pOF\n",
> >>>>                                  ndr_desc.res, p->dn);
> >>>> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> >>>> index 8224d1431ea9..6826a274a1f1 100644
> >>>> --- a/drivers/nvdimm/of_pmem.c
> >>>> +++ b/drivers/nvdimm/of_pmem.c
> >>>> @@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
> >>>>
> >>>>                  if (is_volatile)
> >>>>                          region = nvdimm_volatile_region_create(bus, &ndr_desc);
> >>>> -               else
> >>>> +               else {
> >>>> +                       set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
> >>>>                          region = nvdimm_pmem_region_create(bus, &ndr_desc);
> >>>> +               }
> >>>>
> >>>>                  if (!region)
> >>>>                          dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
> >>>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> >>>> index 0f366706b0aa..771d888a5ed7 100644
> >>>> --- a/include/linux/libnvdimm.h
> >>>> +++ b/include/linux/libnvdimm.h
> >>>> @@ -54,7 +54,6 @@ enum {
> >>>>          /*
> >>>>           * Platform provides mechanisms to automatically flush outstanding
> >>>>           * write data from memory controler to pmem on system power loss.
> >>>> -        * (ADR)
> >>>
> >>> I'd rather not delete critical terminology for a developer / platform
> >>> owner to be able to consult documentation, or their vendor. Can you
> >>> instead add the PowerPC equivalent term for this capability? I.e. list
> >>> (x86: ADR PowerPC: foo ...).
> >>
> >> Power ISA doesn't clearly call out what mechanism will be used to ensure
> >> that a load following power loss will return the previously flushed
> >> data. Hence there is no description of details like Asynchronous DRAM
> >> Refresh. Only details specified is with respect to flush sequence that ensures
> >> that a load following power loss will return the value stored.
> >
> > What is this "flush sequence"?
> >
>
> cpu cache flush instructions "dcbf; hwsync"

Looks good; as long as the flush mechanism is defined by the cpu ISA,
MAP_SYNC is viable.

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-02-11 16:38         ` Dan Williams
@ 2020-03-20  9:25           ` Aneesh Kumar K.V
  2020-03-24  1:21             ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-20  9:25 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm, linuxppc-dev


Hi Dan,


Dan Williams <dan.j.williams@intel.com> writes:

...


>
>>
>> Or are you suggesting that application should not infer any of those
>> details looking at persistence_domain value? If so what is the purpose
>> of exporting that attribute?
>
> The way the patch was worded I thought it was referring to an explicit
> mechanism outside cpu cache flushes, i.e. a mechanism that required a
> driver call.
>

This patch is blocked because I am not expressing the details correctly.
I updated it as below. Can you confirm whether this is OK? If not, what
alternate wording do you suggest to document "memory controller"?


commit 329b46e88f8cd30eee4776b0de7913ab4d496bd8
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Wed Dec 18 13:53:16 2019 +0530

    libnvdimm: Update persistence domain value for of_pmem and papr_scm device
    
    Currently, the kernel shows the below values
    	"persistence_domain":"cpu_cache"
    	"persistence_domain":"memory_controller"
    	"persistence_domain":"unknown"
    
    "cpu_cache" indicates that no extra instructions are needed to ensure the
    persistence of data in the pmem media on power failure.
    
    "memory_controller" indicates that cpu cache flush instructions are required
    to flush the data. The platform provides mechanisms to automatically flush
    outstanding write data from the memory controller to pmem on system power loss.
    
    Based on the above, use memory_controller for non-volatile regions on ppc64.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 0b4467e378e5..922a4fc3b61b 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -361,8 +361,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-	else
+	else {
+		set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
 		p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
+	}
 	if (!p->region) {
 		dev_err(dev, "Error registering region %pR from %pOF\n",
 				ndr_desc.res, p->dn);
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 8224d1431ea9..6826a274a1f1 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -62,8 +62,10 @@ static int of_pmem_region_probe(struct platform_device *pdev)
 
 		if (is_volatile)
 			region = nvdimm_volatile_region_create(bus, &ndr_desc);
-		else
+		else {
+			set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
 			region = nvdimm_pmem_region_create(bus, &ndr_desc);
+		}
 
 		if (!region)
 			dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",

* Re: [PATCH v2] libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  2020-03-20  9:25           ` Aneesh Kumar K.V
@ 2020-03-24  1:21             ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-03-24  1:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-nvdimm, linuxppc-dev

On Fri, Mar 20, 2020 at 2:25 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
>
> Hi Dan,
>
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> ...
>
>
> >
> >>
> >> Or are you suggesting that application should not infer any of those
> >> details looking at persistence_domain value? If so what is the purpose
> >> of exporting that attribute?
> >
> > The way the patch was worded I thought it was referring to an explicit
> > mechanism outside cpu cache flushes, i.e. a mechanism that required a
> > driver call.
> >
>
> This patch is blocked because I am not expressing the details correctly.
> I updates this as below. Can you suggest if this is ok? If not what
> alternate wording do you suggest to document "memory controller"
>
>
> commit 329b46e88f8cd30eee4776b0de7913ab4d496bd8
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Date:   Wed Dec 18 13:53:16 2019 +0530
>
>     libnvdimm: Update persistence domain value for of_pmem and papr_scm device
>
>     Currently, kernel shows the below values
>             "persistence_domain":"cpu_cache"
>             "persistence_domain":"memory_controller"
>             "persistence_domain":"unknown"
>
>     "cpu_cache" indicates no extra instructions is needed to ensure the persistence
>     of data in the pmem media on power failure.
>
>     "memory_controller" indicates cpu cache flush instructions is required to flush
>     the data. Platform provides mechanisms to automatically flush outstanding
>     write data from memory controler to pmem on system power loss.
>
>     Based on the above use memory_controller for non volatile regions on ppc64.

Looks good to me, want to resend via git-format-patch?
