linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] nvdimm, btt: remove unnecessary code in btt_freelist_init
@ 2019-02-25 22:43 Vishal Verma
  2019-02-25 22:43 ` [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population Vishal Verma
  0 siblings, 1 reply; 10+ messages in thread
From: Vishal Verma @ 2019-02-25 22:43 UTC (permalink / raw)
  To: linux-nvdimm

We call btt_log_read() twice, once to get the 'old' log entry, and again
to get the 'new' entry. However, we have no use for the 'old' entry, so
remove it.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index b123b0dcf274..cd4fa87ea48c 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -541,9 +541,9 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
 
 static int btt_freelist_init(struct arena_info *arena)
 {
-	int old, new, ret;
+	int new, ret;
 	u32 i, map_entry;
-	struct log_entry log_new, log_old;
+	struct log_entry log_new;
 
 	arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
 					GFP_KERNEL);
@@ -551,10 +551,6 @@ static int btt_freelist_init(struct arena_info *arena)
 		return -ENOMEM;
 
 	for (i = 0; i < arena->nfree; i++) {
-		old = btt_log_read(arena, i, &log_old, LOG_OLD_ENT);
-		if (old < 0)
-			return old;
-
 		new = btt_log_read(arena, i, &log_new, LOG_NEW_ENT);
 		if (new < 0)
 			return new;
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-25 22:43 [PATCH 1/2] nvdimm, btt: remove unnecessary code in btt_freelist_init Vishal Verma
@ 2019-02-25 22:43 ` Vishal Verma
  2019-02-25 22:48   ` Verma, Vishal L
  0 siblings, 1 reply; 10+ messages in thread
From: Vishal Verma @ 2019-02-25 22:43 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Pedro d'Aquino Filocre F S Barbuda

The Linux BTT implementation assumes that log entries will never have
the 'zero' flag set, and indeed it never sets that flag for log entries
itself.

However, the UEFI spec is ambiguous on the exact format of the LBA field
of a log entry, specifically as to whether it should include the
additional flag bits or not. While a zero bit doesn't make sense in the
context of a log entry, other BTT implementations might still have it set.

If an implementation does happen to have it set, we would happily read
it in as the next block to write to for writes. Since a high bit is set,
it pushes the block number out of the range of an 'arena', and we fail
such a write with an EIO.

Follow the robustness principle, and tolerate such implementations by
stripping out the zero flag when populating the free list during
initialization. Additionally, use the same stripped out entries for
detection of incomplete writes and map restoration that happens at this
stage.

Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index cd4fa87ea48c..294c48e45e74 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -542,8 +542,8 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int new, ret;
-	u32 i, map_entry;
 	struct log_entry log_new;
+	u32 i, map_entry, log_oldmap, log_newmap;
 
 	arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
 					GFP_KERNEL);
@@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
 		if (new < 0)
 			return new;
 
+		/* old and new map entries with any flags stripped out */
+		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
+		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
+
 		/* sub points to the next one to be overwritten */
 		arena->freelist[i].sub = 1 - new;
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
-		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
+		arena->freelist[i].block = log_oldmap;
 
 		/*
 		 * FIXME: if error clearing fails during init, we want to make
 		 * the BTT read-only
 		 */
 		if (ent_e_flag(log_new.old_map)) {
+			set_e_flag(arena->freelist[i].block);
 			ret = arena_clear_freelist_error(arena, i);
 			if (ret)
 				dev_err_ratelimited(to_dev(arena),
@@ -572,7 +577,7 @@ static int btt_freelist_init(struct arena_info *arena)
 		}
 
 		/* This implies a newly created or untouched flog entry */
-		if (log_new.old_map == log_new.new_map)
+		if (log_oldmap == log_newmap)
 			continue;
 
 		/* Check if map recovery is needed */
@@ -580,8 +585,15 @@ static int btt_freelist_init(struct arena_info *arena)
 				NULL, NULL, 0);
 		if (ret)
 			return ret;
-		if ((le32_to_cpu(log_new.new_map) != map_entry) &&
-				(le32_to_cpu(log_new.old_map) == map_entry)) {
+
+		/*
+		 * The map_entry from btt_read_map is stripped of any flag bits,
+		 * so use the stripped out versions from the log as well for
+		 * testing whether recovery is needed. For restoration, use the
+		 * 'raw' version of the log entries as that captured what we
+		 * were going to write originally.
+		 */
+		if ((log_newmap != map_entry) && (log_oldmap == map_entry)) {
 			/*
 			 * Last transaction wrote the flog, but wasn't able
 			 * to complete the map write. So fix up the map.
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-25 22:43 ` [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population Vishal Verma
@ 2019-02-25 22:48   ` Verma, Vishal L
  2019-02-26 22:52     ` Verma, Vishal L
  0 siblings, 1 reply; 10+ messages in thread
From: Verma, Vishal L @ 2019-02-25 22:48 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: pbarbuda


On Mon, 2019-02-25 at 15:43 -0700, Vishal Verma wrote:
> The Linux BTT implementation assumes that log entries will never have
> the 'zero' flag set, and indeed it never sets that flag for log entries
> itself.
> 
> However, the UEFI spec is ambiguous on the exact format of the LBA field
> of a log entry, specifically as to whether it should include the
> additional flag bits or not. While a zero bit doesn't make sense in the
> context of a log entry, other BTT implementations might still have it set.
> 
> If an implementation does happen to have it set, we would happily read
> it in as the next block to write to for writes. Since a high bit is set,
> it pushes the block number out of the range of an 'arena', and we fail
> such a write with an EIO.
> 
> Follow the robustness principle, and tolerate such implementations by
> stripping out the zero flag when populating the free list during
> initialization. Additionally, use the same stripped out entries for
> detection of incomplete writes and map restoration that happens at this
> stage.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
> Tested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---

Forgot to include a change description:

v1 -> v2:

- Add a patch to remove unused code getting the old log entry
- Also use the stripped out versions of the log entries when testing for
incomplete writes.

>  drivers/nvdimm/btt.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index cd4fa87ea48c..294c48e45e74 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -542,8 +542,8 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
>  static int btt_freelist_init(struct arena_info *arena)
>  {
>  	int new, ret;
> -	u32 i, map_entry;
>  	struct log_entry log_new;
> +	u32 i, map_entry, log_oldmap, log_newmap;
>  
>  	arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
>  					GFP_KERNEL);
> @@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
>  		if (new < 0)
>  			return new;
>  
> +		/* old and new map entries with any flags stripped out */
> +		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
> +		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
> +
>  		/* sub points to the next one to be overwritten */
>  		arena->freelist[i].sub = 1 - new;
>  		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
> -		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
> +		arena->freelist[i].block = log_oldmap;
>  
>  		/*
>  		 * FIXME: if error clearing fails during init, we want to make
>  		 * the BTT read-only
>  		 */
>  		if (ent_e_flag(log_new.old_map)) {
> +			set_e_flag(arena->freelist[i].block);
>  			ret = arena_clear_freelist_error(arena, i);
>  			if (ret)
>  				dev_err_ratelimited(to_dev(arena),
> @@ -572,7 +577,7 @@ static int btt_freelist_init(struct arena_info *arena)
>  		}
>  
>  		/* This implies a newly created or untouched flog entry */
> -		if (log_new.old_map == log_new.new_map)
> +		if (log_oldmap == log_newmap)
>  			continue;
>  
>  		/* Check if map recovery is needed */
> @@ -580,8 +585,15 @@ static int btt_freelist_init(struct arena_info *arena)
>  				NULL, NULL, 0);
>  		if (ret)
>  			return ret;
> -		if ((le32_to_cpu(log_new.new_map) != map_entry) &&
> -				(le32_to_cpu(log_new.old_map) == map_entry)) {
> +
> +		/*
> +		 * The map_entry from btt_read_map is stripped of any flag bits,
> +		 * so use the stripped out versions from the log as well for
> +		 * testing whether recovery is needed. For restoration, use the
> +		 * 'raw' version of the log entries as that captured what we
> +		 * were going to write originally.
> +		 */
> +		if ((log_newmap != map_entry) && (log_oldmap == map_entry)) {
>  			/*
>  			 * Last transaction wrote the flog, but wasn't able
>  			 * to complete the map write. So fix up the map.

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-25 22:48   ` Verma, Vishal L
@ 2019-02-26 22:52     ` Verma, Vishal L
  2019-02-26 23:03       ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Verma, Vishal L @ 2019-02-26 22:52 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: pbarbuda

On Mon, 2019-02-25 at 22:48 +0000, Verma, Vishal L wrote:
> On Mon, 2019-02-25 at 15:43 -0700, Vishal Verma wrote:
> > The Linux BTT implementation assumes that log entries will never have
> > the 'zero' flag set, and indeed it never sets that flag for log entries
> > itself.
> > 
> > However, the UEFI spec is ambiguous on the exact format of the LBA field
> > of a log entry, specifically as to whether it should include the
> > additional flag bits or not. While a zero bit doesn't make sense in the
> > context of a log entry, other BTT implementations might still have it set.
> > 
> > If an implementation does happen to have it set, we would happily read
> > it in as the next block to write to for writes. Since a high bit is set,
> > it pushes the block number out of the range of an 'arena', and we fail
> > such a write with an EIO.
> > 
> > Follow the robustness principle, and tolerate such implementations by
> > stripping out the zero flag when populating the free list during
> > initialization. Additionally, use the same stripped out entries for
> > detection of incomplete writes and map restoration that happens at this
> > stage.
> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Reported-by: Dexuan Cui <decui@microsoft.com>
> > Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
> > Tested-by: Dexuan Cui <decui@microsoft.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> 
> Forgot to include a change description:
> 
> v1 -> v2:
> 
> - Add a patch to remove unused code getting the old log entry
> - Also use the stripped out versions of the log entries when testing for
> incomplete writes.
> 
v3: Add a sysfs file to indicate that the BTT is capable of handling
  layouts that have zero flags set. This allows userspace tooling such
  as 'ndctl check-namespace' perform repairs if needed based on kernel
  support, and without relying on kernel version numbers.

8<-----


>From cc2a015dafd880f9419911079634af1a19d2eb94 Mon Sep 17 00:00:00 2001
From: Vishal Verma <vishal.l.verma@intel.com>
Date: Mon, 25 Feb 2019 12:00:32 -0700
Subject: [PATCH v3] nvdimm, btt: fix LBA masking during 'free list' population

The Linux BTT implementation assumes that log entries will never have
the 'zero' flag set, and indeed it never sets that flag for log entries
itself.

However, the UEFI spec is ambiguous on the exact format of the LBA field
of a log entry, specifically as to whether it should include the
additional flag bits or not. While a zero bit doesn't make sense in the
context of a log entry, other BTT implementations might still have it set.

If an implementation does happen to have it set, we would happily read
it in as the next block to write to for writes. Since a high bit is set,
it pushes the block number out of the range of an 'arena', and we fail
such a write with an EIO.

Follow the robustness principle, and tolerate such implementations by
stripping out the zero flag when populating the free list during
initialization. Additionally, use the same stripped out entries for
detection of incomplete writes and map restoration that happens at this
stage.

Add a sysfs file 'log_zero_flags' that indicates the ability to accept
such a layout to userspace applications. This enables 'ndctl
check-namespace' to recognize whether the kernel is able to handle zero
flags, or whether it should attempt a fix-up under the --repair option.

Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c      | 22 +++++++++++++++++-----
 drivers/nvdimm/btt_devs.c |  8 ++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index cd4fa87ea48c..294c48e45e74 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -542,8 +542,8 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int new, ret;
-	u32 i, map_entry;
 	struct log_entry log_new;
+	u32 i, map_entry, log_oldmap, log_newmap;
 
 	arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
 					GFP_KERNEL);
@@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
 		if (new < 0)
 			return new;
 
+		/* old and new map entries with any flags stripped out */
+		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
+		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
+
 		/* sub points to the next one to be overwritten */
 		arena->freelist[i].sub = 1 - new;
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
-		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
+		arena->freelist[i].block = log_oldmap;
 
 		/*
 		 * FIXME: if error clearing fails during init, we want to make
 		 * the BTT read-only
 		 */
 		if (ent_e_flag(log_new.old_map)) {
+			set_e_flag(arena->freelist[i].block);
 			ret = arena_clear_freelist_error(arena, i);
 			if (ret)
 				dev_err_ratelimited(to_dev(arena),
@@ -572,7 +577,7 @@ static int btt_freelist_init(struct arena_info *arena)
 		}
 
 		/* This implies a newly created or untouched flog entry */
-		if (log_new.old_map == log_new.new_map)
+		if (log_oldmap == log_newmap)
 			continue;
 
 		/* Check if map recovery is needed */
@@ -580,8 +585,15 @@ static int btt_freelist_init(struct arena_info *arena)
 				NULL, NULL, 0);
 		if (ret)
 			return ret;
-		if ((le32_to_cpu(log_new.new_map) != map_entry) &&
-				(le32_to_cpu(log_new.old_map) == map_entry)) {
+
+		/*
+		 * The map_entry from btt_read_map is stripped of any flag bits,
+		 * so use the stripped out versions from the log as well for
+		 * testing whether recovery is needed. For restoration, use the
+		 * 'raw' version of the log entries as that captured what we
+		 * were going to write originally.
+		 */
+		if ((log_newmap != map_entry) && (log_oldmap == map_entry)) {
 			/*
 			 * Last transaction wrote the flog, but wasn't able
 			 * to complete the map write. So fix up the map.
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 795ad4ff35ca..90aad8af58e9 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -159,11 +159,19 @@ static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t log_zero_flags_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return 0;
+}
+static DEVICE_ATTR_RO(log_zero_flags);
+
 static struct attribute *nd_btt_attributes[] = {
 	&dev_attr_sector_size.attr,
 	&dev_attr_namespace.attr,
 	&dev_attr_uuid.attr,
 	&dev_attr_size.attr,
+	&dev_attr_log_zero_flags.attr,
 	NULL,
 };
 
-- 
2.20.1


_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-26 22:52     ` Verma, Vishal L
@ 2019-02-26 23:03       ` Dan Williams
  2019-02-26 23:13         ` [PATCH v4] " Vishal Verma
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2019-02-26 23:03 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: pbarbuda, linux-nvdimm

On Tue, Feb 26, 2019 at 2:52 PM Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
>
> On Mon, 2019-02-25 at 22:48 +0000, Verma, Vishal L wrote:
> > On Mon, 2019-02-25 at 15:43 -0700, Vishal Verma wrote:
> > > The Linux BTT implementation assumes that log entries will never have
> > > the 'zero' flag set, and indeed it never sets that flag for log entries
> > > itself.
> > >
> > > However, the UEFI spec is ambiguous on the exact format of the LBA field
> > > of a log entry, specifically as to whether it should include the
> > > additional flag bits or not. While a zero bit doesn't make sense in the
> > > context of a log entry, other BTT implementations might still have it set.
> > >
> > > If an implementation does happen to have it set, we would happily read
> > > it in as the next block to write to for writes. Since a high bit is set,
> > > it pushes the block number out of the range of an 'arena', and we fail
> > > such a write with an EIO.
> > >
> > > Follow the robustness principle, and tolerate such implementations by
> > > stripping out the zero flag when populating the free list during
> > > initialization. Additionally, use the same stripped out entries for
> > > detection of incomplete writes and map restoration that happens at this
> > > stage.
> > >
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Reported-by: Dexuan Cui <decui@microsoft.com>
> > > Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
> > > Tested-by: Dexuan Cui <decui@microsoft.com>
> > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > > ---
> >
> > Forgot to include a change description:
> >
> > v1 -> v2:
> >
> > - Add a patch to remove unused code getting the old log entry
> > - Also use the stripped out versions of the log entries when testing for
> > incomplete writes.
> >
> v3: Add a sysfs file to indicate that the BTT is capable of handling
>   layouts that have zero flags set. This allows userspace tooling such
>   as 'ndctl check-namespace' perform repairs if needed based on kernel
>   support, and without relying on kernel version numbers.
>
> 8<-----
>
>
> From cc2a015dafd880f9419911079634af1a19d2eb94 Mon Sep 17 00:00:00 2001
> From: Vishal Verma <vishal.l.verma@intel.com>
> Date: Mon, 25 Feb 2019 12:00:32 -0700
> Subject: [PATCH v3] nvdimm, btt: fix LBA masking during 'free list' population
>
> The Linux BTT implementation assumes that log entries will never have
> the 'zero' flag set, and indeed it never sets that flag for log entries
> itself.
>
> However, the UEFI spec is ambiguous on the exact format of the LBA field
> of a log entry, specifically as to whether it should include the
> additional flag bits or not. While a zero bit doesn't make sense in the
> context of a log entry, other BTT implementations might still have it set.
>
> If an implementation does happen to have it set, we would happily read
> it in as the next block to write to for writes. Since a high bit is set,
> it pushes the block number out of the range of an 'arena', and we fail
> such a write with an EIO.
>
> Follow the robustness principle, and tolerate such implementations by
> stripping out the zero flag when populating the free list during
> initialization. Additionally, use the same stripped out entries for
> detection of incomplete writes and map restoration that happens at this
> stage.
>
> Add a sysfs file 'log_zero_flags' that indicates the ability to accept
> such a layout to userspace applications. This enables 'ndctl
> check-namespace' to recognize whether the kernel is able to handle zero
> flags, or whether it should attempt a fix-up under the --repair option.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
> Tested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/nvdimm/btt.c      | 22 +++++++++++++++++-----
>  drivers/nvdimm/btt_devs.c |  8 ++++++++
>  2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index cd4fa87ea48c..294c48e45e74 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -542,8 +542,8 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
>  static int btt_freelist_init(struct arena_info *arena)
>  {
>         int new, ret;
> -       u32 i, map_entry;
>         struct log_entry log_new;
> +       u32 i, map_entry, log_oldmap, log_newmap;
>
>         arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
>                                         GFP_KERNEL);
> @@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
>                 if (new < 0)
>                         return new;
>
> +               /* old and new map entries with any flags stripped out */
> +               log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
> +               log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
> +
>                 /* sub points to the next one to be overwritten */
>                 arena->freelist[i].sub = 1 - new;
>                 arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
> -               arena->freelist[i].block = le32_to_cpu(log_new.old_map);
> +               arena->freelist[i].block = log_oldmap;
>
>                 /*
>                  * FIXME: if error clearing fails during init, we want to make
>                  * the BTT read-only
>                  */
>                 if (ent_e_flag(log_new.old_map)) {
> +                       set_e_flag(arena->freelist[i].block);
>                         ret = arena_clear_freelist_error(arena, i);
>                         if (ret)
>                                 dev_err_ratelimited(to_dev(arena),
> @@ -572,7 +577,7 @@ static int btt_freelist_init(struct arena_info *arena)
>                 }
>
>                 /* This implies a newly created or untouched flog entry */
> -               if (log_new.old_map == log_new.new_map)
> +               if (log_oldmap == log_newmap)
>                         continue;
>
>                 /* Check if map recovery is needed */
> @@ -580,8 +585,15 @@ static int btt_freelist_init(struct arena_info *arena)
>                                 NULL, NULL, 0);
>                 if (ret)
>                         return ret;
> -               if ((le32_to_cpu(log_new.new_map) != map_entry) &&
> -                               (le32_to_cpu(log_new.old_map) == map_entry)) {
> +
> +               /*
> +                * The map_entry from btt_read_map is stripped of any flag bits,
> +                * so use the stripped out versions from the log as well for
> +                * testing whether recovery is needed. For restoration, use the
> +                * 'raw' version of the log entries as that captured what we
> +                * were going to write originally.
> +                */
> +               if ((log_newmap != map_entry) && (log_oldmap == map_entry)) {
>                         /*
>                          * Last transaction wrote the flog, but wasn't able
>                          * to complete the map write. So fix up the map.
> diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
> index 795ad4ff35ca..90aad8af58e9 100644
> --- a/drivers/nvdimm/btt_devs.c
> +++ b/drivers/nvdimm/btt_devs.c
> @@ -159,11 +159,19 @@ static ssize_t size_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(size);
>
> +static ssize_t log_zero_flags_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       return 0;

Lets do:

    return sprintf("Y\n");

...to match boolean flag attributes (see param_get_bool()).
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-26 23:03       ` Dan Williams
@ 2019-02-26 23:13         ` Vishal Verma
       [not found]           ` <20190226231331.8730-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Vishal Verma @ 2019-02-26 23:13 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: pbarbuda

The Linux BTT implementation assumes that log entries will never have
the 'zero' flag set, and indeed it never sets that flag for log entries
itself.

However, the UEFI spec is ambiguous on the exact format of the LBA field
of a log entry, specifically as to whether it should include the
additional flag bits or not. While a zero bit doesn't make sense in the
context of a log entry, other BTT implementations might still have it set.

If an implementation does happen to have it set, we would happily read
it in as the next block to write to for writes. Since a high bit is set,
it pushes the block number out of the range of an 'arena', and we fail
such a write with an EIO.

Follow the robustness principle, and tolerate such implementations by
stripping out the zero flag when populating the free list during
initialization. Additionally, use the same stripped out entries for
detection of incomplete writes and map restoration that happens at this
stage.

Add a sysfs file 'log_zero_flags' that indicates the ability to accept
such a layout to userspace applications. This enables 'ndctl
check-namespace' to recognize whether the kernel is able to handle zero
flags, or whether it should attempt a fix-up under the --repair option.

Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---

v4: Have the sysfs attribute read a 'Y' to match boolean flag attributes

 drivers/nvdimm/btt.c      | 22 +++++++++++++++++-----
 drivers/nvdimm/btt_devs.c |  8 ++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index cd4fa87ea48c..294c48e45e74 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -542,8 +542,8 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int new, ret;
-	u32 i, map_entry;
 	struct log_entry log_new;
+	u32 i, map_entry, log_oldmap, log_newmap;
 
 	arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
 					GFP_KERNEL);
@@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
 		if (new < 0)
 			return new;
 
+		/* old and new map entries with any flags stripped out */
+		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
+		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
+
 		/* sub points to the next one to be overwritten */
 		arena->freelist[i].sub = 1 - new;
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
-		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
+		arena->freelist[i].block = log_oldmap;
 
 		/*
 		 * FIXME: if error clearing fails during init, we want to make
 		 * the BTT read-only
 		 */
 		if (ent_e_flag(log_new.old_map)) {
+			set_e_flag(arena->freelist[i].block);
 			ret = arena_clear_freelist_error(arena, i);
 			if (ret)
 				dev_err_ratelimited(to_dev(arena),
@@ -572,7 +577,7 @@ static int btt_freelist_init(struct arena_info *arena)
 		}
 
 		/* This implies a newly created or untouched flog entry */
-		if (log_new.old_map == log_new.new_map)
+		if (log_oldmap == log_newmap)
 			continue;
 
 		/* Check if map recovery is needed */
@@ -580,8 +585,15 @@ static int btt_freelist_init(struct arena_info *arena)
 				NULL, NULL, 0);
 		if (ret)
 			return ret;
-		if ((le32_to_cpu(log_new.new_map) != map_entry) &&
-				(le32_to_cpu(log_new.old_map) == map_entry)) {
+
+		/*
+		 * The map_entry from btt_read_map is stripped of any flag bits,
+		 * so use the stripped out versions from the log as well for
+		 * testing whether recovery is needed. For restoration, use the
+		 * 'raw' version of the log entries as that captured what we
+		 * were going to write originally.
+		 */
+		if ((log_newmap != map_entry) && (log_oldmap == map_entry)) {
 			/*
 			 * Last transaction wrote the flog, but wasn't able
 			 * to complete the map write. So fix up the map.
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index 795ad4ff35ca..b72a303176c7 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -159,11 +159,19 @@ static ssize_t size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(size);
 
+static ssize_t log_zero_flags_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Y\n");
+}
+static DEVICE_ATTR_RO(log_zero_flags);
+
 static struct attribute *nd_btt_attributes[] = {
 	&dev_attr_sector_size.attr,
 	&dev_attr_namespace.attr,
 	&dev_attr_uuid.attr,
 	&dev_attr_size.attr,
+	&dev_attr_log_zero_flags.attr,
 	NULL,
 };
 
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH v4] nvdimm, btt: fix LBA masking during 'free list' population
       [not found]           ` <20190226231331.8730-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2019-02-27  5:55             ` Dexuan Cui
  2019-02-27 18:12               ` Verma, Vishal L
  0 siblings, 1 reply; 10+ messages in thread
From: Dexuan Cui @ 2019-02-27  5:55 UTC (permalink / raw)
  To: Vishal Verma, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Pedro d'Aquino Filocre F S Barbuda, Michael Kelley

> From: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Sent: Tuesday, February 26, 2019 3:14 PM
> ...
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> @@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info *arena)
>  		if (new < 0)
>  			return new;
> 
> +		/* old and new map entries with any flags stripped out */
> +		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
> +		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
> +
>  		/* sub points to the next one to be overwritten */
>  		arena->freelist[i].sub = 1 - new;
>  		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
> -		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
> +		arena->freelist[i].block = log_oldmap;
> 
>  		/*
>  		 * FIXME: if error clearing fails during init, we want to make
>  		 * the BTT read-only
>  		 */
>  		if (ent_e_flag(log_new.old_map)) {
> +			set_e_flag(arena->freelist[i].block);
Hi Vishal,
The logic doesn't look quite right to me: as I understand, if both the zero flag and
the error flag are set, it means a normal map entry rather than an error.

Windows can indeed set both flags: 
[    3.756239] freelist_init: i=0, log_old: lba=bcc01, old_map=c00bcc30, new_map=c00bcc02, seq=2
[    3.765583] freelist_init: i=0, log_new: lba=bcc00, old_map=c0001b7b, new_map=c00bcc30, seq=3
(For the full log, see https://github.com/dcui/linux/commit/b472968e01d72be3bd42e4568befc4602f9ac396 )

Due to this new line, the 'block' can exceed the normal internal_nlba, and cause I/O failure:

[  103.589766] btt_write_pg failed: lane=0x0, new_postmap=0x40001b7b, internal_nlba=0x1ff8018
[  103.602387] nd_pmem btt1.1: io error in WRITE sector 33056, len 4096,
[  103.611662] Buffer I/O error on dev pmem1s2, logical block 36, lost async page write 

I suppose the new line should not be added, and the line 
  		if (ent_e_flag(log_new.old_map)) {
should be changed to
  		if (ent_e_flag(log_new.old_map) && ! ent_z_flag(log_new.old_map)) {
?

>  			ret = arena_clear_freelist_error(arena, i);
>  			if (ret)
>  				dev_err_ratelimited(to_dev(arena),

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-27  5:55             ` Dexuan Cui
@ 2019-02-27 18:12               ` Verma, Vishal L
       [not found]                 ` <34fc552cb992d62e14b4edda8edc1534b32277b4.camel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Verma, Vishal L @ 2019-02-27 18:12 UTC (permalink / raw)
  To: decui, linux-nvdimm; +Cc: pbarbuda, mikelley

On Wed, 2019-02-27 at 05:55 +0000, Dexuan Cui wrote:
> > From: Vishal Verma <vishal.l.verma@intel.com>
> > Sent: Tuesday, February 26, 2019 3:14 PM
> > ...
> > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> > @@ -555,16 +555,21 @@ static int btt_freelist_init(struct arena_info
*arena)
> >  		if (new < 0)
> >  			return new;
> > 
> > +		/* old and new map entries with any flags stripped out
*/
> > +		log_oldmap = ent_lba(le32_to_cpu(log_new.old_map));
> > +		log_newmap = ent_lba(le32_to_cpu(log_new.new_map));
> > +
> >  		/* sub points to the next one to be overwritten */
> >  		arena->freelist[i].sub = 1 - new;
> >  		arena->freelist[i].seq =
nd_inc_seq(le32_to_cpu(log_new.seq));
> > -		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
> > +		arena->freelist[i].block = log_oldmap;
> > 
> >  		/*
> >  		 * FIXME: if error clearing fails during init, we want
to make
> >  		 * the BTT read-only
> >  		 */
> >  		if (ent_e_flag(log_new.old_map)) {
> > +			set_e_flag(arena->freelist[i].block);
> Hi Vishal,
> The logic doesn't look quite right to me: as I understand, if both the
zero flag and
> the error flag are set, it means a normal map entry rather than an
error.
> 
> Windows can indeed set both flags: 
> [    3.756239] freelist_init: i=0, log_old: lba=bcc01,
old_map=c00bcc30, new_map=c00bcc02, seq=2
> [    3.765583] freelist_init: i=0, log_new: lba=bcc00,
old_map=c0001b7b, new_map=c00bcc30, seq=3
> (For the full log, see 
https://github.com/dcui/linux/commit/b472968e01d72be3bd42e4568befc4602f9ac396
)
> 
> Due to this new line, the 'block' can exceed the normal internal_nlba,
and cause I/O failure:
> 
> [  103.589766] btt_write_pg failed: lane=0x0, new_postmap=0x40001b7b,
internal_nlba=0x1ff8018
> [  103.602387] nd_pmem btt1.1: io error in WRITE sector 33056, len
4096,
> [  103.611662] Buffer I/O error on dev pmem1s2, logical block 36, lost
async page write 
> 
> I suppose the new line should not be added, and the line 
>   		if (ent_e_flag(log_new.old_map)) {
> should be changed to
>   		if (ent_e_flag(log_new.old_map) && !
ent_z_flag(log_new.old_map)) {
> ?
> 
Ah yes good catch, I broke my own rule, in that freelist[i].block should
not have flag bits since it is used as is. The freelist has a separate
has_err field to allow for error clearing, and we should be setting
that.

Does the following incremental patch fix it? Let me know and I can send
a new version including it.

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 294c48e45e74..1e7c1a66cef8 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -569,7 +569,7 @@ static int btt_freelist_init(struct arena_info
*arena)
 		 * the BTT read-only
 		 */
 		if (ent_e_flag(log_new.old_map)) {
-			set_e_flag(arena->freelist[i].block);
+			arena->freelist[i].has_err = 1;
 			ret = arena_clear_freelist_error(arena, i);
 			if (ret)
 				dev_err_ratelimited(to_dev(arena),

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH v4] nvdimm, btt: fix LBA masking during 'free list' population
       [not found]                 ` <34fc552cb992d62e14b4edda8edc1534b32277b4.camel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2019-02-27 21:06                   ` Dexuan Cui
  2019-02-27 22:51                     ` Verma, Vishal L
  0 siblings, 1 reply; 10+ messages in thread
From: Dexuan Cui @ 2019-02-27 21:06 UTC (permalink / raw)
  To: Verma, Vishal L, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Pedro d'Aquino Filocre F S Barbuda, Michael Kelley

> From: Verma, Vishal L <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Sent: Wednesday, February 27, 2019 10:13 AM
> ...
> > I suppose the new line should not be added, and the line
> >   		if (ent_e_flag(log_new.old_map)) {
> > should be changed to
> >   		if (ent_e_flag(log_new.old_map) && !
> ent_z_flag(log_new.old_map)) {
> > ?
> >
> Ah yes good catch, I broke my own rule, in that freelist[i].block should
> not have flag bits since it is used as is. The freelist has a separate
> has_err field to allow for error clearing, and we should be setting
> that.
> 
> Does the following incremental patch fix it? Let me know and I can send
> a new version including it.

Yes, the patch works for me. Thank you!

But, should we really set the has_err field? Here the enry has both
the zere/error flags set. As I understand, this is not an error, because
the UEFI spec says "... both Error and Zero are set to indicate a map entry
in its normal, non-error state".
 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 294c48e45e74..1e7c1a66cef8 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -569,7 +569,7 @@ static int btt_freelist_init(struct arena_info
> *arena)
>  		 * the BTT read-only
>  		 */
>  		if (ent_e_flag(log_new.old_map)) {
> -			set_e_flag(arena->freelist[i].block);
> +			arena->freelist[i].has_err = 1;
>  			ret = arena_clear_freelist_error(arena, i);
>  			if (ret)
>  				dev_err_ratelimited(to_dev(arena),


Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] nvdimm, btt: fix LBA masking during 'free list' population
  2019-02-27 21:06                   ` Dexuan Cui
@ 2019-02-27 22:51                     ` Verma, Vishal L
  0 siblings, 0 replies; 10+ messages in thread
From: Verma, Vishal L @ 2019-02-27 22:51 UTC (permalink / raw)
  To: decui, linux-nvdimm; +Cc: pbarbuda, mikelley


On Wed, 2019-02-27 at 21:06 +0000, Dexuan Cui wrote:
> > From: Verma, Vishal L <vishal.l.verma@intel.com>
> > Sent: Wednesday, February 27, 2019 10:13 AM
> > ...
> > > I suppose the new line should not be added, and the line
> > >   		if (ent_e_flag(log_new.old_map)) {
> > > should be changed to
> > >   		if (ent_e_flag(log_new.old_map) && !
> > ent_z_flag(log_new.old_map)) {
> > > ?
> > > 
> > Ah yes good catch, I broke my own rule, in that freelist[i].block should
> > not have flag bits since it is used as is. The freelist has a separate
> > has_err field to allow for error clearing, and we should be setting
> > that.
> > 
> > Does the following incremental patch fix it? Let me know and I can send
> > a new version including it.
> 
> Yes, the patch works for me. Thank you!
> 
> But, should we really set the has_err field? Here the enry has both
> the zere/error flags set. As I understand, this is not an error, because
> the UEFI spec says "... both Error and Zero are set to indicate a map entry
> in its normal, non-error state".
>  
Ah yes I see what you mean - we'd be pointlessly trying to clear those
blocks. I'll fix this in v5.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-02-27 22:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-25 22:43 [PATCH 1/2] nvdimm, btt: remove unnecessary code in btt_freelist_init Vishal Verma
2019-02-25 22:43 ` [PATCH 2/2] nvdimm, btt: fix LBA masking during 'free list' population Vishal Verma
2019-02-25 22:48   ` Verma, Vishal L
2019-02-26 22:52     ` Verma, Vishal L
2019-02-26 23:03       ` Dan Williams
2019-02-26 23:13         ` [PATCH v4] " Vishal Verma
     [not found]           ` <20190226231331.8730-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2019-02-27  5:55             ` Dexuan Cui
2019-02-27 18:12               ` Verma, Vishal L
     [not found]                 ` <34fc552cb992d62e14b4edda8edc1534b32277b4.camel-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2019-02-27 21:06                   ` Dexuan Cui
2019-02-27 22:51                     ` Verma, Vishal L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).