All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/7] BTT error clearing rework
@ 2017-08-09  0:07 ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

changes in v5:
 - Add patch 6 that refactors initial_offset calculations, and fix a bug
   that caused error clearing to be missed in some cases (Toshi)
   (I have a unit test for this that is mostly ready, but it depends
   in a better error injection capability in nfit_test, so I will send
   it out once that is ready).

Changes in v4:
 - move the deadlock fix to before enabling the BTT error clear paths (Dan)
 - No need for an error lock per freelist entry, just have one per arena (Dan)

Changes in v3:
 - Change the dynamically allocated (during IO) zerobuf to the kernel's
   ZERO_PAGE for error clearing (patch 5) (Dan).
 - Move the NOIO fixes a level down into nvdimm_clear_poison since both
   btt and pmem poison clearing goes through that (Dan).

Changes in v2:
 - Drop the ACPI allocation change patch. Instead use
   memalloc_noio_{save,restore} to set the GFP_NOIO flag around anything
   that can be expected to call into ACPI for clearing errors. (Rafael, Dan).

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

This series fixes these problems by moving the error clearing out of
the atomic sections in the BTT.

Also fix a potential deadlock that can occur while clearing errors
from either BTT or pmem due to memory allocations in the IO path.



Vishal Verma (7):
  btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  btt: refactor map entry operations with macros
  btt: ensure that flags were also unchanged during a map_read
  btt: cache sector_size in arena_info
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: refactor initial_offset calculations
  libnvdimm, btt: rework error clearing

 drivers/nvdimm/btt.c   | 181 +++++++++++++++++++++++++++++++++++++------------
 drivers/nvdimm/btt.h   |  11 +++
 drivers/nvdimm/bus.c   |   6 ++
 drivers/nvdimm/claim.c |   9 +--
 4 files changed, 157 insertions(+), 50 deletions(-)

-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 0/7] BTT error clearing rework
@ 2017-08-09  0:07 ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

changes in v5:
 - Add patch 6 that refactors initial_offset calculations, and fix a bug
   that caused error clearing to be missed in some cases (Toshi)
   (I have a unit test for this that is mostly ready, but it depends
   in a better error injection capability in nfit_test, so I will send
   it out once that is ready).

Changes in v4:
 - move the deadlock fix to before enabling the BTT error clear paths (Dan)
 - No need for an error lock per freelist entry, just have one per arena (Dan)

Changes in v3:
 - Change the dynamically allocated (during IO) zerobuf to the kernel's
   ZERO_PAGE for error clearing (patch 5) (Dan).
 - Move the NOIO fixes a level down into nvdimm_clear_poison since both
   btt and pmem poison clearing goes through that (Dan).

Changes in v2:
 - Drop the ACPI allocation change patch. Instead use
   memalloc_noio_{save,restore} to set the GFP_NOIO flag around anything
   that can be expected to call into ACPI for clearing errors. (Rafael, Dan).

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

This series fixes these problems by moving the error clearing out of
the atomic sections in the BTT.

Also fix a potential deadlock that can occur while clearing errors
from either BTT or pmem due to memory allocations in the IO path.



Vishal Verma (7):
  btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  btt: refactor map entry operations with macros
  btt: ensure that flags were also unchanged during a map_read
  btt: cache sector_size in arena_info
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: refactor initial_offset calculations
  libnvdimm, btt: rework error clearing

 drivers/nvdimm/btt.c   | 181 +++++++++++++++++++++++++++++++++++++------------
 drivers/nvdimm/btt.h   |  11 +++
 drivers/nvdimm/bus.c   |   6 ++
 drivers/nvdimm/claim.c |   9 +--
 4 files changed, 157 insertions(+), 50 deletions(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 1/7] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

The IO context conversion for rw_bytes missed a case in the BTT write
path (btt_map_write) which should've been marked as atomic.

In reality this should not cause a problem, because map writes are to
small for nsio_rw_bytes to attempt error clearing, but it should be
fixed for posterity.

Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
things like the nfit unit tests, which don't actually sleep, can catch
bugs like this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c   | 3 ++-
 drivers/nvdimm/claim.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 14323fa..f6ea82f 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1156,7 +1156,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		if (ret)
 			goto out_map;
 
-		ret = btt_map_write(arena, premap, new_postmap, 0, 0, 0);
+		ret = btt_map_write(arena, premap, new_postmap, 0, 0,
+			NVDIMM_IO_ATOMIC);
 		if (ret)
 			goto out_map;
 
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 4777046..3e6404f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -292,6 +292,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 				&& !(flags & NVDIMM_IO_ATOMIC)) {
 			long cleared;
 
+			might_sleep();
 			cleared = nvdimm_clear_poison(&ndns->dev,
 					nsio->res.start + offset, size);
 			if (cleared < size)
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 1/7] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

The IO context conversion for rw_bytes missed a case in the BTT write
path (btt_map_write) which should've been marked as atomic.

In reality this should not cause a problem, because map writes are to
small for nsio_rw_bytes to attempt error clearing, but it should be
fixed for posterity.

Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
things like the nfit unit tests, which don't actually sleep, can catch
bugs like this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c   | 3 ++-
 drivers/nvdimm/claim.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 14323fa..f6ea82f 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1156,7 +1156,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		if (ret)
 			goto out_map;
 
-		ret = btt_map_write(arena, premap, new_postmap, 0, 0, 0);
+		ret = btt_map_write(arena, premap, new_postmap, 0, 0,
+			NVDIMM_IO_ATOMIC);
 		if (ret)
 			goto out_map;
 
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 4777046..3e6404f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -292,6 +292,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 				&& !(flags & NVDIMM_IO_ATOMIC)) {
 			long cleared;
 
+			might_sleep();
 			cleared = nvdimm_clear_poison(&ndns->dev,
 					nsio->res.start + offset, size);
 			if (cleared < size)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 2/7] btt: refactor map entry operations with macros
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

Add helpers for converting a raw map entry to just the block number, or
either of the 'e' or 'z' flags in preparation for actually using the
error flag to mark blocks with media errors.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 8 ++++----
 drivers/nvdimm/btt.h | 4 ++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index f6ea82f..8842baa 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -106,7 +106,7 @@ static int btt_map_write(struct arena_info *arena, u32 lba, u32 mapping,
 	 * This 'mapping' is supposed to be just the LBA mapping, without
 	 * any flags set, so strip the flag bits.
 	 */
-	mapping &= MAP_LBA_MASK;
+	mapping = ent_lba(mapping);
 
 	ze = (z_flag << 1) + e_flag;
 	switch (ze) {
@@ -155,10 +155,10 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 
 	raw_mapping = le32_to_cpu(in);
 
-	z_flag = (raw_mapping & MAP_TRIM_MASK) >> MAP_TRIM_SHIFT;
-	e_flag = (raw_mapping & MAP_ERR_MASK) >> MAP_ERR_SHIFT;
+	z_flag = ent_z_flag(raw_mapping);
+	e_flag = ent_e_flag(raw_mapping);
 	ze = (z_flag << 1) + e_flag;
-	postmap = raw_mapping & MAP_LBA_MASK;
+	postmap = ent_lba(raw_mapping);
 
 	/* Reuse the {z,e}_flag variables for *trim and *error */
 	z_flag = 0;
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 888e862..09fabf5 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -38,6 +38,10 @@
 #define IB_FLAG_ERROR 0x00000001
 #define IB_FLAG_ERROR_MASK 0x00000001
 
+#define ent_lba(ent) (ent & MAP_LBA_MASK)
+#define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK))
+#define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK))
+
 enum btt_init_state {
 	INIT_UNCHECKED = 0,
 	INIT_NOTFOUND,
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 2/7] btt: refactor map entry operations with macros
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

Add helpers for converting a raw map entry to just the block number, or
either of the 'e' or 'z' flags in preparation for actually using the
error flag to mark blocks with media errors.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 8 ++++----
 drivers/nvdimm/btt.h | 4 ++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index f6ea82f..8842baa 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -106,7 +106,7 @@ static int btt_map_write(struct arena_info *arena, u32 lba, u32 mapping,
 	 * This 'mapping' is supposed to be just the LBA mapping, without
 	 * any flags set, so strip the flag bits.
 	 */
-	mapping &= MAP_LBA_MASK;
+	mapping = ent_lba(mapping);
 
 	ze = (z_flag << 1) + e_flag;
 	switch (ze) {
@@ -155,10 +155,10 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 
 	raw_mapping = le32_to_cpu(in);
 
-	z_flag = (raw_mapping & MAP_TRIM_MASK) >> MAP_TRIM_SHIFT;
-	e_flag = (raw_mapping & MAP_ERR_MASK) >> MAP_ERR_SHIFT;
+	z_flag = ent_z_flag(raw_mapping);
+	e_flag = ent_e_flag(raw_mapping);
 	ze = (z_flag << 1) + e_flag;
-	postmap = raw_mapping & MAP_LBA_MASK;
+	postmap = ent_lba(raw_mapping);
 
 	/* Reuse the {z,e}_flag variables for *trim and *error */
 	z_flag = 0;
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 888e862..09fabf5 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -38,6 +38,10 @@
 #define IB_FLAG_ERROR 0x00000001
 #define IB_FLAG_ERROR_MASK 0x00000001
 
+#define ent_lba(ent) (ent & MAP_LBA_MASK)
+#define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK))
+#define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK))
+
 enum btt_init_state {
 	INIT_UNCHECKED = 0,
 	INIT_NOTFOUND,
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 3/7] btt: ensure that flags were also unchanged during a map_read
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 8842baa..374ae62 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1032,6 +1032,7 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		 */
 		while (1) {
 			u32 new_map;
+			int new_t, new_e;
 
 			if (t_flag) {
 				zero_fill_data(page, off, cur_len);
@@ -1050,15 +1051,18 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			 */
 			barrier();
 
-			ret = btt_map_read(arena, premap, &new_map, &t_flag,
-						&e_flag, NVDIMM_IO_ATOMIC);
+			ret = btt_map_read(arena, premap, &new_map, &new_t,
+						&new_e, NVDIMM_IO_ATOMIC);
 			if (ret)
 				goto out_rtt;
 
-			if (postmap == new_map)
+			if ((postmap == new_map) && (t_flag == new_t) &&
+					(e_flag == new_e))
 				break;
 
 			postmap = new_map;
+			t_flag = new_t;
+			e_flag = new_e;
 		}
 
 		ret = btt_data_read(arena, page, off, postmap, cur_len);
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 3/7] btt: ensure that flags were also unchanged during a map_read
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 8842baa..374ae62 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1032,6 +1032,7 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		 */
 		while (1) {
 			u32 new_map;
+			int new_t, new_e;
 
 			if (t_flag) {
 				zero_fill_data(page, off, cur_len);
@@ -1050,15 +1051,18 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			 */
 			barrier();
 
-			ret = btt_map_read(arena, premap, &new_map, &t_flag,
-						&e_flag, NVDIMM_IO_ATOMIC);
+			ret = btt_map_read(arena, premap, &new_map, &new_t,
+						&new_e, NVDIMM_IO_ATOMIC);
 			if (ret)
 				goto out_rtt;
 
-			if (postmap == new_map)
+			if ((postmap == new_map) && (t_flag == new_t) &&
+					(e_flag == new_e))
 				break;
 
 			postmap = new_map;
+			t_flag = new_t;
+			e_flag = new_e;
 		}
 
 		ret = btt_data_read(arena, page, off, postmap, cur_len);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 4/7] btt: cache sector_size in arena_info
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

In preparation for the error clearing rework, add sector_size in the
arena_info struct.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 1 +
 drivers/nvdimm/btt.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 374ae62..e9dd651 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -566,6 +566,7 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size,
 	if (!arena)
 		return NULL;
 	arena->nd_btt = btt->nd_btt;
+	arena->sector_size = btt->sector_size;
 
 	if (!size)
 		return arena;
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 09fabf5..2bc0d10b 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -108,6 +108,7 @@ struct aligned_lock {
  *			handle incoming writes.
  * @version_major:	Metadata layout version major.
  * @version_minor:	Metadata layout version minor.
+ * @sector_size:	The Linux sector size - 512 or 4096
  * @nextoff:		Offset in bytes to the start of the next arena.
  * @infooff:		Offset in bytes to the info block of this arena.
  * @dataoff:		Offset in bytes to the data area of this arena.
@@ -135,6 +136,7 @@ struct arena_info {
 	u32 nfree;
 	u16 version_major;
 	u16 version_minor;
+	u32 sector_size;
 	/* Byte offsets to the different on-media structures */
 	u64 nextoff;
 	u64 infooff;
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 4/7] btt: cache sector_size in arena_info
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

In preparation for the error clearing rework, add sector_size in the
arena_info struct.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 1 +
 drivers/nvdimm/btt.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 374ae62..e9dd651 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -566,6 +566,7 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size,
 	if (!arena)
 		return NULL;
 	arena->nd_btt = btt->nd_btt;
+	arena->sector_size = btt->sector_size;
 
 	if (!size)
 		return arena;
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 09fabf5..2bc0d10b 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -108,6 +108,7 @@ struct aligned_lock {
  *			handle incoming writes.
  * @version_major:	Metadata layout version major.
  * @version_minor:	Metadata layout version minor.
+ * @sector_size:	The Linux sector size - 512 or 4096
  * @nextoff:		Offset in bytes to the start of the next arena.
  * @infooff:		Offset in bytes to the info block of this arena.
  * @dataoff:		Offset in bytes to the data area of this arena.
@@ -135,6 +136,7 @@ struct arena_info {
 	u32 nfree;
 	u16 version_major;
 	u16 version_minor;
+	u32 sector_size;
 	/* Byte offsets to the different on-media structures */
 	u64 nextoff;
 	u64 infooff;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 5/7] libnvdimm: fix potential deadlock while clearing errors
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, Robert Moore, linux-acpi

With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/bus.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 937fafa..a18c291 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -11,6 +11,7 @@
  * General Public License for more details.
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/sched/mm.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
@@ -234,6 +235,7 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	struct nd_cmd_clear_error clear_err;
 	struct nd_cmd_ars_cap ars_cap;
 	u32 clear_err_unit, mask;
+	unsigned int noio_flag;
 	int cmd_rc, rc;
 
 	if (!nvdimm_bus)
@@ -250,8 +252,10 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	memset(&ars_cap, 0, sizeof(ars_cap));
 	ars_cap.address = phys;
 	ars_cap.length = len;
+	noio_flag = memalloc_noio_save();
 	rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_ARS_CAP, &ars_cap,
 			sizeof(ars_cap), &cmd_rc);
+	memalloc_noio_restore(noio_flag);
 	if (rc < 0)
 		return rc;
 	if (cmd_rc < 0)
@@ -266,8 +270,10 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	memset(&clear_err, 0, sizeof(clear_err));
 	clear_err.address = phys;
 	clear_err.length = len;
+	noio_flag = memalloc_noio_save();
 	rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_CLEAR_ERROR, &clear_err,
 			sizeof(clear_err), &cmd_rc);
+	memalloc_noio_restore(noio_flag);
 	if (rc < 0)
 		return rc;
 	if (cmd_rc < 0)
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 5/7] libnvdimm: fix potential deadlock while clearing errors
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma, Robert Moore

With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/bus.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 937fafa..a18c291 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -11,6 +11,7 @@
  * General Public License for more details.
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/sched/mm.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
@@ -234,6 +235,7 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	struct nd_cmd_clear_error clear_err;
 	struct nd_cmd_ars_cap ars_cap;
 	u32 clear_err_unit, mask;
+	unsigned int noio_flag;
 	int cmd_rc, rc;
 
 	if (!nvdimm_bus)
@@ -250,8 +252,10 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	memset(&ars_cap, 0, sizeof(ars_cap));
 	ars_cap.address = phys;
 	ars_cap.length = len;
+	noio_flag = memalloc_noio_save();
 	rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_ARS_CAP, &ars_cap,
 			sizeof(ars_cap), &cmd_rc);
+	memalloc_noio_restore(noio_flag);
 	if (rc < 0)
 		return rc;
 	if (cmd_rc < 0)
@@ -266,8 +270,10 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
 	memset(&clear_err, 0, sizeof(clear_err));
 	clear_err.address = phys;
 	clear_err.length = len;
+	noio_flag = memalloc_noio_save();
 	rc = nd_desc->ndctl(nd_desc, NULL, ND_CMD_CLEAR_ERROR, &clear_err,
 			sizeof(clear_err), &cmd_rc);
+	memalloc_noio_restore(noio_flag);
 	if (rc < 0)
 		return rc;
 	if (cmd_rc < 0)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

In preparation for BTT error clearing, refactor the initial offset
calculations. Until now, all callers of arena_{read,write}_bytes assumed
a relative offset to the arena, and it was later adjusted for the
initial offset. Until now, every time we calculated a relative offset,
we passed it to these functions to do reads or writes, and so where the
offset calculations happened didn't matter.

For error clearing, there will be places where we calculate offsets to
check for the presence of media errors, and the above scheme becomes
error prone.

Make the initial_offset calculations explicit for all callers of
arena_{read,write}_bytes, and provide a helper to do them. The error
checking/clearing code can then use these same helpers.

Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index e9dd651..9acf06b 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -31,14 +31,17 @@ enum log_ent_request {
 	LOG_OLD_ENT
 };
 
+static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)
+{
+	return relative_offset + nd_btt->initial_offset;
+}
+
 static int arena_read_bytes(struct arena_info *arena, resource_size_t offset,
 		void *buf, size_t n, unsigned long flags)
 {
 	struct nd_btt *nd_btt = arena->nd_btt;
 	struct nd_namespace_common *ndns = nd_btt->ndns;
 
-	/* arena offsets may be shifted from the base of the device */
-	offset += arena->nd_btt->initial_offset;
 	return nvdimm_read_bytes(ndns, offset, buf, n, flags);
 }
 
@@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info *arena, resource_size_t offset,
 	struct nd_btt *nd_btt = arena->nd_btt;
 	struct nd_namespace_common *ndns = nd_btt->ndns;
 
-	/* arena offsets may be shifted from the base of the device */
-	offset += arena->nd_btt->initial_offset;
 	return nvdimm_write_bytes(ndns, offset, buf, n, flags);
 }
 
 static int btt_info_write(struct arena_info *arena, struct btt_sb *super)
 {
 	int ret;
+	u64 nsoff;
 
 	/*
 	 * infooff and info2off should always be at least 512B aligned.
@@ -65,19 +67,23 @@ static int btt_info_write(struct arena_info *arena, struct btt_sb *super)
 	WARN_ON_ONCE(!IS_ALIGNED(arena->infooff, 512));
 	WARN_ON_ONCE(!IS_ALIGNED(arena->info2off, 512));
 
-	ret = arena_write_bytes(arena, arena->info2off, super,
+	nsoff = calc_nsoff(arena->nd_btt, arena->info2off);
+	ret = arena_write_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 	if (ret)
 		return ret;
 
-	return arena_write_bytes(arena, arena->infooff, super,
+	nsoff = calc_nsoff(arena->nd_btt, arena->infooff);
+	return arena_write_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 }
 
 static int btt_info_read(struct arena_info *arena, struct btt_sb *super)
 {
+	u64 nsoff = calc_nsoff(arena->nd_btt, arena->infooff);
+
 	WARN_ON(!super);
-	return arena_read_bytes(arena, arena->infooff, super,
+	return arena_read_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 }
 
@@ -90,10 +96,11 @@ static int btt_info_read(struct arena_info *arena, struct btt_sb *super)
 static int __btt_map_write(struct arena_info *arena, u32 lba, __le32 mapping,
 		unsigned long flags)
 {
-	u64 ns_off = arena->mapoff + (lba * MAP_ENT_SIZE);
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->mapoff + (lba * MAP_ENT_SIZE));
 
 	WARN_ON(lba >= arena->external_nlba);
-	return arena_write_bytes(arena, ns_off, &mapping, MAP_ENT_SIZE, flags);
+	return arena_write_bytes(arena, nsoff, &mapping, MAP_ENT_SIZE, flags);
 }
 
 static int btt_map_write(struct arena_info *arena, u32 lba, u32 mapping,
@@ -145,11 +152,12 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 	int ret;
 	__le32 in;
 	u32 raw_mapping, postmap, ze, z_flag, e_flag;
-	u64 ns_off = arena->mapoff + (lba * MAP_ENT_SIZE);
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->mapoff + (lba * MAP_ENT_SIZE));
 
 	WARN_ON(lba >= arena->external_nlba);
 
-	ret = arena_read_bytes(arena, ns_off, &in, MAP_ENT_SIZE, rwb_flags);
+	ret = arena_read_bytes(arena, nsoff, &in, MAP_ENT_SIZE, rwb_flags);
 	if (ret)
 		return ret;
 
@@ -195,10 +203,11 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 static int btt_log_read_pair(struct arena_info *arena, u32 lane,
 			struct log_entry *ent)
 {
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->logoff + (2 * lane * LOG_ENT_SIZE));
+
 	WARN_ON(!ent);
-	return arena_read_bytes(arena,
-			arena->logoff + (2 * lane * LOG_ENT_SIZE), ent,
-			2 * LOG_ENT_SIZE, 0);
+	return arena_read_bytes(arena, nsoff, ent, 2 * LOG_ENT_SIZE, 0);
 }
 
 static struct dentry *debugfs_root;
@@ -355,17 +364,20 @@ static int __btt_log_write(struct arena_info *arena, u32 lane,
 	 * media wear and write amplification
 	 */
 	unsigned int log_half = (LOG_ENT_SIZE - 2 * sizeof(u64)) / 2;
-	u64 ns_off = arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE);
 	void *src = ent;
+	u64 nsoff;
+
+	nsoff = calc_nsoff(arena->nd_btt,
+		arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE));
 
 	/* split the 16B write into atomic, durable halves */
-	ret = arena_write_bytes(arena, ns_off, src, log_half, flags);
+	ret = arena_write_bytes(arena, nsoff, src, log_half, flags);
 	if (ret)
 		return ret;
 
-	ns_off += log_half;
+	nsoff += log_half;
 	src += log_half;
-	return arena_write_bytes(arena, ns_off, src, log_half, flags);
+	return arena_write_bytes(arena, nsoff, src, log_half, flags);
 }
 
 static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
@@ -413,8 +425,9 @@ static int btt_map_init(struct arena_info *arena)
 		size_t size = min(mapsize, chunk_size);
 
 		WARN_ON_ONCE(size < 512);
-		ret = arena_write_bytes(arena, arena->mapoff + offset, zerobuf,
-				size, 0);
+		ret = arena_write_bytes(arena,
+			calc_nsoff(arena->nd_btt, arena->mapoff + offset),
+			zerobuf, size, 0);
 		if (ret)
 			goto free;
 
@@ -455,8 +468,9 @@ static int btt_log_init(struct arena_info *arena)
 		size_t size = min(logsize, chunk_size);
 
 		WARN_ON_ONCE(size < 512);
-		ret = arena_write_bytes(arena, arena->logoff + offset, zerobuf,
-				size, 0);
+		ret = arena_write_bytes(arena,
+			calc_nsoff(arena->nd_btt, arena->logoff + offset),
+			zerobuf, size, 0);
 		if (ret)
 			goto free;
 
@@ -905,16 +919,17 @@ static void unlock_map(struct arena_info *arena, u32 premap)
 	spin_unlock(&arena->map_locks[idx].lock);
 }
 
-static u64 to_namespace_offset(struct arena_info *arena, u64 lba)
+static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
 {
-	return arena->dataoff + ((u64)lba * arena->internal_lbasize);
+	return calc_nsoff(arena->nd_btt,
+		arena->dataoff + ((u64)lba * arena->internal_lbasize));
 }
 
 static int btt_data_read(struct arena_info *arena, struct page *page,
 			unsigned int off, u32 lba, u32 len)
 {
 	int ret;
-	u64 nsoff = to_namespace_offset(arena, lba);
+	u64 nsoff = lba_to_nsoff(arena, lba);
 	void *mem = kmap_atomic(page);
 
 	ret = arena_read_bytes(arena, nsoff, mem + off, len, NVDIMM_IO_ATOMIC);
@@ -927,7 +942,7 @@ static int btt_data_write(struct arena_info *arena, u32 lba,
 			struct page *page, unsigned int off, u32 len)
 {
 	int ret;
-	u64 nsoff = to_namespace_offset(arena, lba);
+	u64 nsoff = lba_to_nsoff(arena, lba);
 	void *mem = kmap_atomic(page);
 
 	ret = arena_write_bytes(arena, nsoff, mem + off, len, NVDIMM_IO_ATOMIC);
@@ -955,7 +970,7 @@ static int btt_rw_integrity(struct btt *btt, struct bio_integrity_payload *bip,
 	if (bip == NULL)
 		return 0;
 
-	meta_nsoff = to_namespace_offset(arena, postmap) + btt->sector_size;
+	meta_nsoff = lba_to_nsoff(arena, postmap) + btt->sector_size;
 
 	while (len) {
 		unsigned int cur_len;
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

In preparation for BTT error clearing, refactor the initial offset
calculations. Until now, all callers of arena_{read,write}_bytes assumed
a relative offset to the arena, and it was later adjusted for the
initial offset. Until now, every time we calculated a relative offset,
we passed it to these functions to do reads or writes, and so where the
offset calculations happened didn't matter.

For error clearing, there will be places where we calculate offsets to
check for the presence of media errors, and the above scheme becomes
error prone.

Make the initial_offset calculations explicit for all callers of
arena_{read,write}_bytes, and provide a helper to do them. The error
checking/clearing code can then use these same helpers.

Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index e9dd651..9acf06b 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -31,14 +31,17 @@ enum log_ent_request {
 	LOG_OLD_ENT
 };
 
+static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)
+{
+	return relative_offset + nd_btt->initial_offset;
+}
+
 static int arena_read_bytes(struct arena_info *arena, resource_size_t offset,
 		void *buf, size_t n, unsigned long flags)
 {
 	struct nd_btt *nd_btt = arena->nd_btt;
 	struct nd_namespace_common *ndns = nd_btt->ndns;
 
-	/* arena offsets may be shifted from the base of the device */
-	offset += arena->nd_btt->initial_offset;
 	return nvdimm_read_bytes(ndns, offset, buf, n, flags);
 }
 
@@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info *arena, resource_size_t offset,
 	struct nd_btt *nd_btt = arena->nd_btt;
 	struct nd_namespace_common *ndns = nd_btt->ndns;
 
-	/* arena offsets may be shifted from the base of the device */
-	offset += arena->nd_btt->initial_offset;
 	return nvdimm_write_bytes(ndns, offset, buf, n, flags);
 }
 
 static int btt_info_write(struct arena_info *arena, struct btt_sb *super)
 {
 	int ret;
+	u64 nsoff;
 
 	/*
 	 * infooff and info2off should always be at least 512B aligned.
@@ -65,19 +67,23 @@ static int btt_info_write(struct arena_info *arena, struct btt_sb *super)
 	WARN_ON_ONCE(!IS_ALIGNED(arena->infooff, 512));
 	WARN_ON_ONCE(!IS_ALIGNED(arena->info2off, 512));
 
-	ret = arena_write_bytes(arena, arena->info2off, super,
+	nsoff = calc_nsoff(arena->nd_btt, arena->info2off);
+	ret = arena_write_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 	if (ret)
 		return ret;
 
-	return arena_write_bytes(arena, arena->infooff, super,
+	nsoff = calc_nsoff(arena->nd_btt, arena->infooff);
+	return arena_write_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 }
 
 static int btt_info_read(struct arena_info *arena, struct btt_sb *super)
 {
+	u64 nsoff = calc_nsoff(arena->nd_btt, arena->infooff);
+
 	WARN_ON(!super);
-	return arena_read_bytes(arena, arena->infooff, super,
+	return arena_read_bytes(arena, nsoff, super,
 			sizeof(struct btt_sb), 0);
 }
 
@@ -90,10 +96,11 @@ static int btt_info_read(struct arena_info *arena, struct btt_sb *super)
 static int __btt_map_write(struct arena_info *arena, u32 lba, __le32 mapping,
 		unsigned long flags)
 {
-	u64 ns_off = arena->mapoff + (lba * MAP_ENT_SIZE);
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->mapoff + (lba * MAP_ENT_SIZE));
 
 	WARN_ON(lba >= arena->external_nlba);
-	return arena_write_bytes(arena, ns_off, &mapping, MAP_ENT_SIZE, flags);
+	return arena_write_bytes(arena, nsoff, &mapping, MAP_ENT_SIZE, flags);
 }
 
 static int btt_map_write(struct arena_info *arena, u32 lba, u32 mapping,
@@ -145,11 +152,12 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 	int ret;
 	__le32 in;
 	u32 raw_mapping, postmap, ze, z_flag, e_flag;
-	u64 ns_off = arena->mapoff + (lba * MAP_ENT_SIZE);
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->mapoff + (lba * MAP_ENT_SIZE));
 
 	WARN_ON(lba >= arena->external_nlba);
 
-	ret = arena_read_bytes(arena, ns_off, &in, MAP_ENT_SIZE, rwb_flags);
+	ret = arena_read_bytes(arena, nsoff, &in, MAP_ENT_SIZE, rwb_flags);
 	if (ret)
 		return ret;
 
@@ -195,10 +203,11 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping,
 static int btt_log_read_pair(struct arena_info *arena, u32 lane,
 			struct log_entry *ent)
 {
+	u64 nsoff = calc_nsoff(arena->nd_btt,
+		arena->logoff + (2 * lane * LOG_ENT_SIZE));
+
 	WARN_ON(!ent);
-	return arena_read_bytes(arena,
-			arena->logoff + (2 * lane * LOG_ENT_SIZE), ent,
-			2 * LOG_ENT_SIZE, 0);
+	return arena_read_bytes(arena, nsoff, ent, 2 * LOG_ENT_SIZE, 0);
 }
 
 static struct dentry *debugfs_root;
@@ -355,17 +364,20 @@ static int __btt_log_write(struct arena_info *arena, u32 lane,
 	 * media wear and write amplification
 	 */
 	unsigned int log_half = (LOG_ENT_SIZE - 2 * sizeof(u64)) / 2;
-	u64 ns_off = arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE);
 	void *src = ent;
+	u64 nsoff;
+
+	nsoff = calc_nsoff(arena->nd_btt,
+		arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE));
 
 	/* split the 16B write into atomic, durable halves */
-	ret = arena_write_bytes(arena, ns_off, src, log_half, flags);
+	ret = arena_write_bytes(arena, nsoff, src, log_half, flags);
 	if (ret)
 		return ret;
 
-	ns_off += log_half;
+	nsoff += log_half;
 	src += log_half;
-	return arena_write_bytes(arena, ns_off, src, log_half, flags);
+	return arena_write_bytes(arena, nsoff, src, log_half, flags);
 }
 
 static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
@@ -413,8 +425,9 @@ static int btt_map_init(struct arena_info *arena)
 		size_t size = min(mapsize, chunk_size);
 
 		WARN_ON_ONCE(size < 512);
-		ret = arena_write_bytes(arena, arena->mapoff + offset, zerobuf,
-				size, 0);
+		ret = arena_write_bytes(arena,
+			calc_nsoff(arena->nd_btt, arena->mapoff + offset),
+			zerobuf, size, 0);
 		if (ret)
 			goto free;
 
@@ -455,8 +468,9 @@ static int btt_log_init(struct arena_info *arena)
 		size_t size = min(logsize, chunk_size);
 
 		WARN_ON_ONCE(size < 512);
-		ret = arena_write_bytes(arena, arena->logoff + offset, zerobuf,
-				size, 0);
+		ret = arena_write_bytes(arena,
+			calc_nsoff(arena->nd_btt, arena->logoff + offset),
+			zerobuf, size, 0);
 		if (ret)
 			goto free;
 
@@ -905,16 +919,17 @@ static void unlock_map(struct arena_info *arena, u32 premap)
 	spin_unlock(&arena->map_locks[idx].lock);
 }
 
-static u64 to_namespace_offset(struct arena_info *arena, u64 lba)
+static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
 {
-	return arena->dataoff + ((u64)lba * arena->internal_lbasize);
+	return calc_nsoff(arena->nd_btt,
+		arena->dataoff + ((u64)lba * arena->internal_lbasize));
 }
 
 static int btt_data_read(struct arena_info *arena, struct page *page,
 			unsigned int off, u32 lba, u32 len)
 {
 	int ret;
-	u64 nsoff = to_namespace_offset(arena, lba);
+	u64 nsoff = lba_to_nsoff(arena, lba);
 	void *mem = kmap_atomic(page);
 
 	ret = arena_read_bytes(arena, nsoff, mem + off, len, NVDIMM_IO_ATOMIC);
@@ -927,7 +942,7 @@ static int btt_data_write(struct arena_info *arena, u32 lba,
 			struct page *page, unsigned int off, u32 len)
 {
 	int ret;
-	u64 nsoff = to_namespace_offset(arena, lba);
+	u64 nsoff = lba_to_nsoff(arena, lba);
 	void *mem = kmap_atomic(page);
 
 	ret = arena_write_bytes(arena, nsoff, mem + off, len, NVDIMM_IO_ATOMIC);
@@ -955,7 +970,7 @@ static int btt_rw_integrity(struct btt *btt, struct bio_integrity_payload *bip,
 	if (bip == NULL)
 		return 0;
 
-	meta_nsoff = to_namespace_offset(arena, postmap) + btt->sector_size;
+	meta_nsoff = lba_to_nsoff(arena, postmap) + btt->sector_size;
 
 	while (len) {
 		unsigned int cur_len;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 7/7] libnvdimm, btt: rework error clearing
  2017-08-09  0:07 ` Vishal Verma
@ 2017-08-09  0:07   ` Vishal Verma
  -1 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Rafael J. Wysocki, linux-acpi

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only be a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c   | 96 ++++++++++++++++++++++++++++++++++++++++++++------
 drivers/nvdimm/btt.h   |  5 +++
 drivers/nvdimm/claim.c |  8 -----
 3 files changed, 91 insertions(+), 18 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 9acf06b..b211daa 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -393,7 +393,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
 	arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
 	if (++(arena->freelist[lane].seq) == 4)
 		arena->freelist[lane].seq = 1;
-	arena->freelist[lane].block = le32_to_cpu(ent->old_map);
+	if (ent_e_flag(ent->old_map))
+		arena->freelist[lane].has_err = 1;
+	arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
 
 	return ret;
 }
@@ -494,6 +496,41 @@ static int btt_log_init(struct arena_info *arena)
 	return ret;
 }
 
+static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
+{
+	return calc_nsoff(arena->nd_btt,
+		arena->dataoff + ((u64)lba * arena->internal_lbasize));
+}
+
+static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
+{
+	int ret = 0;
+
+	if (arena->freelist[lane].has_err) {
+		void *zero_page = page_address(ZERO_PAGE(0));
+		u32 lba = arena->freelist[lane].block;
+		u64 nsoff = lba_to_nsoff(arena, lba);
+		unsigned long len = arena->sector_size;
+
+		mutex_lock(&arena->err_lock);
+
+		while (len) {
+			unsigned long chunk = min(len, PAGE_SIZE);
+
+			ret = arena_write_bytes(arena, nsoff, zero_page,
+				chunk, 0);
+			if (ret)
+				break;
+			len -= chunk;
+			nsoff += chunk;
+			if (len == 0)
+				arena->freelist[lane].has_err = 0;
+		}
+		mutex_unlock(&arena->err_lock);
+	}
+	return ret;
+}
+
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int old, new, ret;
@@ -519,6 +556,9 @@ static int btt_freelist_init(struct arena_info *arena)
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
 		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
 
+		if (ent_e_flag(log_new.old_map))
+			arena_clear_freelist_error(arena, i);
+
 		/* This implies a newly created or untouched flog entry */
 		if (log_new.old_map == log_new.new_map)
 			continue;
@@ -539,7 +579,6 @@ static int btt_freelist_init(struct arena_info *arena)
 			if (ret)
 				return ret;
 		}
-
 	}
 
 	return 0;
@@ -709,6 +748,7 @@ static int discover_arenas(struct btt *btt)
 		arena->external_lba_start = cur_nlba;
 		parse_arena_meta(arena, super, cur_off);
 
+		mutex_init(&arena->err_lock);
 		ret = btt_freelist_init(arena);
 		if (ret)
 			goto out;
@@ -919,12 +959,6 @@ static void unlock_map(struct arena_info *arena, u32 premap)
 	spin_unlock(&arena->map_locks[idx].lock);
 }
 
-static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
-{
-	return calc_nsoff(arena->nd_btt,
-		arena->dataoff + ((u64)lba * arena->internal_lbasize));
-}
-
 static int btt_data_read(struct arena_info *arena, struct page *page,
 			unsigned int off, u32 lba, u32 len)
 {
@@ -1082,8 +1116,14 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		ret = btt_data_read(arena, page, off, postmap, cur_len);
-		if (ret)
+		if (ret) {
+			int rc;
+
+			/* Media error - set the e_flag */
+			rc = btt_map_write(arena, premap, postmap, 0, 1,
+				NVDIMM_IO_ATOMIC);
 			goto out_rtt;
+		}
 
 		if (bip) {
 			ret = btt_rw_integrity(btt, bip, arena, postmap, READ);
@@ -1108,6 +1148,15 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 	return ret;
 }
 
+static bool btt_is_badblock(struct btt *btt, struct arena_info *arena,
+		u32 postmap)
+{
+	u64 nsoff = lba_to_nsoff(arena, postmap);
+	sector_t phys_sector = nsoff >> 9;
+
+	return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize);
+}
+
 static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			sector_t sector, struct page *page, unsigned int off,
 			unsigned int len)
@@ -1120,7 +1169,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 
 	while (len) {
 		u32 cur_len;
+		int e_flag;
 
+ retry:
 		lane = nd_region_acquire_lane(btt->nd_region);
 
 		ret = lba_to_arena(btt, sector, &premap, &arena);
@@ -1133,6 +1184,23 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			goto out_lane;
 		}
 
+		/* The block had a media error, and is being cleared */
+		if (mutex_is_locked(&arena->err_lock)
+				|| arena->freelist[lane].has_err) {
+			nd_region_release_lane(btt->nd_region, lane);
+			goto retry;
+		}
+
+		/* The block had a media error, and needs to be cleared */
+		if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) {
+			arena->freelist[lane].has_err = 1;
+			nd_region_release_lane(btt->nd_region, lane);
+
+			arena_clear_freelist_error(arena, lane);
+			/* OK to acquire a different lane/free block */
+			goto retry;
+		}
+
 		new_postmap = arena->freelist[lane].block;
 
 		/* Wait if the new block is being read from */
@@ -1158,7 +1226,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		lock_map(arena, premap);
-		ret = btt_map_read(arena, premap, &old_postmap, NULL, NULL,
+		ret = btt_map_read(arena, premap, &old_postmap, NULL, &e_flag,
 				NVDIMM_IO_ATOMIC);
 		if (ret)
 			goto out_map;
@@ -1166,6 +1234,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			ret = -EIO;
 			goto out_map;
 		}
+		if (e_flag)
+			set_e_flag(old_postmap);
 
 		log.lba = cpu_to_le32(premap);
 		log.old_map = cpu_to_le32(old_postmap);
@@ -1184,6 +1254,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		unlock_map(arena, premap);
 		nd_region_release_lane(btt->nd_region, lane);
 
+		if (e_flag)
+			arena_clear_freelist_error(arena, lane);
+
 		len -= cur_len;
 		off += cur_len;
 		sector += btt->sector_size >> SECTOR_SHIFT;
@@ -1364,6 +1437,7 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 {
 	int ret;
 	struct btt *btt;
+	struct nd_namespace_io *nsio;
 	struct device *dev = &nd_btt->dev;
 
 	btt = devm_kzalloc(dev, sizeof(struct btt), GFP_KERNEL);
@@ -1377,6 +1451,8 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 	INIT_LIST_HEAD(&btt->arena_list);
 	mutex_init(&btt->init_lock);
 	btt->nd_region = nd_region;
+	nsio = to_nd_namespace_io(&nd_btt->ndns->dev);
+	btt->phys_bb = &nsio->bb;
 
 	ret = discover_arenas(btt);
 	if (ret) {
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 2bc0d10b..578c205 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -15,6 +15,7 @@
 #ifndef _LINUX_BTT_H
 #define _LINUX_BTT_H
 
+#include <linux/badblocks.h>
 #include <linux/types.h>
 
 #define BTT_SIG_LEN 16
@@ -41,6 +42,7 @@
 #define ent_lba(ent) (ent & MAP_LBA_MASK)
 #define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK))
 #define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK))
+#define set_e_flag(ent) (ent |= MAP_ERR_MASK)
 
 enum btt_init_state {
 	INIT_UNCHECKED = 0,
@@ -82,6 +84,7 @@ struct free_entry {
 	u32 block;
 	u8 sub;
 	u8 seq;
+	u8 has_err;
 };
 
 struct aligned_lock {
@@ -153,6 +156,7 @@ struct arena_info {
 	struct dentry *debugfs_dir;
 	/* Arena flags */
 	u32 flags;
+	struct mutex err_lock;
 };
 
 /**
@@ -187,6 +191,7 @@ struct btt {
 	struct mutex init_lock;
 	int init_state;
 	int num_arenas;
+	struct badblocks *phys_bb;
 };
 
 bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super);
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 3e6404f..b2fc29b 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -280,14 +280,6 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
-		/*
-		 * FIXME: nsio_rw_bytes() may be called from atomic
-		 * context in the btt case and the ACPI DSM path for
-		 * clearing the error takes sleeping locks and allocates
-		 * memory. An explicit error clearing path, and support
-		 * for tracking badblocks in BTT metadata is needed to
-		 * work around this collision.
-		 */
 		if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)
 				&& !(flags & NVDIMM_IO_ATOMIC)) {
 			long cleared;
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v5 7/7] libnvdimm, btt: rework error clearing
@ 2017-08-09  0:07   ` Vishal Verma
  0 siblings, 0 replies; 22+ messages in thread
From: Vishal Verma @ 2017-08-09  0:07 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
	Toshi Kani, Vishal Verma

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only be a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c   | 96 ++++++++++++++++++++++++++++++++++++++++++++------
 drivers/nvdimm/btt.h   |  5 +++
 drivers/nvdimm/claim.c |  8 -----
 3 files changed, 91 insertions(+), 18 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 9acf06b..b211daa 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -393,7 +393,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
 	arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
 	if (++(arena->freelist[lane].seq) == 4)
 		arena->freelist[lane].seq = 1;
-	arena->freelist[lane].block = le32_to_cpu(ent->old_map);
+	if (ent_e_flag(ent->old_map))
+		arena->freelist[lane].has_err = 1;
+	arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
 
 	return ret;
 }
@@ -494,6 +496,41 @@ static int btt_log_init(struct arena_info *arena)
 	return ret;
 }
 
+static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
+{
+	return calc_nsoff(arena->nd_btt,
+		arena->dataoff + ((u64)lba * arena->internal_lbasize));
+}
+
+static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
+{
+	int ret = 0;
+
+	if (arena->freelist[lane].has_err) {
+		void *zero_page = page_address(ZERO_PAGE(0));
+		u32 lba = arena->freelist[lane].block;
+		u64 nsoff = lba_to_nsoff(arena, lba);
+		unsigned long len = arena->sector_size;
+
+		mutex_lock(&arena->err_lock);
+
+		while (len) {
+			unsigned long chunk = min(len, PAGE_SIZE);
+
+			ret = arena_write_bytes(arena, nsoff, zero_page,
+				chunk, 0);
+			if (ret)
+				break;
+			len -= chunk;
+			nsoff += chunk;
+			if (len == 0)
+				arena->freelist[lane].has_err = 0;
+		}
+		mutex_unlock(&arena->err_lock);
+	}
+	return ret;
+}
+
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int old, new, ret;
@@ -519,6 +556,9 @@ static int btt_freelist_init(struct arena_info *arena)
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
 		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
 
+		if (ent_e_flag(log_new.old_map))
+			arena_clear_freelist_error(arena, i);
+
 		/* This implies a newly created or untouched flog entry */
 		if (log_new.old_map == log_new.new_map)
 			continue;
@@ -539,7 +579,6 @@ static int btt_freelist_init(struct arena_info *arena)
 			if (ret)
 				return ret;
 		}
-
 	}
 
 	return 0;
@@ -709,6 +748,7 @@ static int discover_arenas(struct btt *btt)
 		arena->external_lba_start = cur_nlba;
 		parse_arena_meta(arena, super, cur_off);
 
+		mutex_init(&arena->err_lock);
 		ret = btt_freelist_init(arena);
 		if (ret)
 			goto out;
@@ -919,12 +959,6 @@ static void unlock_map(struct arena_info *arena, u32 premap)
 	spin_unlock(&arena->map_locks[idx].lock);
 }
 
-static u64 lba_to_nsoff(struct arena_info *arena, u64 lba)
-{
-	return calc_nsoff(arena->nd_btt,
-		arena->dataoff + ((u64)lba * arena->internal_lbasize));
-}
-
 static int btt_data_read(struct arena_info *arena, struct page *page,
 			unsigned int off, u32 lba, u32 len)
 {
@@ -1082,8 +1116,14 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		ret = btt_data_read(arena, page, off, postmap, cur_len);
-		if (ret)
+		if (ret) {
+			int rc;
+
+			/* Media error - set the e_flag */
+			rc = btt_map_write(arena, premap, postmap, 0, 1,
+				NVDIMM_IO_ATOMIC);
 			goto out_rtt;
+		}
 
 		if (bip) {
 			ret = btt_rw_integrity(btt, bip, arena, postmap, READ);
@@ -1108,6 +1148,15 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 	return ret;
 }
 
+static bool btt_is_badblock(struct btt *btt, struct arena_info *arena,
+		u32 postmap)
+{
+	u64 nsoff = lba_to_nsoff(arena, postmap);
+	sector_t phys_sector = nsoff >> 9;
+
+	return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize);
+}
+
 static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			sector_t sector, struct page *page, unsigned int off,
 			unsigned int len)
@@ -1120,7 +1169,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 
 	while (len) {
 		u32 cur_len;
+		int e_flag;
 
+ retry:
 		lane = nd_region_acquire_lane(btt->nd_region);
 
 		ret = lba_to_arena(btt, sector, &premap, &arena);
@@ -1133,6 +1184,23 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			goto out_lane;
 		}
 
+		/* The block had a media error, and is being cleared */
+		if (mutex_is_locked(&arena->err_lock)
+				|| arena->freelist[lane].has_err) {
+			nd_region_release_lane(btt->nd_region, lane);
+			goto retry;
+		}
+
+		/* The block had a media error, and needs to be cleared */
+		if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) {
+			arena->freelist[lane].has_err = 1;
+			nd_region_release_lane(btt->nd_region, lane);
+
+			arena_clear_freelist_error(arena, lane);
+			/* OK to acquire a different lane/free block */
+			goto retry;
+		}
+
 		new_postmap = arena->freelist[lane].block;
 
 		/* Wait if the new block is being read from */
@@ -1158,7 +1226,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		lock_map(arena, premap);
-		ret = btt_map_read(arena, premap, &old_postmap, NULL, NULL,
+		ret = btt_map_read(arena, premap, &old_postmap, NULL, &e_flag,
 				NVDIMM_IO_ATOMIC);
 		if (ret)
 			goto out_map;
@@ -1166,6 +1234,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			ret = -EIO;
 			goto out_map;
 		}
+		if (e_flag)
+			set_e_flag(old_postmap);
 
 		log.lba = cpu_to_le32(premap);
 		log.old_map = cpu_to_le32(old_postmap);
@@ -1184,6 +1254,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		unlock_map(arena, premap);
 		nd_region_release_lane(btt->nd_region, lane);
 
+		if (e_flag)
+			arena_clear_freelist_error(arena, lane);
+
 		len -= cur_len;
 		off += cur_len;
 		sector += btt->sector_size >> SECTOR_SHIFT;
@@ -1364,6 +1437,7 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 {
 	int ret;
 	struct btt *btt;
+	struct nd_namespace_io *nsio;
 	struct device *dev = &nd_btt->dev;
 
 	btt = devm_kzalloc(dev, sizeof(struct btt), GFP_KERNEL);
@@ -1377,6 +1451,8 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 	INIT_LIST_HEAD(&btt->arena_list);
 	mutex_init(&btt->init_lock);
 	btt->nd_region = nd_region;
+	nsio = to_nd_namespace_io(&nd_btt->ndns->dev);
+	btt->phys_bb = &nsio->bb;
 
 	ret = discover_arenas(btt);
 	if (ret) {
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 2bc0d10b..578c205 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -15,6 +15,7 @@
 #ifndef _LINUX_BTT_H
 #define _LINUX_BTT_H
 
+#include <linux/badblocks.h>
 #include <linux/types.h>
 
 #define BTT_SIG_LEN 16
@@ -41,6 +42,7 @@
 #define ent_lba(ent) (ent & MAP_LBA_MASK)
 #define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK))
 #define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK))
+#define set_e_flag(ent) (ent |= MAP_ERR_MASK)
 
 enum btt_init_state {
 	INIT_UNCHECKED = 0,
@@ -82,6 +84,7 @@ struct free_entry {
 	u32 block;
 	u8 sub;
 	u8 seq;
+	u8 has_err;
 };
 
 struct aligned_lock {
@@ -153,6 +156,7 @@ struct arena_info {
 	struct dentry *debugfs_dir;
 	/* Arena flags */
 	u32 flags;
+	struct mutex err_lock;
 };
 
 /**
@@ -187,6 +191,7 @@ struct btt {
 	struct mutex init_lock;
 	int init_state;
 	int num_arenas;
+	struct badblocks *phys_bb;
 };
 
 bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super);
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 3e6404f..b2fc29b 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -280,14 +280,6 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
-		/*
-		 * FIXME: nsio_rw_bytes() may be called from atomic
-		 * context in the btt case and the ACPI DSM path for
-		 * clearing the error takes sleeping locks and allocates
-		 * memory. An explicit error clearing path, and support
-		 * for tracking badblocks in BTT metadata is needed to
-		 * work around this collision.
-		 */
 		if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)
 				&& !(flags & NVDIMM_IO_ATOMIC)) {
 			long cleared;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
@ 2017-08-12  1:39     ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-08-12  1:39 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Linux ACPI, Rafael J. Wysocki, linux-nvdimm

On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> In preparation for BTT error clearing, refactor the initial offset
> calculations. Until now, all callers of arena_{read,write}_bytes assumed
> a relative offset to the arena, and it was later adjusted for the
> initial offset. Until now, every time we calculated a relative offset,
> we passed it to these functions to do reads or writes, and so where the
> offset calculations happened didn't matter.
>
> For error clearing, there will be places where we calculate offsets to
> check for the presence of media errors, and the above scheme becomes
> error prone.
>
> Make the initial_offset calculations explicit for all callers of
> arena_{read,write}_bytes, and provide a helper to do them. The error
> checking/clearing code can then use these same helpers.
>
> Reported-by: Toshi Kani <toshi.kani@hpe.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 42 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index e9dd651..9acf06b 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -31,14 +31,17 @@ enum log_ent_request {
>         LOG_OLD_ENT
>  };
>
> +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)
> +{
> +       return relative_offset + nd_btt->initial_offset;
> +}
> +
>  static int arena_read_bytes(struct arena_info *arena, resource_size_t offset,
>                 void *buf, size_t n, unsigned long flags)
>  {
>         struct nd_btt *nd_btt = arena->nd_btt;
>         struct nd_namespace_common *ndns = nd_btt->ndns;
>
> -       /* arena offsets may be shifted from the base of the device */
> -       offset += arena->nd_btt->initial_offset;
>         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
>  }
>
> @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info *arena, resource_size_t offset,
>         struct nd_btt *nd_btt = arena->nd_btt;
>         struct nd_namespace_common *ndns = nd_btt->ndns;
>
> -       /* arena offsets may be shifted from the base of the device */
> -       offset += arena->nd_btt->initial_offset;
>         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
>  }

So let's get rid of arena_{read,write}_bytes(). The only point of the
wrapper was to include an arena specific offset, but now we're pushing
all offset calculations to the caller so arena_{read,write}_bytes() is
equivalent to nvdimm_{read,write}_bytes(). However, since many of
these call sites now perform a common offset calculations maybe we can
provide a new helper for that to differentiate it from the error
clearing case? After patch 7 is applied it's no longer clear which
offset calculation is used where and for what reason, there's just too
many offset variables calculations and overload of the word 'offset'
for bugs to hide.

Maybe it's just late on a Friday and my eyes are going cross, but it
seems you have 2 classes of use cases between calc_nsoff and
lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?
Also, I find something unreadable about calc_nsoff, and reading the
prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
doesn't help clarify. What is 'relative_offset' relative to? Is it the
base namespace / device offset?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
@ 2017-08-12  1:39     ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-08-12  1:39 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Linux ACPI, Rafael J. Wysocki, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw

On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> In preparation for BTT error clearing, refactor the initial offset
> calculations. Until now, all callers of arena_{read,write}_bytes assumed
> a relative offset to the arena, and it was later adjusted for the
> initial offset. Until now, every time we calculated a relative offset,
> we passed it to these functions to do reads or writes, and so where the
> offset calculations happened didn't matter.
>
> For error clearing, there will be places where we calculate offsets to
> check for the presence of media errors, and the above scheme becomes
> error prone.
>
> Make the initial_offset calculations explicit for all callers of
> arena_{read,write}_bytes, and provide a helper to do them. The error
> checking/clearing code can then use these same helpers.
>
> Reported-by: Toshi Kani <toshi.kani-ZPxbGqLxI0U@public.gmane.org>
> Signed-off-by: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 42 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index e9dd651..9acf06b 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -31,14 +31,17 @@ enum log_ent_request {
>         LOG_OLD_ENT
>  };
>
> +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)
> +{
> +       return relative_offset + nd_btt->initial_offset;
> +}
> +
>  static int arena_read_bytes(struct arena_info *arena, resource_size_t offset,
>                 void *buf, size_t n, unsigned long flags)
>  {
>         struct nd_btt *nd_btt = arena->nd_btt;
>         struct nd_namespace_common *ndns = nd_btt->ndns;
>
> -       /* arena offsets may be shifted from the base of the device */
> -       offset += arena->nd_btt->initial_offset;
>         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
>  }
>
> @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info *arena, resource_size_t offset,
>         struct nd_btt *nd_btt = arena->nd_btt;
>         struct nd_namespace_common *ndns = nd_btt->ndns;
>
> -       /* arena offsets may be shifted from the base of the device */
> -       offset += arena->nd_btt->initial_offset;
>         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
>  }

So let's get rid of arena_{read,write}_bytes(). The only point of the
wrapper was to include an arena specific offset, but now we're pushing
all offset calculations to the caller so arena_{read,write}_bytes() is
equivalent to nvdimm_{read,write}_bytes(). However, since many of
these call sites now perform a common offset calculations maybe we can
provide a new helper for that to differentiate it from the error
clearing case? After patch 7 is applied it's no longer clear which
offset calculation is used where and for what reason, there's just too
many offset variables calculations and overload of the word 'offset'
for bugs to hide.

Maybe it's just late on a Friday and my eyes are going cross, but it
seems you have 2 classes of use cases between calc_nsoff and
lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?
Also, I find something unreadable about calc_nsoff, and reading the
prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
doesn't help clarify. What is 'relative_offset' relative to? Is it the
base namespace / device offset?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
  2017-08-12  1:39     ` Dan Williams
@ 2017-08-14 19:47       ` Verma, Vishal L
  -1 siblings, 0 replies; 22+ messages in thread
From: Verma, Vishal L @ 2017-08-14 19:47 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-acpi, Wysocki, Rafael J  <rafael.j.wysocki@intel.com>,
	linux-nvdimm@lists.01.org

On Fri, 2017-08-11 at 18:39 -0700, Dan Williams wrote:
> On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma@intel.com
> > wrote:
> > In preparation for BTT error clearing, refactor the initial offset
> > calculations. Until now, all callers of arena_{read,write}_bytes
> > assumed
> > a relative offset to the arena, and it was later adjusted for the
> > initial offset. Until now, every time we calculated a relative
> > offset,
> > we passed it to these functions to do reads or writes, and so where
> > the
> > offset calculations happened didn't matter.
> > 
> > For error clearing, there will be places where we calculate offsets
> > to
> > check for the presence of media errors, and the above scheme becomes
> > error prone.
> > 
> > Make the initial_offset calculations explicit for all callers of
> > arena_{read,write}_bytes, and provide a helper to do them. The error
> > checking/clearing code can then use these same helpers.
> > 
> > Reported-by: Toshi Kani <toshi.kani@hpe.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++---------
> > -----------
> >  1 file changed, 42 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> > index e9dd651..9acf06b 100644
> > --- a/drivers/nvdimm/btt.c
> > +++ b/drivers/nvdimm/btt.c
> > @@ -31,14 +31,17 @@ enum log_ent_request {
> >         LOG_OLD_ENT
> >  };
> > 
> > +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64
> > relative_offset)
> > +{
> > +       return relative_offset + nd_btt->initial_offset;
> > +}
> > +
> >  static int arena_read_bytes(struct arena_info *arena,
> > resource_size_t offset,
> >                 void *buf, size_t n, unsigned long flags)
> >  {
> >         struct nd_btt *nd_btt = arena->nd_btt;
> >         struct nd_namespace_common *ndns = nd_btt->ndns;
> > 
> > -       /* arena offsets may be shifted from the base of the device
> > */
> > -       offset += arena->nd_btt->initial_offset;
> >         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
> >  }
> > 
> > @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info
> > *arena, resource_size_t offset,
> >         struct nd_btt *nd_btt = arena->nd_btt;
> >         struct nd_namespace_common *ndns = nd_btt->ndns;
> > 
> > -       /* arena offsets may be shifted from the base of the device
> > */
> > -       offset += arena->nd_btt->initial_offset;
> >         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
> >  }
> 
> So let's get rid of arena_{read,write}_bytes(). The only point of the
> wrapper was to include an arena specific offset, but now we're pushing
> all offset calculations to the caller so arena_{read,write}_bytes() is
> equivalent to nvdimm_{read,write}_bytes(). 

That sounds fine, though that will require all callers to now
dereference arena->nd->btt->ndns to do reads/writes.. Is that ok?

> However, since many of
> these call sites now perform a common offset calculations maybe we can
> provide a new helper for that to differentiate it from the error
> clearing case? 

Not sure I follow - the offset calculations for the error clearing use
and for data reads/writes are exactly the same (lba_to_nsoff). (See
further below)

> After patch 7 is applied it's no longer clear which
> offset calculation is used where and for what reason, there's just too
> many offset variables calculations and overload of the word 'offset'
> for bugs to hide.

Agreed on the ambiguity of 'offset'. The implicit rule I followed was
any kind of 'nsoff' will mean the final namespace offset, after
accounting for arenas, any initial offset etc. Any other 'offset' is
relative to the context in which it is used.

> 
> Maybe it's just late on a Friday and my eyes are going cross, but it
> seems you have 2 classes of use cases between calc_nsoff and
> lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?

The difference between the two is that lba_to_nsoff is used when were
talking about reading/writing a 'data' block, and where we are talking
in terms of 'lba's, not raw offsets. calc_nsoff (which lba_to_nsoff
calls internally), is a 'last stage' calculation, essentially just to
add the initial_offset. Direct callers of calc_nsoff (i.e. functions
that access the log, map, or info blocks) do their offset calculations
'manually' based on arena->{map,log,info}off, but these offsets are
unaware of 'initial_offset', as they started out being relative to
'this' arena. So calc_nsoff is that last stage addition of
initial_offset.

I agree that if feels a bit clunky, but I'm not sure I see a clear way
to improve it..

> Also, I find something unreadable about calc_nsoff, and reading the
> prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
> doesn't help clarify. What is 'relative_offset' relative to? Is it the
> base namespace / device offset?

Yeah, I cringed a bit too when I called it 'relative_offset'. But I
couldn't find a more descriptive name. It is 'almost' the raw nsoff,
with just the initial_offset calculation pending... Maybe
btt_relative_offset?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
@ 2017-08-14 19:47       ` Verma, Vishal L
  0 siblings, 0 replies; 22+ messages in thread
From: Verma, Vishal L @ 2017-08-14 19:47 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-nvdimm, toshi.kani, jmoyer, linux-acpi, Wysocki, Rafael J

On Fri, 2017-08-11 at 18:39 -0700, Dan Williams wrote:
> On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma@intel.com
> > wrote:
> > In preparation for BTT error clearing, refactor the initial offset
> > calculations. Until now, all callers of arena_{read,write}_bytes
> > assumed
> > a relative offset to the arena, and it was later adjusted for the
> > initial offset. Until now, every time we calculated a relative
> > offset,
> > we passed it to these functions to do reads or writes, and so where
> > the
> > offset calculations happened didn't matter.
> > 
> > For error clearing, there will be places where we calculate offsets
> > to
> > check for the presence of media errors, and the above scheme becomes
> > error prone.
> > 
> > Make the initial_offset calculations explicit for all callers of
> > arena_{read,write}_bytes, and provide a helper to do them. The error
> > checking/clearing code can then use these same helpers.
> > 
> > Reported-by: Toshi Kani <toshi.kani@hpe.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++---------
> > -----------
> >  1 file changed, 42 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> > index e9dd651..9acf06b 100644
> > --- a/drivers/nvdimm/btt.c
> > +++ b/drivers/nvdimm/btt.c
> > @@ -31,14 +31,17 @@ enum log_ent_request {
> >         LOG_OLD_ENT
> >  };
> > 
> > +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64
> > relative_offset)
> > +{
> > +       return relative_offset + nd_btt->initial_offset;
> > +}
> > +
> >  static int arena_read_bytes(struct arena_info *arena,
> > resource_size_t offset,
> >                 void *buf, size_t n, unsigned long flags)
> >  {
> >         struct nd_btt *nd_btt = arena->nd_btt;
> >         struct nd_namespace_common *ndns = nd_btt->ndns;
> > 
> > -       /* arena offsets may be shifted from the base of the device
> > */
> > -       offset += arena->nd_btt->initial_offset;
> >         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
> >  }
> > 
> > @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info
> > *arena, resource_size_t offset,
> >         struct nd_btt *nd_btt = arena->nd_btt;
> >         struct nd_namespace_common *ndns = nd_btt->ndns;
> > 
> > -       /* arena offsets may be shifted from the base of the device
> > */
> > -       offset += arena->nd_btt->initial_offset;
> >         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
> >  }
> 
> So let's get rid of arena_{read,write}_bytes(). The only point of the
> wrapper was to include an arena specific offset, but now we're pushing
> all offset calculations to the caller so arena_{read,write}_bytes() is
> equivalent to nvdimm_{read,write}_bytes(). 

That sounds fine, though that will require all callers to now
dereference arena->nd->btt->ndns to do reads/writes.. Is that ok?

> However, since many of
> these call sites now perform a common offset calculations maybe we can
> provide a new helper for that to differentiate it from the error
> clearing case? 

Not sure I follow - the offset calculations for the error clearing use
and for data reads/writes are exactly the same (lba_to_nsoff). (See
further below)

> After patch 7 is applied it's no longer clear which
> offset calculation is used where and for what reason, there's just too
> many offset variables calculations and overload of the word 'offset'
> for bugs to hide.

Agreed on the ambiguity of 'offset'. The implicit rule I followed was
any kind of 'nsoff' will mean the final namespace offset, after
accounting for arenas, any initial offset etc. Any other 'offset' is
relative to the context in which it is used.

> 
> Maybe it's just late on a Friday and my eyes are going cross, but it
> seems you have 2 classes of use cases between calc_nsoff and
> lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?

The difference between the two is that lba_to_nsoff is used when were
talking about reading/writing a 'data' block, and where we are talking
in terms of 'lba's, not raw offsets. calc_nsoff (which lba_to_nsoff
calls internally), is a 'last stage' calculation, essentially just to
add the initial_offset. Direct callers of calc_nsoff (i.e. functions
that access the log, map, or info blocks) do their offset calculations
'manually' based on arena->{map,log,info}off, but these offsets are
unaware of 'initial_offset', as they started out being relative to
'this' arena. So calc_nsoff is that last stage addition of
initial_offset.

I agree that if feels a bit clunky, but I'm not sure I see a clear way
to improve it..

> Also, I find something unreadable about calc_nsoff, and reading the
> prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
> doesn't help clarify. What is 'relative_offset' relative to? Is it the
> base namespace / device offset?

Yeah, I cringed a bit too when I called it 'relative_offset'. But I
couldn't find a more descriptive name. It is 'almost' the raw nsoff,
with just the initial_offset calculation pending... Maybe
btt_relative_offset?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
  2017-08-14 19:47       ` Verma, Vishal L
@ 2017-08-17  0:30         ` Dan Williams
  -1 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-08-17  0:30 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-acpi, Wysocki, Rafael J  <rafael.j.wysocki@intel.com>,
	linux-nvdimm@lists.01.org

On Mon, Aug 14, 2017 at 12:47 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2017-08-11 at 18:39 -0700, Dan Williams wrote:
>> On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma@intel.com
>> > wrote:
>> > In preparation for BTT error clearing, refactor the initial offset
>> > calculations. Until now, all callers of arena_{read,write}_bytes
>> > assumed
>> > a relative offset to the arena, and it was later adjusted for the
>> > initial offset. Until now, every time we calculated a relative
>> > offset,
>> > we passed it to these functions to do reads or writes, and so where
>> > the
>> > offset calculations happened didn't matter.
>> >
>> > For error clearing, there will be places where we calculate offsets
>> > to
>> > check for the presence of media errors, and the above scheme becomes
>> > error prone.
>> >
>> > Make the initial_offset calculations explicit for all callers of
>> > arena_{read,write}_bytes, and provide a helper to do them. The error
>> > checking/clearing code can then use these same helpers.
>> >
>> > Reported-by: Toshi Kani <toshi.kani@hpe.com>
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++---------
>> > -----------
>> >  1 file changed, 42 insertions(+), 27 deletions(-)
>> >
>> > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
>> > index e9dd651..9acf06b 100644
>> > --- a/drivers/nvdimm/btt.c
>> > +++ b/drivers/nvdimm/btt.c
>> > @@ -31,14 +31,17 @@ enum log_ent_request {
>> >         LOG_OLD_ENT
>> >  };
>> >
>> > +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64
>> > relative_offset)
>> > +{
>> > +       return relative_offset + nd_btt->initial_offset;
>> > +}
>> > +
>> >  static int arena_read_bytes(struct arena_info *arena,
>> > resource_size_t offset,
>> >                 void *buf, size_t n, unsigned long flags)
>> >  {
>> >         struct nd_btt *nd_btt = arena->nd_btt;
>> >         struct nd_namespace_common *ndns = nd_btt->ndns;
>> >
>> > -       /* arena offsets may be shifted from the base of the device
>> > */
>> > -       offset += arena->nd_btt->initial_offset;
>> >         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
>> >  }
>> >
>> > @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info
>> > *arena, resource_size_t offset,
>> >         struct nd_btt *nd_btt = arena->nd_btt;
>> >         struct nd_namespace_common *ndns = nd_btt->ndns;
>> >
>> > -       /* arena offsets may be shifted from the base of the device
>> > */
>> > -       offset += arena->nd_btt->initial_offset;
>> >         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
>> >  }
>>
>> So let's get rid of arena_{read,write}_bytes(). The only point of the
>> wrapper was to include an arena specific offset, but now we're pushing
>> all offset calculations to the caller so arena_{read,write}_bytes() is
>> equivalent to nvdimm_{read,write}_bytes().
>
> That sounds fine, though that will require all callers to now
> dereference arena->nd->btt->ndns to do reads/writes.. Is that ok?

Sure, just put that in a to_ndns() helper.

>> However, since many of
>> these call sites now perform a common offset calculations maybe we can
>> provide a new helper for that to differentiate it from the error
>> clearing case?
>
> Not sure I follow - the offset calculations for the error clearing use
> and for data reads/writes are exactly the same (lba_to_nsoff). (See
> further below)
>
>> After patch 7 is applied it's no longer clear which
>> offset calculation is used where and for what reason, there's just too
>> many offset variables calculations and overload of the word 'offset'
>> for bugs to hide.
>
> Agreed on the ambiguity of 'offset'. The implicit rule I followed was
> any kind of 'nsoff' will mean the final namespace offset, after
> accounting for arenas, any initial offset etc. Any other 'offset' is
> relative to the context in which it is used.
>
>>
>> Maybe it's just late on a Friday and my eyes are going cross, but it
>> seems you have 2 classes of use cases between calc_nsoff and
>> lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?
>
> The difference between the two is that lba_to_nsoff is used when were
> talking about reading/writing a 'data' block, and where we are talking
> in terms of 'lba's, not raw offsets. calc_nsoff (which lba_to_nsoff
> calls internally), is a 'last stage' calculation, essentially just to
> add the initial_offset. Direct callers of calc_nsoff (i.e. functions
> that access the log, map, or info blocks) do their offset calculations
> 'manually' based on arena->{map,log,info}off, but these offsets are
> unaware of 'initial_offset', as they started out being relative to
> 'this' arena. So calc_nsoff is that last stage addition of
> initial_offset.
>
> I agree that if feels a bit clunky, but I'm not sure I see a clear way
> to improve it..

Since calc_nsoff is "last stage" and required I think it should be
hidden in the helper that does the read write. Why does it need to be
manually called by every read/write call site?

>> Also, I find something unreadable about calc_nsoff, and reading the
>> prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
>> doesn't help clarify. What is 'relative_offset' relative to? Is it the
>> base namespace / device offset?
>
> Yeah, I cringed a bit too when I called it 'relative_offset'. But I
> couldn't find a more descriptive name. It is 'almost' the raw nsoff,
> with just the initial_offset calculation pending... Maybe
> btt_relative_offset?

I think this gets easier if the consumer of this ambiguity is reduced
to just the 2 leaf routines that do ->rw_bytes().
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations
@ 2017-08-17  0:30         ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-08-17  0:30 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-nvdimm, toshi.kani, jmoyer, linux-acpi, Wysocki, Rafael J

On Mon, Aug 14, 2017 at 12:47 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2017-08-11 at 18:39 -0700, Dan Williams wrote:
>> On Tue, Aug 8, 2017 at 5:07 PM, Vishal Verma <vishal.l.verma@intel.com
>> > wrote:
>> > In preparation for BTT error clearing, refactor the initial offset
>> > calculations. Until now, all callers of arena_{read,write}_bytes
>> > assumed
>> > a relative offset to the arena, and it was later adjusted for the
>> > initial offset. Until now, every time we calculated a relative
>> > offset,
>> > we passed it to these functions to do reads or writes, and so where
>> > the
>> > offset calculations happened didn't matter.
>> >
>> > For error clearing, there will be places where we calculate offsets
>> > to
>> > check for the presence of media errors, and the above scheme becomes
>> > error prone.
>> >
>> > Make the initial_offset calculations explicit for all callers of
>> > arena_{read,write}_bytes, and provide a helper to do them. The error
>> > checking/clearing code can then use these same helpers.
>> >
>> > Reported-by: Toshi Kani <toshi.kani@hpe.com>
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  drivers/nvdimm/btt.c | 69 ++++++++++++++++++++++++++++++++---------
>> > -----------
>> >  1 file changed, 42 insertions(+), 27 deletions(-)
>> >
>> > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
>> > index e9dd651..9acf06b 100644
>> > --- a/drivers/nvdimm/btt.c
>> > +++ b/drivers/nvdimm/btt.c
>> > @@ -31,14 +31,17 @@ enum log_ent_request {
>> >         LOG_OLD_ENT
>> >  };
>> >
>> > +static inline u64 calc_nsoff(struct nd_btt *nd_btt, u64
>> > relative_offset)
>> > +{
>> > +       return relative_offset + nd_btt->initial_offset;
>> > +}
>> > +
>> >  static int arena_read_bytes(struct arena_info *arena,
>> > resource_size_t offset,
>> >                 void *buf, size_t n, unsigned long flags)
>> >  {
>> >         struct nd_btt *nd_btt = arena->nd_btt;
>> >         struct nd_namespace_common *ndns = nd_btt->ndns;
>> >
>> > -       /* arena offsets may be shifted from the base of the device
>> > */
>> > -       offset += arena->nd_btt->initial_offset;
>> >         return nvdimm_read_bytes(ndns, offset, buf, n, flags);
>> >  }
>> >
>> > @@ -48,14 +51,13 @@ static int arena_write_bytes(struct arena_info
>> > *arena, resource_size_t offset,
>> >         struct nd_btt *nd_btt = arena->nd_btt;
>> >         struct nd_namespace_common *ndns = nd_btt->ndns;
>> >
>> > -       /* arena offsets may be shifted from the base of the device
>> > */
>> > -       offset += arena->nd_btt->initial_offset;
>> >         return nvdimm_write_bytes(ndns, offset, buf, n, flags);
>> >  }
>>
>> So let's get rid of arena_{read,write}_bytes(). The only point of the
>> wrapper was to include an arena specific offset, but now we're pushing
>> all offset calculations to the caller so arena_{read,write}_bytes() is
>> equivalent to nvdimm_{read,write}_bytes().
>
> That sounds fine, though that will require all callers to now
> dereference arena->nd->btt->ndns to do reads/writes.. Is that ok?

Sure, just put that in a to_ndns() helper.

>> However, since many of
>> these call sites now perform a common offset calculations maybe we can
>> provide a new helper for that to differentiate it from the error
>> clearing case?
>
> Not sure I follow - the offset calculations for the error clearing use
> and for data reads/writes are exactly the same (lba_to_nsoff). (See
> further below)
>
>> After patch 7 is applied it's no longer clear which
>> offset calculation is used where and for what reason, there's just too
>> many offset variables calculations and overload of the word 'offset'
>> for bugs to hide.
>
> Agreed on the ambiguity of 'offset'. The implicit rule I followed was
> any kind of 'nsoff' will mean the final namespace offset, after
> accounting for arenas, any initial offset etc. Any other 'offset' is
> relative to the context in which it is used.
>
>>
>> Maybe it's just late on a Friday and my eyes are going cross, but it
>> seems you have 2 classes of use cases between calc_nsoff and
>> lba_to_nsoff, perhaps separate read/write wrappers for those 2 cases?
>
> The difference between the two is that lba_to_nsoff is used when were
> talking about reading/writing a 'data' block, and where we are talking
> in terms of 'lba's, not raw offsets. calc_nsoff (which lba_to_nsoff
> calls internally), is a 'last stage' calculation, essentially just to
> add the initial_offset. Direct callers of calc_nsoff (i.e. functions
> that access the log, map, or info blocks) do their offset calculations
> 'manually' based on arena->{map,log,info}off, but these offsets are
> unaware of 'initial_offset', as they started out being relative to
> 'this' arena. So calc_nsoff is that last stage addition of
> initial_offset.
>
> I agree that if feels a bit clunky, but I'm not sure I see a clear way
> to improve it..

Since calc_nsoff is "last stage" and required I think it should be
hidden in the helper that does the read write. Why does it need to be
manually called by every read/write call site?

>> Also, I find something unreadable about calc_nsoff, and reading the
>> prototype "u64 calc_nsoff(struct nd_btt *nd_btt, u64 relative_offset)"
>> doesn't help clarify. What is 'relative_offset' relative to? Is it the
>> base namespace / device offset?
>
> Yeah, I cringed a bit too when I called it 'relative_offset'. But I
> couldn't find a more descriptive name. It is 'almost' the raw nsoff,
> with just the initial_offset calculation pending... Maybe
> btt_relative_offset?

I think this gets easier if the consumer of this ambiguity is reduced
to just the 2 leaf routines that do ->rw_bytes().

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-08-17  0:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-09  0:07 [PATCH v5 0/7] BTT error clearing rework Vishal Verma
2017-08-09  0:07 ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 1/7] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 2/7] btt: refactor map entry operations with macros Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 3/7] btt: ensure that flags were also unchanged during a map_read Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 4/7] btt: cache sector_size in arena_info Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 5/7] libnvdimm: fix potential deadlock while clearing errors Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-09  0:07 ` [PATCH v5 6/7] libnvdimm, btt: refactor initial_offset calculations Vishal Verma
2017-08-09  0:07   ` Vishal Verma
2017-08-12  1:39   ` Dan Williams
2017-08-12  1:39     ` Dan Williams
2017-08-14 19:47     ` Verma, Vishal L
2017-08-14 19:47       ` Verma, Vishal L
2017-08-17  0:30       ` Dan Williams
2017-08-17  0:30         ` Dan Williams
2017-08-09  0:07 ` [PATCH v5 7/7] libnvdimm, btt: rework error clearing Vishal Verma
2017-08-09  0:07   ` Vishal Verma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.