linux-block.vger.kernel.org archive mirror
* [PATCH 00/13] lightnvm: bugfixes and improvements
@ 2019-02-27 17:14 Igor Konopko
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
                   ` (13 more replies)
  0 siblings, 14 replies; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This series provides a set of bugfixes
and improvements for lightnvm and the pblk device.

Most of the patches are rather simple and cover
corner-case scenarios, but we were able to hit
most of them in practice. A few others close
existing gaps which we found.

Feedback is appreciated.

Igor Konopko (13):
  lightnvm: pblk: Line reference fix in GC
  lightnvm: pblk: Gracefully handle GC data malloc fail
  lightnvm: pblk: Fix put line back behaviour
  lightnvm: pblk: Rollback in gc read
  lightnvm: pblk: Count all read errors in stats
  lightnvm: pblk: Ensure that erase is chunk aligned
  lightnvm: pblk: Cleanly fail when there is not enough memory
  lightnvm: pblk: Set proper read status in bio
  lightnvm: pblk: Kick writer for flush requests
  lightnvm: pblk: Reduce L2P DRAM footprint
  lightnvm: pblk: Remove unused smeta_ssec field
  lightnvm: pblk: close opened chunks
  lightnvm: Inherit mdts from the parent nvme device

 drivers/lightnvm/core.c          |   9 ++-
 drivers/lightnvm/pblk-core.c     | 128 +++++++++++++++++++++++++++++--
 drivers/lightnvm/pblk-gc.c       |  47 +++++++-----
 drivers/lightnvm/pblk-init.c     |  30 ++++++--
 drivers/lightnvm/pblk-map.c      |   2 +
 drivers/lightnvm/pblk-read.c     |  13 ++--
 drivers/lightnvm/pblk-recovery.c |   2 +-
 drivers/lightnvm/pblk.h          |   4 +-
 drivers/nvme/host/lightnvm.c     |   1 +
 include/linux/lightnvm.h         |   1 +
 10 files changed, 193 insertions(+), 44 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 01/13] lightnvm: pblk: Line reference fix in GC
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-01 12:20   ` Hans Holmberg
                     ` (2 more replies)
  2019-02-27 17:14 ` [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail Igor Konopko
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch fixes the error case in GC in
which we both move the line back to closed
state and release the additional reference,
which causes an illegal transition from
closed to free on pblk_line_put, since only
the gc-to-free line state transition is
allowed through that path.
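
The double-put can be sketched in a small userspace model (a sketch only;
the names, states and return values here are illustrative, not the pblk
API):

```c
#include <assert.h>

/* A line may only reach FREE from the GC state via the final put. */
enum line_state { LINE_CLOSED, LINE_GC, LINE_FREE };

struct line {
	int ref;
	enum line_state state;
};

/* Returns -1 on an illegal closed-to-free transition, 0 otherwise. */
static int line_put(struct line *l)
{
	if (--l->ref == 0) {
		if (l->state != LINE_GC)
			return -1;	/* illegal transition */
		l->state = LINE_FREE;
	}
	return 0;
}

static void put_line_back(struct line *l)
{
	l->state = LINE_CLOSED;
}

/* Old error path: move the line back to closed AND drop the reference,
 * so the final put happens while the line is CLOSED. */
static int gc_error_path_buggy(struct line *l)
{
	put_line_back(l);
	return line_put(l);
}

/* Fixed error path: keep the reference once the line is closed again. */
static int gc_error_path_fixed(struct line *l)
{
	put_line_back(l);
	return 0;
}
```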

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-gc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 2fa118c8eb71..3feadfd9418d 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -290,8 +290,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 fail_free_ws:
 	kfree(line_ws);
 
+	/* Line goes back to closed state, so we cannot release additional
+	 * reference for line, since we do that only when we want to do
+	 * gc to free line state transition.
+	 */
 	pblk_put_line_back(pblk, line);
-	kref_put(&line->ref, pblk_line_put);
 	atomic_dec(&gc->read_inflight_gc);
 
 	pblk_err(pblk, "failed to GC line %d\n", line->id);
-- 
2.17.1



* [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-02-28 17:08   ` Javier González
  2019-02-27 17:14 ` [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour Igor Konopko
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently, when gc rq data allocation fails,
we simply skip the data which we wanted to
move and finally put the line in the free
state, losing that data in the process. This
patch moves the data allocation to an earlier
phase of GC, where we can still fail
gracefully by moving the line back to the
closed state.
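
The allocate-early pattern can be sketched in userspace as follows (an
illustrative model, not the pblk functions; `gc_rq_prepare` and its fields
are made up for the sketch):

```c
#include <assert.h>
#include <stdlib.h>

struct gc_rq {
	size_t len;
	void *data;
};

/* All allocations happen in the prepare phase; on failure nothing is
 * leaked and the caller can still move the line back to closed state,
 * so no data is lost. */
static struct gc_rq *gc_rq_prepare(size_t nr_secs, size_t csecs)
{
	struct gc_rq *rq = malloc(sizeof(*rq));

	if (!rq)
		return NULL;
	rq->len = nr_secs * csecs;
	rq->data = malloc(rq->len);
	if (!rq->data) {
		free(rq);	/* fail gracefully, free partial state */
		return NULL;
	}
	return rq;
}
```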

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-gc.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 3feadfd9418d..31fc1339faa8 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -84,8 +84,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	struct pblk_line_ws *gc_rq_ws = container_of(work,
 						struct pblk_line_ws, ws);
 	struct pblk *pblk = gc_rq_ws->pblk;
-	struct nvm_tgt_dev *dev = pblk->dev;
-	struct nvm_geo *geo = &dev->geo;
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line *line = gc_rq_ws->line;
 	struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
@@ -93,13 +91,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
 
 	up(&gc->gc_sem);
 
-	gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
-	if (!gc_rq->data) {
-		pblk_err(pblk, "could not GC line:%d (%d/%d)\n",
-					line->id, *line->vsc, gc_rq->nr_secs);
-		goto out;
-	}
-
 	/* Read from GC victim block */
 	ret = pblk_submit_read_gc(pblk, gc_rq);
 	if (ret) {
@@ -189,6 +180,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	struct pblk_line *line = line_ws->line;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line_ws *gc_rq_ws;
 	struct pblk_gc_rq *gc_rq;
@@ -247,9 +240,13 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	gc_rq->nr_secs = nr_secs;
 	gc_rq->line = line;
 
+	gc_rq->data = vmalloc(gc_rq->nr_secs * geo->csecs);
+	if (!gc_rq->data)
+		goto fail_free_gc_rq;
+
 	gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
 	if (!gc_rq_ws)
-		goto fail_free_gc_rq;
+		goto fail_free_gc_data;
 
 	gc_rq_ws->pblk = pblk;
 	gc_rq_ws->line = line;
@@ -281,6 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 	return;
 
+fail_free_gc_data:
+	vfree(gc_rq->data);
 fail_free_gc_rq:
 	kfree(gc_rq);
 fail_free_lba_list:
-- 
2.17.1



* [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
  2019-02-27 17:14 ` [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-01 13:27   ` Hans Holmberg
  2019-03-04  7:22   ` Javier González
  2019-02-27 17:14 ` [PATCH 04/13] lightnvm: pblk: Rollback in gc read Igor Konopko
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In the current implementation of
pblk_put_line_back there are two cases which
are not handled.

The first is a race condition with
__pblk_map_invalidate, in which we check the
line state, which might be closed but not yet
added to any list, and thus explode in
list_move_tail. This is due to the lack of
locking of both gc_lock and the line lock in
the current pblk_put_line_back implementation.

The second issue is that while we are in this
function, the line is not on any list, and
pblk_line_gc_list might compute the same gc
group and thus not return any move_list. Our
line would then be stuck forever, unassigned
to any list. Simply resetting gc_group to
none fixes that.
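
The second issue can be modelled in a few lines (a sketch; the group names
mirror pblk but the helper and its return convention are made up):

```c
#include <assert.h>

enum gc_group { GC_NONE, GC_HIGH, GC_MID, GC_LOW };

struct mline {
	enum gc_group gc_group;
};

/* Models pblk_line_gc_list: returns the target list id, or -1 ("no
 * move_list") when the computed group equals the line's current group. */
static int gc_list_for(struct mline *l, enum gc_group computed)
{
	if (l->gc_group == computed)
		return -1;	/* caller gets no list to put the line on */
	l->gc_group = computed;
	return (int)computed;
}
```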

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-gc.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 31fc1339faa8..511ed0d5333c 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -64,19 +64,23 @@ static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list;
 
+	spin_lock(&l_mg->gc_lock);
 	spin_lock(&line->lock);
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
 	line->state = PBLK_LINESTATE_CLOSED;
 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
 					line->state);
+
+	/* We need to reset gc_group in order to ensure that
+	 * pblk_line_gc_list will return proper move_list
+	 * since right now current line is not on any of the
+	 * gc lists.
+	 */
+	line->gc_group = PBLK_LINEGC_NONE;
 	move_list = pblk_line_gc_list(pblk, line);
 	spin_unlock(&line->lock);
-
-	if (move_list) {
-		spin_lock(&l_mg->gc_lock);
-		list_add_tail(&line->list, move_list);
-		spin_unlock(&l_mg->gc_lock);
-	}
+	list_add_tail(&line->list, move_list);
+	spin_unlock(&l_mg->gc_lock);
 }
 
 static void pblk_gc_line_ws(struct work_struct *work)
-- 
2.17.1



* [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (2 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  7:38   ` Javier González
  2019-03-04 12:49   ` Matias Bjørling
  2019-02-27 17:14 ` [PATCH 05/13] lightnvm: pblk: Count all read errors in stats Igor Konopko
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently, in case of an error returned by
pblk_gc_line to pblk_gc_read, we leave the
current line unassigned from all the lists.
This patch fixes that issue.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-gc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 511ed0d5333c..533da6ea3e15 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
 
 	pblk_gc_kick(pblk);
 
-	if (pblk_gc_line(pblk, line))
+	if (pblk_gc_line(pblk, line)) {
 		pblk_err(pblk, "failed to GC line %d\n", line->id);
+		/* rollback */
+		spin_lock(&gc->r_lock);
+		list_add_tail(&line->list, &gc->r_list);
+		spin_unlock(&gc->r_lock);
+	}
 
 	return 0;
 }
-- 
2.17.1



* [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (3 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 04/13] lightnvm: pblk: Rollback in gc read Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  7:42   ` Javier González
  2019-02-27 17:14 ` [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned Igor Konopko
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently, when an unknown error occurs on
the read path, there is only a dmesg message
about it, but it is not counted in the sysfs
statistics. Since this is still an error, we
should count it there as well.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index eabcbc119681..a98b2255f963 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd)
 		atomic_long_inc(&pblk->read_failed);
 		break;
 	default:
+		atomic_long_inc(&pblk->read_failed);
 		pblk_err(pblk, "unknown read error:%d\n", rqd->error);
 	}
 #ifdef CONFIG_NVM_PBLK_DEBUG
-- 
2.17.1



* [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (4 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 05/13] lightnvm: pblk: Count all read errors in stats Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  7:48   ` Javier González
  2019-02-27 17:14 ` [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory Igor Konopko
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In the current pblk implementation of the
erase command there is a chance that the
sector bits are set to some random values in
the erase PPA. This is an unexpected
situation, since erase shall always be chunk
aligned. This patch fixes that issue.
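
The alignment requirement can be illustrated with a toy PPA layout (the
field width below is made up for the sketch; real widths come from the
device geometry reported by the drive):

```c
#include <assert.h>
#include <stdint.h>

/* Toy layout: low 16 bits hold the sector, the remaining bits identify
 * the chunk/lun/channel. */
#define SEC_BITS 16

/* An erase address must have the sector field zeroed, i.e. it must point
 * at the start of a chunk. */
static uint64_t erase_ppa(uint64_t ppa)
{
	return ppa & ~((1ULL << SEC_BITS) - 1);
}
```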

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 1 +
 drivers/lightnvm/pblk-map.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index a98b2255f963..78b1eea4ab67 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
 
 		ppa = pblk->luns[bit].bppa; /* set ch and lun */
 		ppa.a.blk = line->id;
+		ppa.a.reserved = 0;
 
 		atomic_dec(&line->left_eblks);
 		WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 79df583ea709..aea46b4ec40f 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 
 			*erase_ppa = ppa_list[i];
 			erase_ppa->a.blk = e_line->id;
+			erase_ppa->a.reserved = 0;
 
 			spin_unlock(&e_line->lock);
 
@@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 		atomic_dec(&e_line->left_eblks);
 		*erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
 		erase_ppa->a.blk = e_line->id;
+		erase_ppa->a.reserved = 0;
 	}
 
 	return 0;
-- 
2.17.1



* [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (5 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  7:53   ` Javier González
  2019-02-27 17:14 ` [PATCH 08/13] lightnvm: pblk: Set proper read status in bio Igor Konopko
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

The L2P table can be huge in many cases,
since it typically requires 1GB of DRAM for
1TB of drive. When there is not enough memory
available, the OOM killer kicks in and kills
random processes, which can be very annoying
for users. This patch changes the flags for
the L2P table allocation in order to handle
this situation in a more user-friendly way.
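
The intent of switching from plain vmalloc() to __vmalloc() with
__GFP_RETRY_MAYFAIL has a simple userspace analogue (a sketch only;
`l2p_alloc` is a made-up helper): prefer an allocation that reports
failure, so the caller can return -ENOMEM cleanly instead of a random
process being killed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* On failure, log how much memory was needed and report -ENOMEM (-12)
 * through *err instead of crashing anything. */
static void *l2p_alloc(size_t map_size, int *err)
{
	void *map = malloc(map_size);

	if (!map) {
		fprintf(stderr,
			"failed to allocate L2P (need %zu bytes)\n",
			map_size);
		*err = -12;	/* -ENOMEM */
		return NULL;
	}
	*err = 0;
	return map;
}
```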

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-init.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8b643d0bffae..e553105b7ba1 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -164,9 +164,14 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	int ret = 0;
 
 	map_size = pblk_trans_map_size(pblk);
-	pblk->trans_map = vmalloc(map_size);
-	if (!pblk->trans_map)
+	pblk->trans_map = __vmalloc(map_size, GFP_KERNEL | __GFP_NOWARN
+					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
+					PAGE_KERNEL);
+	if (!pblk->trans_map) {
+		pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
+				map_size);
 		return -ENOMEM;
+	}
 
 	pblk_ppa_set_empty(&ppa);
 
-- 
2.17.1



* [PATCH 08/13] lightnvm: pblk: Set proper read status in bio
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (6 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  8:03   ` Javier González
  2019-02-27 17:14 ` [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests Igor Konopko
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently, in case of read errors, bi_status
is not set properly, which leads to returning
improper data to the higher layer. This patch
fixes that by setting the proper status in
case of read errors.

The patch also removes an unnecessary
WARN_ONCE(), which does not make sense in
that place, since the user bio is not used
for interaction with the drive and thus
bi_status will not be set here.
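
The completion rule the patch introduces can be sketched as a pure
predicate (the response codes below are placeholders, not the real
NVM_RSP_* constants): any device error except the high-ECC warning, which
still carries valid data, should fail the bio.

```c
#include <assert.h>

#define RSP_OK			0
#define RSP_WARN_HIGHECC	1	/* data still valid, only warn */
#define RSP_ERR_READ		2

/* Returns nonzero when the bio should complete with an I/O error. */
static int bio_should_fail(int error)
{
	return error && error != RSP_WARN_HIGHECC;
}
```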

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-read.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 3789185144da..39c1d6ccaedb 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
 	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
 }
 
-static void pblk_end_user_read(struct bio *bio)
+static void pblk_end_user_read(struct bio *bio, int error)
 {
-#ifdef CONFIG_NVM_PBLK_DEBUG
-	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
-#endif
+	if (error && error != NVM_RSP_WARN_HIGHECC)
+		bio_io_error(bio);
 	bio_endio(bio);
 }
 
@@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct bio *bio = (struct bio *)r_ctx->private;
 
-	pblk_end_user_read(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
@@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	rqd->bio = NULL;
 	rqd->nr_ppas = nr_secs;
 
-	bio_endio(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, false);
 }
 
-- 
2.17.1



* [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (7 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 08/13] lightnvm: pblk: Set proper read status in bio Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  8:08   ` Javier González
  2019-02-27 17:14 ` [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint Igor Konopko
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In the case when there are not enough sectors
available in the rwb and a flush request is
sent, we should kick the write thread, which
is not done in the current implementation.
This patch fixes that issue.
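
The corrected condition from pblk_write_should_kick reduces to a small
predicate: wake the writer either when a full minimal write unit is
buffered, or when any flush point is pending, even with fewer sectors
available.

```c
#include <assert.h>

/* Mirrors the fixed check: kick on enough buffered sectors OR on any
 * pending flush point. */
static int should_kick(unsigned int secs_avail,
		       unsigned int min_write_pgs,
		       unsigned int secs_to_flush)
{
	return secs_avail >= min_write_pgs || secs_to_flush != 0;
}
```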

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 78b1eea4ab67..f48f2e77f770 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -375,8 +375,9 @@ void pblk_write_timer_fn(struct timer_list *t)
 void pblk_write_should_kick(struct pblk *pblk)
 {
 	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
+	unsigned int secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
 
-	if (secs_avail >= pblk->min_write_pgs_data)
+	if (secs_avail >= pblk->min_write_pgs_data || secs_to_flush)
 		pblk_write_kick(pblk);
 }
 
-- 
2.17.1



* [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (8 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  8:17   ` Javier González
  2019-03-04 13:11   ` Matias Bjørling
  2019-02-27 17:14 ` [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field Igor Konopko
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently the L2P map size is calculated
based on the total number of available
sectors, which is redundant, since it
includes the over-provisioned sectors (11%
by default).

The goal of this patch is to base this size
on the real capacity and thus reduce the DRAM
footprint significantly - with the default op
value it is approx. 110MB of DRAM less for
every 1TB of pblk drive.
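
A back-of-envelope check of the numbers above (assumed figures: 4-byte
L2P entries, 4096-byte sectors, 11% over-provisioning): 1TB of capacity
gives a 1GiB map, so dropping the OP share saves roughly 110MB.

```c
#include <assert.h>
#include <stdint.h>

/* Map size is simply one entry per mapped sector. */
static uint64_t l2p_bytes(uint64_t nr_secs, uint64_t entry_size)
{
	return nr_secs * entry_size;
}
```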

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 8 ++++----
 drivers/lightnvm/pblk-init.c     | 7 +++----
 drivers/lightnvm/pblk-read.c     | 2 +-
 drivers/lightnvm/pblk-recovery.c | 2 +-
 drivers/lightnvm/pblk.h          | 1 -
 5 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index f48f2e77f770..2e424c0275c1 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -2024,7 +2024,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 	struct ppa_addr ppa_l2p;
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return;
 	}
@@ -2064,7 +2064,7 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
 #endif
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return 0;
 	}
@@ -2110,7 +2110,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
 	}
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return;
 	}
@@ -2168,7 +2168,7 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 		lba = lba_list[i];
 		if (lba != ADDR_EMPTY) {
 			/* logic error: lba out-of-bounds. Ignore update */
-			if (!(lba < pblk->rl.nr_secs)) {
+			if (!(lba < pblk->capacity)) {
 				WARN(1, "pblk: corrupted L2P map request\n");
 				continue;
 			}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index e553105b7ba1..9913a4514eb6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -105,7 +105,7 @@ static size_t pblk_trans_map_size(struct pblk *pblk)
 	if (pblk->addrf_len < 32)
 		entry_size = 4;
 
-	return entry_size * pblk->rl.nr_secs;
+	return entry_size * pblk->capacity;
 }
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
@@ -175,7 +175,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 
 	pblk_ppa_set_empty(&ppa);
 
-	for (i = 0; i < pblk->rl.nr_secs; i++)
+	for (i = 0; i < pblk->capacity; i++)
 		pblk_trans_map_set(pblk, i, ppa);
 
 	ret = pblk_l2p_recover(pblk, factory_init);
@@ -706,7 +706,6 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
 	 * on user capacity consider only provisioned blocks
 	 */
 	pblk->rl.total_blocks = nr_free_chks;
-	pblk->rl.nr_secs = nr_free_chks * geo->clba;
 
 	/* Consider sectors used for metadata */
 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
@@ -1289,7 +1288,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 
 	pblk_info(pblk, "luns:%u, lines:%d, secs:%llu, buf entries:%u\n",
 			geo->all_luns, pblk->l_mg.nr_lines,
-			(unsigned long long)pblk->rl.nr_secs,
+			(unsigned long long)pblk->capacity,
 			pblk->rwb.nr_entries);
 
 	wake_up_process(pblk->writer_ts);
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 39c1d6ccaedb..65697463def8 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -561,7 +561,7 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 		goto out;
 
 	/* logic error: lba out-of-bounds */
-	if (lba >= pblk->rl.nr_secs) {
+	if (lba >= pblk->capacity) {
 		WARN(1, "pblk: read lba out of bounds\n");
 		goto out;
 	}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index d86f580036d3..83b467b5edc7 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -474,7 +474,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
 		lba_list[paddr++] = cpu_to_le64(lba);
 
-		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
+		if (lba == ADDR_EMPTY || lba >= pblk->capacity)
 			continue;
 
 		line->nr_valid_lbas++;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a6386d5acd73..a92377530930 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -305,7 +305,6 @@ struct pblk_rl {
 
 	struct timer_list u_timer;
 
-	unsigned long long nr_secs;
 	unsigned long total_blocks;
 
 	atomic_t free_blocks;		/* Total number of free blocks (+ OP) */
-- 
2.17.1



* [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (9 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  8:21   ` Javier González
  2019-02-27 17:14 ` [PATCH 12/13] lightnvm: pblk: close opened chunks Igor Konopko
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

The smeta_ssec field in pblk_line is set once
and never used, since it was replaced by the
pblk_line_smeta_start() function. This patch
removes this no longer needed field.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 1 -
 drivers/lightnvm/pblk.h      | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 2e424c0275c1..fa4dc05608ff 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1165,7 +1165,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	off = bit * geo->ws_opt;
 	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
 	line->sec_in_line -= lm->smeta_sec;
-	line->smeta_ssec = off;
 	line->cur_sec = off + lm->smeta_sec;
 
 	if (init && pblk_line_smeta_write(pblk, line, off)) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a92377530930..b266563508e6 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -464,7 +464,6 @@ struct pblk_line {
 	int meta_line;			/* Metadata line id */
 	int meta_distance;		/* Distance between data and metadata */
 
-	u64 smeta_ssec;			/* Sector where smeta starts */
 	u64 emeta_ssec;			/* Sector where emeta starts */
 
 	unsigned int sec_in_line;	/* Number of usable secs in line */
-- 
2.17.1



* [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (10 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  8:27   ` Javier González
  2019-02-27 17:14 ` [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device Igor Konopko
  2019-02-28 16:36 ` [PATCH 00/13] lightnvm: bugfixes and improvements Matias Bjørling
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

When we create a pblk instance with the
factory flag, there is a possibility that
some chunks are in the open state, which does
not allow erase requests to be issued to them
directly. Such a chunk should be filled with
some empty data in order to reach the closed
state. Without that, we risk that some erase
operations will be rejected by the drive due
to an improper chunk state.

This patch implements the chunk-closing logic
in pblk for the case when an instance is
created with the factory flag, in order to
fix that issue.
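
The padding loop amounts to writing from the chunk's write pointer up to
clba in min_write_pgs-sized requests until the chunk closes. A sketch of
the round count (geometry values in the test are illustrative):

```c
#include <assert.h>

/* How many padding write requests are needed to move an open chunk's
 * write pointer (wp) to clba, issuing min_write_pgs sectors at a time. */
static int pad_rounds(int wp, int clba, int min_write_pgs)
{
	int count = clba - wp;
	int rounds = 0;

	while (count > 0) {
		count -= min_write_pgs;
		rounds++;
	}
	return rounds;
}
```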

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
 drivers/lightnvm/pblk-init.c |  14 ++++-
 drivers/lightnvm/pblk.h      |   2 +
 3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index fa4dc05608ff..d3c45393f093 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
 	return meta + ch_off + lun_off + chk_off;
 }
 
+static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	struct bio *bio;
+	struct ppa_addr *ppa_list;
+	struct nvm_rq rqd;
+	void *meta_list, *data;
+	dma_addr_t dma_meta_list, dma_ppa_list;
+	int i, rq_ppas, rq_len, ret;
+
+	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
+	if (!meta_list)
+		return;
+
+	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
+
+	rq_ppas = pblk_calc_secs(pblk, count, 0, false);
+	if (!rq_ppas)
+		rq_ppas = pblk->min_write_pgs;
+	rq_len = rq_ppas * geo->csecs;
+
+	data = kzalloc(rq_len, GFP_KERNEL);
+	if (!data)
+		goto free_meta_list;
+
+	memset(&rqd, 0, sizeof(struct nvm_rq));
+	rqd.opcode = NVM_OP_PWRITE;
+	rqd.nr_ppas = rq_ppas;
+	rqd.meta_list = meta_list;
+	rqd.ppa_list = ppa_list;
+	rqd.dma_ppa_list = dma_ppa_list;
+	rqd.dma_meta_list = dma_meta_list;
+
+next_rq:
+	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
+	if (IS_ERR(bio))
+		goto out_next;
+
+	bio->bi_iter.bi_sector = 0; /* artificial bio */
+	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+	rqd.bio = bio;
+	for (i = 0; i < rqd.nr_ppas; i++) {
+		rqd.ppa_list[i] = ppa;
+		rqd.ppa_list[i].m.sec += i;
+		pblk_get_meta(pblk, meta_list, i)->lba =
+					cpu_to_le64(ADDR_EMPTY);
+	}
+
+	ret = nvm_submit_io_sync(dev, &rqd);
+	if (ret) {
+		bio_put(bio);
+		goto out_next;
+	}
+
+	if (rqd.error)
+		goto free_data;
+
+out_next:
+	count -= rqd.nr_ppas;
+	ppa.m.sec += rqd.nr_ppas;
+	if (count > 0)
+		goto next_rq;
+
+free_data:
+	kfree(data);
+free_meta_list:
+	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
+}
+
+void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	struct nvm_chk_meta *chunk_meta;
+	struct ppa_addr ppa;
+	int i, j, k, count;
+
+	for (i = 0; i < geo->num_chk; i++) {
+		for (j = 0; j < geo->num_lun; j++) {
+			for (k = 0; k < geo->num_ch; k++) {
+				ppa.ppa = 0;
+				ppa.m.grp = k;
+				ppa.m.pu = j;
+				ppa.m.chk = i;
+
+				chunk_meta = pblk_chunk_get_off(pblk,
+								meta, ppa);
+				if (chunk_meta->state == NVM_CHK_ST_OPEN) {
+					ppa.m.sec = chunk_meta->wp;
+					count = geo->clba - chunk_meta->wp;
+					pblk_close_chunk(pblk, ppa, count);
+				}
+			}
+		}
+	}
+}
+
+bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	int i;
+
+	for (i = 0; i < geo->all_luns; i++) {
+		if (meta[i].state == NVM_CHK_ST_OPEN)
+			return true;
+	}
+
+	return false;
+}
+
 void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
 			   u64 paddr)
 {
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9913a4514eb6..83abe6960b46 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
 	return 0;
 }
 
-static int pblk_lines_init(struct pblk *pblk)
+static int pblk_lines_init(struct pblk *pblk, bool factory_init)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line *line;
 	void *chunk_meta;
 	int nr_free_chks = 0;
 	int i, ret;
+	bool retry = false;
 
 	ret = pblk_line_meta_init(pblk);
 	if (ret)
@@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
 	if (ret)
 		goto fail_free_meta;
 
+get_chunk_meta:
 	chunk_meta = pblk_get_chunk_meta(pblk);
 	if (IS_ERR(chunk_meta)) {
 		ret = PTR_ERR(chunk_meta);
 		goto fail_free_luns;
 	}
 
+	if (factory_init && !retry &&
+	    pblk_are_opened_chunks(pblk, chunk_meta)) {
+		pblk_close_opened_chunks(pblk, chunk_meta);
+		retry = true;
+		vfree(chunk_meta);
+		goto get_chunk_meta;
+	}
+
 	pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
 								GFP_KERNEL);
 	if (!pblk->lines) {
@@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		goto fail;
 	}
 
-	ret = pblk_lines_init(pblk);
+	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
 	if (ret) {
 		pblk_err(pblk, "could not initialize lines\n");
 		goto fail_free_core;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index b266563508e6..b248642c4dfb 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
 struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
 					      struct nvm_chk_meta *lp,
 					      struct ppa_addr ppa);
+void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
+bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
 void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
 void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
 int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread
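The chunk scan that patch 12 performs before a factory init can be sketched in userspace C. This is a hedged illustration only: the struct and state values below are stand-ins for the kernel's struct nvm_chk_meta and NVM_CHK_ST_* flags, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Chunk states, mirroring the NVM_CHK_ST_* flags (values illustrative). */
#define CHK_ST_FREE   (1 << 0)
#define CHK_ST_CLOSED (1 << 1)
#define CHK_ST_OPEN   (1 << 2)

struct chk_meta_sketch {
	int state;
	int wp;		/* write pointer: sectors written so far */
};

/* Sketch of pblk_are_opened_chunks(): one pass over the chunk metadata,
 * reporting whether any chunk was left open (e.g. by an unclean
 * shutdown) and therefore must be padded to the closed state before a
 * factory init can trust the media state. */
static bool any_open_chunk(const struct chk_meta_sketch *meta, int nr)
{
	for (int i = 0; i < nr; i++)
		if (meta[i].state == CHK_ST_OPEN)
			return true;
	return false;
}
```

When this returns true, the patch pads each open chunk from its write pointer (wp) up to clba with empty-LBA writes, which is what pblk_close_chunk does per chunk.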

* [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (11 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 12/13] lightnvm: pblk: close opened chunks Igor Konopko
@ 2019-02-27 17:14 ` Igor Konopko
  2019-03-04  9:05   ` Javier González
  2019-02-28 16:36 ` [PATCH 00/13] lightnvm: bugfixes and improvements Matias Bjørling
  13 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-02-27 17:14 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

The current lightnvm and pblk implementation does not respect
the NVMe maximum data transfer size (MDTS), which can be smaller
than the 64 * 4K = 256K limit assumed so far. This patch fixes
the issues related to that.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c      | 9 +++++++--
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 5f82036fe322..c01f83b8fbaf 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	struct nvm_target *t;
 	struct nvm_tgt_dev *tgt_dev;
 	void *targetdata;
+	unsigned int mdts;
 	int ret;
 
 	switch (create->conf.type) {
@@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	tdisk->private_data = targetdata;
 	tqueue->queuedata = targetdata;
 
-	blk_queue_max_hw_sectors(tqueue,
-			(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
+	if (dev->geo.mdts) {
+		mdts = min_t(u32, dev->geo.mdts,
+				(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	}
+	blk_queue_max_hw_sectors(tqueue, mdts);
 
 	set_capacity(tdisk, tt->capacity(targetdata));
 	add_disk(tdisk);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index b759c25c89c8..b88a39a3cbd1 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
 	geo->ext = ns->ext;
+	geo->mdts = ns->ctrl->max_hw_sectors;
 
 	dev->q = q;
 	memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 5d865a5d5cdc..d3b02708e5f0 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -358,6 +358,7 @@ struct nvm_geo {
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
 	bool	ext;		/* metadata in extended data buffer */
+	u32	mdts;		/* Max data transfer size*/
 
 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 91+ messages in thread
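The clamping that patch 13 adds to nvm_create_tgt() can be sketched in userspace C. This is a minimal sketch, not the kernel code: the standalone function name is an assumption, and NVM_MAX_VLBA's value is taken from the series context.

```c
#include <assert.h>
#include <stdint.h>

#define NVM_MAX_VLBA 64	/* max logical blocks per vector command */

/* Sketch of the clamping in nvm_create_tgt(): the target queue limit
 * (in 512B sectors) is the vector-command maximum, further capped by
 * the controller's MDTS when the geometry reports one (mdts != 0). */
static uint32_t target_max_hw_sectors(uint32_t csecs, uint32_t mdts)
{
	uint32_t vlba_limit = (csecs >> 9) * NVM_MAX_VLBA;

	if (mdts && mdts < vlba_limit)
		return mdts;
	return vlba_limit;
}
```

With 4K sectors the vector limit is 8 * 64 = 512 sectors (256K); a controller reporting a smaller MDTS now wins, instead of being ignored.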

* Re: [PATCH 00/13] lightnvm: bugfixes and improvements
  2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
                   ` (12 preceding siblings ...)
  2019-02-27 17:14 ` [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device Igor Konopko
@ 2019-02-28 16:36 ` Matias Bjørling
  2019-02-28 17:15   ` Javier González
  2019-03-01 10:23   ` Hans Holmberg
  13 siblings, 2 replies; 91+ messages in thread
From: Matias Bjørling @ 2019-02-28 16:36 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 2/27/19 6:14 PM, Igor Konopko wrote:
> This series provides a group of bugfixes
> and improvements for the lightnvm and pblk devices.
> 
> Most of the patches are rather simple and cover
> corner-case scenarios, but we were able to hit
> most of them in practice. A few others close
> existing gaps which we were able to find.
> 
> Feedback is appreciated.
> 
> Igor Konopko (13):
>    lightnvm: pblk: Line reference fix in GC
>    lightnvm: pblk: Gracefully handle GC data malloc fail
>    lightnvm: pblk: Fix put line back behaviour
>    lightnvm: pblk: Rollback in gc read
>    lightnvm: pblk: Count all read errors in stats
>    lightnvm: pblk: Ensure that erase is chunk aligned
>    lightnvm: pblk: Cleanly fail when there is not enough memory
>    lightnvm: pblk: Set proper read status in bio
>    lightnvm: pblk: Kick writer for flush requests
>    lightnvm: pblk: Reduce L2P DRAM footprint
>    lightnvm: pblk: Remove unused smeta_ssec field
>    lightnvm: pblk: close opened chunks
>    lightnvm: Inherit mdts from the parent nvme device
> 
>   drivers/lightnvm/core.c          |   9 ++-
>   drivers/lightnvm/pblk-core.c     | 128 +++++++++++++++++++++++++++++--
>   drivers/lightnvm/pblk-gc.c       |  47 +++++++-----
>   drivers/lightnvm/pblk-init.c     |  30 ++++++--
>   drivers/lightnvm/pblk-map.c      |   2 +
>   drivers/lightnvm/pblk-read.c     |  13 ++--
>   drivers/lightnvm/pblk-recovery.c |   2 +-
>   drivers/lightnvm/pblk.h          |   4 +-
>   drivers/nvme/host/lightnvm.c     |   1 +
>   include/linux/lightnvm.h         |   1 +
>   10 files changed, 193 insertions(+), 44 deletions(-)
> 

Thanks Igor. I'll give Hans et al. a couple of days to digest the 
changes.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail
  2019-02-27 17:14 ` [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail Igor Konopko
@ 2019-02-28 17:08   ` Javier González
  2019-03-01 12:50     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-02-28 17:08 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2776 bytes --]



> On 27 Feb 2019, at 12.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, when the GC rq data allocation fails,
> we simply skip the data which we wanted to move
> and eventually move the line to the free state,
> losing that data in the process. This patch moves
> the data allocation to an earlier phase of GC,
> where we can still fail gracefully by moving the
> line back to the closed state.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-gc.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 3feadfd9418d..31fc1339faa8 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -84,8 +84,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
> 	struct pblk_line_ws *gc_rq_ws = container_of(work,
> 						struct pblk_line_ws, ws);
> 	struct pblk *pblk = gc_rq_ws->pblk;
> -	struct nvm_tgt_dev *dev = pblk->dev;
> -	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_gc *gc = &pblk->gc;
> 	struct pblk_line *line = gc_rq_ws->line;
> 	struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
> @@ -93,13 +91,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
> 
> 	up(&gc->gc_sem);
> 
> -	gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
> -	if (!gc_rq->data) {
> -		pblk_err(pblk, "could not GC line:%d (%d/%d)\n",
> -					line->id, *line->vsc, gc_rq->nr_secs);
> -		goto out;
> -	}
> -
> 	/* Read from GC victim block */
> 	ret = pblk_submit_read_gc(pblk, gc_rq);
> 	if (ret) {
> @@ -189,6 +180,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> 	struct pblk_line *line = line_ws->line;
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct pblk_line_meta *lm = &pblk->lm;
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_gc *gc = &pblk->gc;
> 	struct pblk_line_ws *gc_rq_ws;
> 	struct pblk_gc_rq *gc_rq;
> @@ -247,9 +240,13 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> 	gc_rq->nr_secs = nr_secs;
> 	gc_rq->line = line;
> 
> +	gc_rq->data = vmalloc(gc_rq->nr_secs * geo->csecs);
> +	if (!gc_rq->data)
> +		goto fail_free_gc_rq;
> +
> 	gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
> 	if (!gc_rq_ws)
> -		goto fail_free_gc_rq;
> +		goto fail_free_gc_data;
> 
> 	gc_rq_ws->pblk = pblk;
> 	gc_rq_ws->line = line;
> @@ -281,6 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> 
> 	return;
> 
> +fail_free_gc_data:
> +	vfree(gc_rq->data);
> fail_free_gc_rq:
> 	kfree(gc_rq);
> fail_free_lba_list:
> --
> 2.17.1

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread
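The restructuring reviewed above follows the common allocate-early / unwind-with-labels idiom: acquire every resource before the point of no return, and give each one a cleanup label so failure leaves nothing leaked. A minimal userspace sketch (plain malloc standing in for vmalloc, all names hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gc_rq_sketch {
	void *data;
	size_t len;
};

/* Staged allocation with reverse-order cleanup: each resource acquired
 * before a failure point gets a dedicated label, so an error unwinds
 * cleanly and the caller can still roll the line back to its previous
 * state instead of silently losing the data. */
static struct gc_rq_sketch *prepare_gc_rq(size_t nr_secs, size_t csecs)
{
	struct gc_rq_sketch *rq = malloc(sizeof(*rq));

	if (!rq)
		goto fail;

	rq->len = nr_secs * csecs;
	rq->data = malloc(rq->len);
	if (!rq->data)
		goto fail_free_rq;

	memset(rq->data, 0, rq->len);
	return rq;

fail_free_rq:
	free(rq);
fail:
	return NULL;
}
```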

* Re: [PATCH 00/13] lightnvm: bugfixes and improvements
  2019-02-28 16:36 ` [PATCH 00/13] lightnvm: bugfixes and improvements Matias Bjørling
@ 2019-02-28 17:15   ` Javier González
  2019-03-01 10:23   ` Hans Holmberg
  1 sibling, 0 replies; 91+ messages in thread
From: Javier González @ 2019-02-28 17:15 UTC (permalink / raw)
  To: Matias Bjørling; +Cc: Konopko, Igor J, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1949 bytes --]

> On 28 Feb 2019, at 11.36, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 2/27/19 6:14 PM, Igor Konopko wrote:
>> This series provides a group of bugfixes
>> and improvements for the lightnvm and pblk devices.
>> Most of the patches are rather simple and cover
>> corner-case scenarios, but we were able to hit
>> most of them in practice. A few others close
>> existing gaps which we were able to find.
>> Feedback is appreciated.
>> Igor Konopko (13):
>>   lightnvm: pblk: Line reference fix in GC
>>   lightnvm: pblk: Gracefully handle GC data malloc fail
>>   lightnvm: pblk: Fix put line back behaviour
>>   lightnvm: pblk: Rollback in gc read
>>   lightnvm: pblk: Count all read errors in stats
>>   lightnvm: pblk: Ensure that erase is chunk aligned
>>   lightnvm: pblk: Cleanly fail when there is not enough memory
>>   lightnvm: pblk: Set proper read status in bio
>>   lightnvm: pblk: Kick writer for flush requests
>>   lightnvm: pblk: Reduce L2P DRAM footprint
>>   lightnvm: pblk: Remove unused smeta_ssec field
>>   lightnvm: pblk: close opened chunks
>>   lightnvm: Inherit mdts from the parent nvme device
>>  drivers/lightnvm/core.c          |   9 ++-
>>  drivers/lightnvm/pblk-core.c     | 128 +++++++++++++++++++++++++++++--
>>  drivers/lightnvm/pblk-gc.c       |  47 +++++++-----
>>  drivers/lightnvm/pblk-init.c     |  30 ++++++--
>>  drivers/lightnvm/pblk-map.c      |   2 +
>>  drivers/lightnvm/pblk-read.c     |  13 ++--
>>  drivers/lightnvm/pblk-recovery.c |   2 +-
>>  drivers/lightnvm/pblk.h          |   4 +-
>>  drivers/nvme/host/lightnvm.c     |   1 +
>>  include/linux/lightnvm.h         |   1 +
>>  10 files changed, 193 insertions(+), 44 deletions(-)
> 
> Thanks Igor. I'll give Hans et al. a couple of days to digest the changes.

If it is that difficult to spell Javier, you can look at the pblk copyright…

I’ll review the patches next week.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 00/13] lightnvm: bugfixes and improvements
  2019-02-28 16:36 ` [PATCH 00/13] lightnvm: bugfixes and improvements Matias Bjørling
  2019-02-28 17:15   ` Javier González
@ 2019-03-01 10:23   ` Hans Holmberg
  1 sibling, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-01 10:23 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Igor Konopko, Javier González, Hans Holmberg, linux-block

Good stuff Igor!

The jetlag from vault is wearing off now, so I'll start looking at the
patches today.

Thanks,
Hans

On Thu, Feb 28, 2019 at 11:36 AM Matias Bjørling <mb@lightnvm.io> wrote:
>
> On 2/27/19 6:14 PM, Igor Konopko wrote:
> > This series provides a group of bugfixes
> > and improvements for the lightnvm and pblk devices.
> >
> > Most of the patches are rather simple and cover
> > corner-case scenarios, but we were able to hit
> > most of them in practice. A few others close
> > existing gaps which we were able to find.
> >
> > Feedback is appreciated.
> >
> > Igor Konopko (13):
> >    lightnvm: pblk: Line reference fix in GC
> >    lightnvm: pblk: Gracefully handle GC data malloc fail
> >    lightnvm: pblk: Fix put line back behaviour
> >    lightnvm: pblk: Rollback in gc read
> >    lightnvm: pblk: Count all read errors in stats
> >    lightnvm: pblk: Ensure that erase is chunk aligned
> >    lightnvm: pblk: Cleanly fail when there is not enough memory
> >    lightnvm: pblk: Set proper read status in bio
> >    lightnvm: pblk: Kick writer for flush requests
> >    lightnvm: pblk: Reduce L2P DRAM footprint
> >    lightnvm: pblk: Remove unused smeta_ssec field
> >    lightnvm: pblk: close opened chunks
> >    lightnvm: Inherit mdts from the parent nvme device
> >
> >   drivers/lightnvm/core.c          |   9 ++-
> >   drivers/lightnvm/pblk-core.c     | 128 +++++++++++++++++++++++++++++--
> >   drivers/lightnvm/pblk-gc.c       |  47 +++++++-----
> >   drivers/lightnvm/pblk-init.c     |  30 ++++++--
> >   drivers/lightnvm/pblk-map.c      |   2 +
> >   drivers/lightnvm/pblk-read.c     |  13 ++--
> >   drivers/lightnvm/pblk-recovery.c |   2 +-
> >   drivers/lightnvm/pblk.h          |   4 +-
> >   drivers/nvme/host/lightnvm.c     |   1 +
> >   include/linux/lightnvm.h         |   1 +
> >   10 files changed, 193 insertions(+), 44 deletions(-)
> >
>
> > Thanks Igor. I'll give Hans et al. a couple of days to digest the
> changes.
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/13] lightnvm: pblk: Line reference fix in GC
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
@ 2019-03-01 12:20   ` Hans Holmberg
  2019-03-04  7:18   ` Javier González
  2019-03-04 12:40   ` Matias Bjørling
  2 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-01 12:20 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg,
	linux-block, Heiner Litz

On Wed, Feb 27, 2019 at 6:17 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> This patch fixes the error case in GC where we
> both move the line back to the closed state and
> release an additional reference, which causes an
> illegal closed-to-free transition in pblk_line_put,
> since only the gc-to-free line state transition
> is allowed on that path.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-gc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 2fa118c8eb71..3feadfd9418d 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -290,8 +290,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
>  fail_free_ws:
>         kfree(line_ws);
>
> +       /* Line goes back to closed state, so we cannot release additional
> +        * reference for line, since we do that only when we want to do
> +        * gc to free line state transition.
> +        */

Good comment.
It would be nice to document how the line refcounting works in general,
as it is non-trivial to deduce from the code.
I dug into this when supporting Heiner in his latest bugfix, and it's
on my laundry list .. but if someone else feels inclined.. :)

>         pblk_put_line_back(pblk, line);
> -       kref_put(&line->ref, pblk_line_put);
>         atomic_dec(&gc->read_inflight_gc);
>
>         pblk_err(pblk, "failed to GC line %d\n", line->id);
> --
> 2.17.1
>

Great catch!
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread
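The reference-counting rule this fix restores can be illustrated with a minimal userspace analogue of the kref get/put pattern (names hypothetical; the real kernel API lives in <linux/kref.h>). The point is that the release path fires exactly once, when the last holder drops its reference — so an extra put on an error path, as here, triggers a state transition that was never meant to happen.

```c
#include <assert.h>
#include <stdbool.h>

struct ref_sketch {
	int count;
	bool released;
};

static void ref_get(struct ref_sketch *r)
{
	r->count++;
}

/* Mirrors kref_put(): drops one reference and runs the release path
 * exactly once, when the last holder lets go.  The bug fixed by this
 * patch was an extra put on a path where the line had already been
 * returned to the closed state. */
static bool ref_put(struct ref_sketch *r)
{
	if (--r->count == 0) {
		r->released = true;
		return true;
	}
	return false;
}
```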

* Re: [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail
  2019-02-28 17:08   ` Javier González
@ 2019-03-01 12:50     ` Hans Holmberg
  2019-03-04 12:38       ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-01 12:50 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

On Thu, Feb 28, 2019 at 6:09 PM Javier González <javier@javigon.com> wrote:
>
>
>
> > On 27 Feb 2019, at 12.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > Currently, when the GC rq data allocation fails,
> > we simply skip the data which we wanted to move
> > and eventually move the line to the free state,
> > losing that data in the process. This patch moves
> > the data allocation to an earlier phase of GC,
> > where we can still fail gracefully by moving the
> > line back to the closed state.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-gc.c | 19 +++++++++----------
> > 1 file changed, 9 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> > index 3feadfd9418d..31fc1339faa8 100644
> > --- a/drivers/lightnvm/pblk-gc.c
> > +++ b/drivers/lightnvm/pblk-gc.c
> > @@ -84,8 +84,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
> >       struct pblk_line_ws *gc_rq_ws = container_of(work,
> >                                               struct pblk_line_ws, ws);
> >       struct pblk *pblk = gc_rq_ws->pblk;
> > -     struct nvm_tgt_dev *dev = pblk->dev;
> > -     struct nvm_geo *geo = &dev->geo;
> >       struct pblk_gc *gc = &pblk->gc;
> >       struct pblk_line *line = gc_rq_ws->line;
> >       struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
> > @@ -93,13 +91,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
> >
> >       up(&gc->gc_sem);
> >
> > -     gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
> > -     if (!gc_rq->data) {
> > -             pblk_err(pblk, "could not GC line:%d (%d/%d)\n",
> > -                                     line->id, *line->vsc, gc_rq->nr_secs);
> > -             goto out;
> > -     }
> > -
> >       /* Read from GC victim block */
> >       ret = pblk_submit_read_gc(pblk, gc_rq);
> >       if (ret) {
> > @@ -189,6 +180,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> >       struct pblk_line *line = line_ws->line;
> >       struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> >       struct pblk_line_meta *lm = &pblk->lm;
> > +     struct nvm_tgt_dev *dev = pblk->dev;
> > +     struct nvm_geo *geo = &dev->geo;
> >       struct pblk_gc *gc = &pblk->gc;
> >       struct pblk_line_ws *gc_rq_ws;
> >       struct pblk_gc_rq *gc_rq;
> > @@ -247,9 +240,13 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> >       gc_rq->nr_secs = nr_secs;
> >       gc_rq->line = line;
> >
> > +     gc_rq->data = vmalloc(gc_rq->nr_secs * geo->csecs);

Why not use array_size to do the size calculation as before? It checks
for overflows.
Apart from this, the patch looks good to me.

> > +     if (!gc_rq->data)
> > +             goto fail_free_gc_rq;
> > +
> >       gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
> >       if (!gc_rq_ws)
> > -             goto fail_free_gc_rq;
> > +             goto fail_free_gc_data;
> >
> >       gc_rq_ws->pblk = pblk;
> >       gc_rq_ws->line = line;
> > @@ -281,6 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> >
> >       return;
> >
> > +fail_free_gc_data:
> > +     vfree(gc_rq->data);
> > fail_free_gc_rq:
> >       kfree(gc_rq);
> > fail_free_lba_list:
> > --
> > 2.17.1
>
> Looks good to me.
>
> Reviewed-by: Javier González <javier@javigon.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread
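The overflow concern Javier raises is what the kernel's array_size() helper addresses: it saturates to SIZE_MAX on multiplication overflow, so the allocation fails cleanly instead of succeeding with a wrapped, too-small size. A userspace analogue, assuming a GCC/Clang toolchain for __builtin_mul_overflow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace analogue of the kernel's array_size(): returns SIZE_MAX on
 * multiplication overflow, so that a subsequent allocation of the
 * result fails instead of silently allocating too little. */
static size_t array_size_sketch(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

This is why `vmalloc(array_size(gc_rq->nr_secs, geo->csecs))` is preferable to the bare `gc_rq->nr_secs * geo->csecs` multiplication in the patch.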

* Re: [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour
  2019-02-27 17:14 ` [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour Igor Konopko
@ 2019-03-01 13:27   ` Hans Holmberg
  2019-03-04  7:22   ` Javier González
  1 sibling, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-01 13:27 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

Looks good.

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

On Wed, Feb 27, 2019 at 6:17 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> The current implementation of pblk_put_line_back
> leaves two cases unhandled.
>
> The first is a race condition with __pblk_map_invalidate,
> where we check the line state: the line might be closed but
> not yet added to any list, making list_move_tail explode.
> This is due to the current pblk_put_line_back implementation
> not taking both the gc_lock and the line lock.
>
> The second issue is that while we are in that function, the
> line is not on any list, so pblk_line_gc_list might hit the
> same gc group and thus not return any move_list. Our line
> would then be stuck forever, unassigned to any list. Simply
> resetting gc_group to none fixes that.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-gc.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 31fc1339faa8..511ed0d5333c 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -64,19 +64,23 @@ static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
>         struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>         struct list_head *move_list;
>
> +       spin_lock(&l_mg->gc_lock);
>         spin_lock(&line->lock);
>         WARN_ON(line->state != PBLK_LINESTATE_GC);
>         line->state = PBLK_LINESTATE_CLOSED;
>         trace_pblk_line_state(pblk_disk_name(pblk), line->id,
>                                         line->state);
> +
> +       /* We need to reset gc_group in order to ensure that
> +        * pblk_line_gc_list will return proper move_list
> +        * since right now current line is not on any of the
> +        * gc lists.
> +        */
> +       line->gc_group = PBLK_LINEGC_NONE;
>         move_list = pblk_line_gc_list(pblk, line);
>         spin_unlock(&line->lock);
> -
> -       if (move_list) {
> -               spin_lock(&l_mg->gc_lock);
> -               list_add_tail(&line->list, move_list);
> -               spin_unlock(&l_mg->gc_lock);
> -       }
> +       list_add_tail(&line->list, move_list);
> +       spin_unlock(&l_mg->gc_lock);
>  }
>
>  static void pblk_gc_line_ws(struct work_struct *work)
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 91+ messages in thread
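The second issue in the commit message hinges on pblk_line_gc_list() only producing a move list when the recomputed group differs from the line's current gc_group. A simplified sketch of that contract (enum values and names hypothetical):

```c
#include <assert.h>

enum gc_group { GC_NONE, GC_FULL, GC_HIGH, GC_MID, GC_LOW };

struct line_sketch {
	enum gc_group gc_group;
};

/* Sketch of the contract this fix relies on: the helper hands back a
 * move list only when the recomputed group differs from the line's
 * recorded one.  A line whose group is unchanged gets no move list,
 * so a caller holding an off-list line must first reset gc_group to
 * GC_NONE to force a re-link. */
static int pick_move_list(struct line_sketch *line, enum gc_group new_group)
{
	if (line->gc_group == new_group)
		return 0;	/* no move list: off-list line stays orphaned */
	line->gc_group = new_group;
	return 1;	/* caller re-links the line on the new list */
}
```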

* Re: [PATCH 01/13] lightnvm: pblk: Line reference fix in GC
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
  2019-03-01 12:20   ` Hans Holmberg
@ 2019-03-04  7:18   ` Javier González
  2019-03-04 12:40   ` Matias Bjørling
  2 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04  7:18 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1289 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This patch fixes the error case in GC where we
> both move the line back to the closed state and
> release an additional reference, which causes an
> illegal closed-to-free transition in pblk_line_put,
> since only the gc-to-free line state transition
> is allowed on that path.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-gc.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 2fa118c8eb71..3feadfd9418d 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -290,8 +290,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
> fail_free_ws:
> 	kfree(line_ws);
> 
> +	/* Line goes back to closed state, so we cannot release additional
> +	 * reference for line, since we do that only when we want to do
> +	 * gc to free line state transition.
> +	 */
> 	pblk_put_line_back(pblk, line);
> -	kref_put(&line->ref, pblk_line_put);
> 	atomic_dec(&gc->read_inflight_gc);
> 
> 	pblk_err(pblk, "failed to GC line %d\n", line->id);
> --
> 2.17.1

Looks good to me

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour
  2019-02-27 17:14 ` [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour Igor Konopko
  2019-03-01 13:27   ` Hans Holmberg
@ 2019-03-04  7:22   ` Javier González
  1 sibling, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04  7:22 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2430 bytes --]


> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> The current implementation of pblk_put_line_back
> leaves two cases unhandled.
>
> The first is a race condition with __pblk_map_invalidate,
> where we check the line state: the line might be closed but
> not yet added to any list, making list_move_tail explode.
> This is due to the current pblk_put_line_back implementation
> not taking both the gc_lock and the line lock.
>
> The second issue is that while we are in that function, the
> line is not on any list, so pblk_line_gc_list might hit the
> same gc group and thus not return any move_list. Our line
> would then be stuck forever, unassigned to any list. Simply
> resetting gc_group to none fixes that.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-gc.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 31fc1339faa8..511ed0d5333c 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -64,19 +64,23 @@ static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct list_head *move_list;
> 
> +	spin_lock(&l_mg->gc_lock);
> 	spin_lock(&line->lock);
> 	WARN_ON(line->state != PBLK_LINESTATE_GC);
> 	line->state = PBLK_LINESTATE_CLOSED;
> 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
> 					line->state);
> +
> +	/* We need to reset gc_group in order to ensure that
> +	 * pblk_line_gc_list will return proper move_list
> +	 * since right now current line is not on any of the
> +	 * gc lists.
> +	 */
> +	line->gc_group = PBLK_LINEGC_NONE;
> 	move_list = pblk_line_gc_list(pblk, line);
> 	spin_unlock(&line->lock);
> -
> -	if (move_list) {
> -		spin_lock(&l_mg->gc_lock);
> -		list_add_tail(&line->list, move_list);
> -		spin_unlock(&l_mg->gc_lock);
> -	}
> +	list_add_tail(&line->list, move_list);
> +	spin_unlock(&l_mg->gc_lock);
> }
> 
> static void pblk_gc_line_ws(struct work_struct *work)
> --
> 2.17.1

This comes back from the time where GC was single threaded - I left out
this when reimplementing the locking. Good catch.

Please, add a fixes tag for this to be back ported.

Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-02-27 17:14 ` [PATCH 04/13] lightnvm: pblk: Rollback in gc read Igor Konopko
@ 2019-03-04  7:38   ` Javier González
  2019-03-04  8:44     ` Hans Holmberg
  2019-03-04 12:49   ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  7:38 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, when pblk_gc_line returns an error to
> pblk_gc_read, we leave the current line unassigned
> to any list. This patch fixes that issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-gc.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 511ed0d5333c..533da6ea3e15 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
> 
> 	pblk_gc_kick(pblk);
> 
> -	if (pblk_gc_line(pblk, line))
> +	if (pblk_gc_line(pblk, line)) {
> 		pblk_err(pblk, "failed to GC line %d\n", line->id);
> +		/* rollback */
> +		spin_lock(&gc->r_lock);
> +		list_add_tail(&line->list, &gc->r_list);
> +		spin_unlock(&gc->r_lock);
> +	}
> 
> 	return 0;
> }
> --
> 2.17.1

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-02-27 17:14 ` [PATCH 05/13] lightnvm: pblk: Count all read errors in stats Igor Konopko
@ 2019-03-04  7:42   ` Javier González
  2019-03-04  9:02     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  7:42 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, when an unknown error occurs on the read
> path, there is only dmesg information about it, but
> it is not counted in the sysfs statistics. Since this
> is still an error, it should be counted there as well.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index eabcbc119681..a98b2255f963 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd)
> 		atomic_long_inc(&pblk->read_failed);
> 		break;
> 	default:
> +		atomic_long_inc(&pblk->read_failed);
> 		pblk_err(pblk, "unknown read error:%d\n", rqd->error);
> 	}
> #ifdef CONFIG_NVM_PBLK_DEBUG
> --
> 2.17.1

I left this out intentionally so that we could correlate the logs from
the controller and the errors in the read path. Since we do not have a
standard way to correlate this on SMART yet, let’s add this now (I
assume that you are using it for something?) and we can separate the
error stats in the future.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-02-27 17:14 ` [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned Igor Konopko
@ 2019-03-04  7:48   ` Javier González
  2019-03-04  9:05     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  7:48 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1953 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In the current pblk implementation of the erase
> command there is a chance that sector bits are set
> to some random values for the erase PPA. This is an
> unexpected situation, since an erase shall always be
> chunk aligned. This patch fixes that issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 1 +
> drivers/lightnvm/pblk-map.c  | 2 ++
> 2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a98b2255f963..78b1eea4ab67 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
> 
> 		ppa = pblk->luns[bit].bppa; /* set ch and lun */
> 		ppa.a.blk = line->id;
> +		ppa.a.reserved = 0;
> 
> 		atomic_dec(&line->left_eblks);
> 		WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index 79df583ea709..aea46b4ec40f 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 
> 			*erase_ppa = ppa_list[i];
> 			erase_ppa->a.blk = e_line->id;
> +			erase_ppa->a.reserved = 0;
> 
> 			spin_unlock(&e_line->lock);
> 
> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 		atomic_dec(&e_line->left_eblks);
> 		*erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> 		erase_ppa->a.blk = e_line->id;
> +		erase_ppa->a.reserved = 0;
> 	}
> 
> 	return 0;
> --
> 2.17.1

I’m fine with adding this, but note that there is actually no
requirement for the erase to be chunk aligned - the only bits that
should be looked at are group, PU and chunk.

Reviewed-by: Javier González <javier@javigon.com>



[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory
  2019-02-27 17:14 ` [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory Igor Konopko
@ 2019-03-04  7:53   ` Javier González
  2019-03-04  9:24     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  7:53 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1542 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> The L2P table can be huge in many cases, since
> it typically requires 1GB of DRAM for 1TB
> of drive. When there is not enough memory
> available, the OOM killer kicks in and kills
> random processes, which can be very annoying
> for users. This patch changes the flags for
> the L2P table allocation in order to handle this
> situation in a more user-friendly way.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-init.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 8b643d0bffae..e553105b7ba1 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -164,9 +164,14 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
> 	int ret = 0;
> 
> 	map_size = pblk_trans_map_size(pblk);
> -	pblk->trans_map = vmalloc(map_size);
> -	if (!pblk->trans_map)
> +	pblk->trans_map = __vmalloc(map_size, GFP_KERNEL | __GFP_NOWARN
> +					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
> +					PAGE_KERNEL);
> +	if (!pblk->trans_map) {
> +		pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
> +				map_size);
> 		return -ENOMEM;
> +	}
> 
> 	pblk_ppa_set_empty(&ppa);
> 
> --
> 2.17.1

Is there any extra consideration we should take when enabling high
memory for the L2P table? If not, looks good to me.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-02-27 17:14 ` [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio Igor Konopko
@ 2019-03-04  8:03   ` Javier González
  2019-03-04  9:35     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  8:03 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2299 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, in case of read errors, bi_status is not
> set properly, which leads to returning improper data
> to the higher layer. This patch fixes that by setting
> the proper status in case of read errors.
> 
> The patch also removes an unnecessary WARN_ONCE(), which
> does not make sense in that place, since the user bio is
> not used for interaction with the drive and thus bi_status
> will not be set here.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-read.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 3789185144da..39c1d6ccaedb 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
> 	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
> }
> 
> -static void pblk_end_user_read(struct bio *bio)
> +static void pblk_end_user_read(struct bio *bio, int error)
> {
> -#ifdef CONFIG_NVM_PBLK_DEBUG
> -	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
> -#endif
> +	if (error && error != NVM_RSP_WARN_HIGHECC)
> +		bio_io_error(bio);
> 	bio_endio(bio);
> }
> 
> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
> 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> 	struct bio *bio = (struct bio *)r_ctx->private;
> 
> -	pblk_end_user_read(bio);
> +	pblk_end_user_read(bio, rqd->error);
> 	__pblk_end_io_read(pblk, rqd, true);
> }
> 
> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> 	rqd->bio = NULL;
> 	rqd->nr_ppas = nr_secs;
> 
> -	bio_endio(bio);
> +	pblk_end_user_read(bio, rqd->error);
> 	__pblk_end_io_read(pblk, rqd, false);
> }
> 
> --
> 2.17.1

This is by design. We do not report read errors the way other block
devices do - this is why we clone the read bio.

If you want to remove the WARN_ONCE, it is fine by me - it helped in the
past to debug the read path, as when one read failed we got a storm of
them. Now we have better controller tools to debug this, so if nobody
else wants it there, let’s remove it.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests
  2019-02-27 17:14 ` [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests Igor Konopko
@ 2019-03-04  8:08   ` Javier González
  2019-03-04  9:39     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  8:08 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]


> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> When there are not enough sectors available in the rwb
> and a flush request is sent, we should kick the write
> thread, which is not the case in the current
> implementation. This patch fixes that issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 78b1eea4ab67..f48f2e77f770 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -375,8 +375,9 @@ void pblk_write_timer_fn(struct timer_list *t)
> void pblk_write_should_kick(struct pblk *pblk)
> {
> 	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
> +	unsigned int secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
> 
> -	if (secs_avail >= pblk->min_write_pgs_data)
> +	if (secs_avail >= pblk->min_write_pgs_data || secs_to_flush)
> 		pblk_write_kick(pblk);
> }
> 
> --
> 2.17.1

We already kick the write thread in case of REQ_PREFLUSH in
pblk_write_cache(), so no need to kick again.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint
  2019-02-27 17:14 ` [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint Igor Konopko
@ 2019-03-04  8:17   ` Javier González
  2019-03-04  9:29     ` Hans Holmberg
  2019-03-04 13:11   ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  8:17 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 5310 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently the L2P map size is calculated based on
> the total number of available sectors, which is
> larger than needed, since it includes the
> over-provisioning sectors (11% by default).
> 
> The goal of this patch is to base this size on
> the real capacity and thus reduce the DRAM
> footprint significantly - with the default op value
> it is approx. 110MB of DRAM less for every 1TB
> of pblk drive.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 8 ++++----
> drivers/lightnvm/pblk-init.c     | 7 +++----
> drivers/lightnvm/pblk-read.c     | 2 +-
> drivers/lightnvm/pblk-recovery.c | 2 +-
> drivers/lightnvm/pblk.h          | 1 -
> 5 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index f48f2e77f770..2e424c0275c1 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -2024,7 +2024,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
> 	struct ppa_addr ppa_l2p;
> 
> 	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
> 		WARN(1, "pblk: corrupted L2P map request\n");
> 		return;
> 	}
> @@ -2064,7 +2064,7 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
> #endif
> 
> 	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
> 		WARN(1, "pblk: corrupted L2P map request\n");
> 		return 0;
> 	}
> @@ -2110,7 +2110,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
> 	}
> 
> 	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
> 		WARN(1, "pblk: corrupted L2P map request\n");
> 		return;
> 	}
> @@ -2168,7 +2168,7 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
> 		lba = lba_list[i];
> 		if (lba != ADDR_EMPTY) {
> 			/* logic error: lba out-of-bounds. Ignore update */
> -			if (!(lba < pblk->rl.nr_secs)) {
> +			if (!(lba < pblk->capacity)) {
> 				WARN(1, "pblk: corrupted L2P map request\n");
> 				continue;
> 			}
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index e553105b7ba1..9913a4514eb6 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -105,7 +105,7 @@ static size_t pblk_trans_map_size(struct pblk *pblk)
> 	if (pblk->addrf_len < 32)
> 		entry_size = 4;
> 
> -	return entry_size * pblk->rl.nr_secs;
> +	return entry_size * pblk->capacity;
> }
> 
> #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -175,7 +175,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
> 
> 	pblk_ppa_set_empty(&ppa);
> 
> -	for (i = 0; i < pblk->rl.nr_secs; i++)
> +	for (i = 0; i < pblk->capacity; i++)
> 		pblk_trans_map_set(pblk, i, ppa);
> 
> 	ret = pblk_l2p_recover(pblk, factory_init);
> @@ -706,7 +706,6 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
> 	 * on user capacity consider only provisioned blocks
> 	 */
> 	pblk->rl.total_blocks = nr_free_chks;
> -	pblk->rl.nr_secs = nr_free_chks * geo->clba;
> 
> 	/* Consider sectors used for metadata */
> 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
> @@ -1289,7 +1288,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> 
> 	pblk_info(pblk, "luns:%u, lines:%d, secs:%llu, buf entries:%u\n",
> 			geo->all_luns, pblk->l_mg.nr_lines,
> -			(unsigned long long)pblk->rl.nr_secs,
> +			(unsigned long long)pblk->capacity,
> 			pblk->rwb.nr_entries);
> 
> 	wake_up_process(pblk->writer_ts);
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 39c1d6ccaedb..65697463def8 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -561,7 +561,7 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> 		goto out;
> 
> 	/* logic error: lba out-of-bounds */
> -	if (lba >= pblk->rl.nr_secs) {
> +	if (lba >= pblk->capacity) {
> 		WARN(1, "pblk: read lba out of bounds\n");
> 		goto out;
> 	}
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index d86f580036d3..83b467b5edc7 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -474,7 +474,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 
> 		lba_list[paddr++] = cpu_to_le64(lba);
> 
> -		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> +		if (lba == ADDR_EMPTY || lba >= pblk->capacity)
> 			continue;
> 
> 		line->nr_valid_lbas++;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index a6386d5acd73..a92377530930 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -305,7 +305,6 @@ struct pblk_rl {
> 
> 	struct timer_list u_timer;
> 
> -	unsigned long long nr_secs;
> 	unsigned long total_blocks;
> 
> 	atomic_t free_blocks;		/* Total number of free blocks (+ OP) */
> --
> 2.17.1

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field
  2019-02-27 17:14 ` [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field Igor Konopko
@ 2019-03-04  8:21   ` Javier González
  2019-03-04  9:40     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  8:21 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1587 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> The smeta_ssec field in pblk_line is set once and
> never used, since it was replaced by the function
> pblk_line_smeta_start(). This patch removes this
> no longer needed field.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 1 -
> drivers/lightnvm/pblk.h      | 1 -
> 2 files changed, 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 2e424c0275c1..fa4dc05608ff 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -1165,7 +1165,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
> 	off = bit * geo->ws_opt;
> 	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
> 	line->sec_in_line -= lm->smeta_sec;
> -	line->smeta_ssec = off;
> 	line->cur_sec = off + lm->smeta_sec;
> 
> 	if (init && pblk_line_smeta_write(pblk, line, off)) {
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index a92377530930..b266563508e6 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -464,7 +464,6 @@ struct pblk_line {
> 	int meta_line;			/* Metadata line id */
> 	int meta_distance;		/* Distance between data and metadata */
> 
> -	u64 smeta_ssec;			/* Sector where smeta starts */
> 	u64 emeta_ssec;			/* Sector where emeta starts */
> 
> 	unsigned int sec_in_line;	/* Number of usable secs in line */
> --
> 2.17.1

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-02-27 17:14 ` [PATCH 12/13] lightnvm: pblk: close opened chunks Igor Konopko
@ 2019-03-04  8:27   ` Javier González
  2019-03-04 10:05     ` Hans Holmberg
  2019-03-04 13:18     ` Matias Bjørling
  0 siblings, 2 replies; 91+ messages in thread
From: Javier González @ 2019-03-04  8:27 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 7208 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> When creating a pblk instance with the factory
> flag, there is a possibility that some chunks
> are in the open state, which does not allow us
> to issue erase requests to them directly. Such
> a chunk should be filled with empty data in
> order to reach the closed state. Without that,
> we risk that some erase operations will be
> rejected by the drive due to improper chunk
> state.
> 
> This patch implements chunk-closing logic in pblk
> for the case when an instance is created with the
> factory flag, in order to fix that issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
> drivers/lightnvm/pblk-init.c |  14 ++++-
> drivers/lightnvm/pblk.h      |   2 +
> 3 files changed, 128 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index fa4dc05608ff..d3c45393f093 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> 	return meta + ch_off + lun_off + chk_off;
> }
> 
> +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +	struct bio *bio;
> +	struct ppa_addr *ppa_list;
> +	struct nvm_rq rqd;
> +	void *meta_list, *data;
> +	dma_addr_t dma_meta_list, dma_ppa_list;
> +	int i, rq_ppas, rq_len, ret;
> +
> +	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> +	if (!meta_list)
> +		return;
> +
> +	ppa_list = meta_list + pblk_dma_meta_size(pblk);
> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
> +
> +	rq_ppas = pblk_calc_secs(pblk, count, 0, false);
> +	if (!rq_ppas)
> +		rq_ppas = pblk->min_write_pgs;
> +	rq_len = rq_ppas * geo->csecs;
> +
> +	data = kzalloc(rq_len, GFP_KERNEL);
> +	if (!data)
> +		goto free_meta_list;
> +
> +	memset(&rqd, 0, sizeof(struct nvm_rq));
> +	rqd.opcode = NVM_OP_PWRITE;
> +	rqd.nr_ppas = rq_ppas;
> +	rqd.meta_list = meta_list;
> +	rqd.ppa_list = ppa_list;
> +	rqd.dma_ppa_list = dma_ppa_list;
> +	rqd.dma_meta_list = dma_meta_list;
> +
> +next_rq:
> +	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
> +	if (IS_ERR(bio))
> +		goto out_next;
> +
> +	bio->bi_iter.bi_sector = 0; /* artificial bio */
> +	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> +
> +	rqd.bio = bio;
> +	for (i = 0; i < rqd.nr_ppas; i++) {
> +		rqd.ppa_list[i] = ppa;
> +		rqd.ppa_list[i].m.sec += i;
> +		pblk_get_meta(pblk, meta_list, i)->lba =
> +					cpu_to_le64(ADDR_EMPTY);
> +	}
> +
> +	ret = nvm_submit_io_sync(dev, &rqd);
> +	if (ret) {
> +		bio_put(bio);
> +		goto out_next;
> +	}
> +
> +	if (rqd.error)
> +		goto free_data;
> +
> +out_next:
> +	count -= rqd.nr_ppas;
> +	ppa.m.sec += rqd.nr_ppas;
> +	if (count > 0)
> +		goto next_rq;
> +
> +free_data:
> +	kfree(data);
> +free_meta_list:
> +	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> +}
> +
> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +	struct nvm_chk_meta *chunk_meta;
> +	struct ppa_addr ppa;
> +	int i, j, k, count;
> +
> +	for (i = 0; i < geo->num_chk; i++) {
> +		for (j = 0; j < geo->num_lun; j++) {
> +			for (k = 0; k < geo->num_ch; k++) {
> +				ppa.ppa = 0;
> +				ppa.m.grp = k;
> +				ppa.m.pu = j;
> +				ppa.m.chk = i;
> +
> +				chunk_meta = pblk_chunk_get_off(pblk,
> +								meta, ppa);
> +				if (chunk_meta->state == NVM_CHK_ST_OPEN) {
> +					ppa.m.sec = chunk_meta->wp;
> +					count = geo->clba - chunk_meta->wp;
> +					pblk_close_chunk(pblk, ppa, count);
> +				}
> +			}
> +		}
> +	}
> +}
> +
> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +	int i;
> +
> +	for (i = 0; i < geo->all_luns; i++) {
> +		if (meta[i].state == NVM_CHK_ST_OPEN)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
> 			   u64 paddr)
> {
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 9913a4514eb6..83abe6960b46 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
> 	return 0;
> }
> 
> -static int pblk_lines_init(struct pblk *pblk)
> +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
> {
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct pblk_line *line;
> 	void *chunk_meta;
> 	int nr_free_chks = 0;
> 	int i, ret;
> +	bool retry = false;
> 
> 	ret = pblk_line_meta_init(pblk);
> 	if (ret)
> @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
> 	if (ret)
> 		goto fail_free_meta;
> 
> +get_chunk_meta:
> 	chunk_meta = pblk_get_chunk_meta(pblk);
> 	if (IS_ERR(chunk_meta)) {
> 		ret = PTR_ERR(chunk_meta);
> 		goto fail_free_luns;
> 	}
> 
> +	if (factory_init && !retry &&
> +	    pblk_are_opened_chunks(pblk, chunk_meta)) {
> +		pblk_close_opened_chunks(pblk, chunk_meta);
> +		retry = true;
> +		vfree(chunk_meta);
> +		goto get_chunk_meta;
> +	}
> +
> 	pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
> 								GFP_KERNEL);
> 	if (!pblk->lines) {
> @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> 		goto fail;
> 	}
> 
> -	ret = pblk_lines_init(pblk);
> +	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
> 	if (ret) {
> 		pblk_err(pblk, "could not initialize lines\n");
> 		goto fail_free_core;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index b266563508e6..b248642c4dfb 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
> struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> 					      struct nvm_chk_meta *lp,
> 					      struct ppa_addr ppa);
> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
> --
> 2.17.1

I know that the OCSSD 2.0 spec does not allow a transition from open to
free, but to me this is a spec bug, as there is no underlying issue in
reading an open block. Note that all controllers I know of support this,
and the upcoming Denali spec fixes this too.

Besides, the factory flag is intended to start a pblk instance
immediately, without having to pay the price of padding any past device
state. If you still want to do this, I think it belongs in a user-space tool.

Javier


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-03-04  7:38   ` Javier González
@ 2019-03-04  8:44     ` Hans Holmberg
  2019-03-04 12:39       ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  8:44 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

Did you ever see this in the wild?
The only time pblk_gc_line returns an error is if
kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL); fails, and then
we're in real trouble :)

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

On Mon, Mar 4, 2019 at 8:38 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > Currently in case of error returned by pblk_gc_line
> > to pblk_gc_read we leave current line unassigned
> > from all the lists. This patch fixes that issue.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-gc.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> > index 511ed0d5333c..533da6ea3e15 100644
> > --- a/drivers/lightnvm/pblk-gc.c
> > +++ b/drivers/lightnvm/pblk-gc.c
> > @@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
> >
> >       pblk_gc_kick(pblk);
> >
> > -     if (pblk_gc_line(pblk, line))
> > +     if (pblk_gc_line(pblk, line)) {
> >               pblk_err(pblk, "failed to GC line %d\n", line->id);
> > +             /* rollback */
> > +             spin_lock(&gc->r_lock);
> > +             list_add_tail(&line->list, &gc->r_list);
> > +             spin_unlock(&gc->r_lock);
> > +     }
> >
> >       return 0;
> > }
> > --
> > 2.17.1
>
> Looks good to me.
>
> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04  7:42   ` Javier González
@ 2019-03-04  9:02     ` Hans Holmberg
  2019-03-04  9:23       ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:02 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

Igor: Have you seen this happening in real life?

I think it would be better to count all expected errors and put them
in the right bucket (without spamming dmesg). If we need a new bucket
for e.g. vendor-specific errors, let's do that instead.

Someone wiser than me told me that every error print in the log is a
potential customer call.

Javier: Yeah, I think S.M.A.R.T is the way to deliver this
information. Why can't we let the drives expose this info and remove
this from pblk? What's blocking that?

Thanks,
Hans

On Mon, Mar 4, 2019 at 8:42 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > Currently when unknown error occurs on read path
> > there is only dmesg information about it, but it
> > is not counted in sysfs statistics. Since this is
> > still an error we should also count it there.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index eabcbc119681..a98b2255f963 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd)
> >               atomic_long_inc(&pblk->read_failed);
> >               break;
> >       default:
> > +             atomic_long_inc(&pblk->read_failed);
> >               pblk_err(pblk, "unknown read error:%d\n", rqd->error);
> >       }
> > #ifdef CONFIG_NVM_PBLK_DEBUG
> > --
> > 2.17.1
>
> I left this out intentionally  so that we could correlate the logs from
> the controller and the errors in the read path. Since we do not have an
> standard way to correlate this on SMART yet, let’s add this now (I
> assume that you are using it for something?) and we can separate the
> error stats in the future.
>
> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-02-27 17:14 ` [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device Igor Konopko
@ 2019-03-04  9:05   ` Javier González
  2019-03-04 11:30     ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  9:05 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2781 bytes --]

> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> The current lightnvm and pblk implementation does not account
> for the NVMe maximum data transfer size (MDTS), which can be
> smaller than 64 * 4K = 256K. This patch fixes issues related to that.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/core.c      | 9 +++++++--
> drivers/nvme/host/lightnvm.c | 1 +
> include/linux/lightnvm.h     | 1 +
> 3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index 5f82036fe322..c01f83b8fbaf 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> 	struct nvm_target *t;
> 	struct nvm_tgt_dev *tgt_dev;
> 	void *targetdata;
> +	unsigned int mdts;
> 	int ret;
> 
> 	switch (create->conf.type) {
> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> 	tdisk->private_data = targetdata;
> 	tqueue->queuedata = targetdata;
> 
> -	blk_queue_max_hw_sectors(tqueue,
> -			(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> +	mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> +	if (dev->geo.mdts) {
> +		mdts = min_t(u32, dev->geo.mdts,
> +				(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> +	}
> +	blk_queue_max_hw_sectors(tqueue, mdts);
> 
> 	set_capacity(tdisk, tt->capacity(targetdata));
> 	add_disk(tdisk);
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index b759c25c89c8..b88a39a3cbd1 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> 	geo->csecs = 1 << ns->lba_shift;
> 	geo->sos = ns->ms;
> 	geo->ext = ns->ext;
> +	geo->mdts = ns->ctrl->max_hw_sectors;
> 
> 	dev->q = q;
> 	memcpy(dev->name, disk_name, DISK_NAME_LEN);
> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> index 5d865a5d5cdc..d3b02708e5f0 100644
> --- a/include/linux/lightnvm.h
> +++ b/include/linux/lightnvm.h
> @@ -358,6 +358,7 @@ struct nvm_geo {
> 	u16	csecs;		/* sector size */
> 	u16	sos;		/* out-of-band area size */
> 	bool	ext;		/* metadata in extended data buffer */
> +	u32	mdts;		/* Max data transfer size*/
> 
> 	/* device write constrains */
> 	u32	ws_min;		/* minimum write size */
> --
> 2.17.1

I see where you are going with this and I partially agree, but none of
the OCSSD specs defines a way to expose this parameter. Thus, adding this
behavior, taken from NVMe, in Linux can break current implementations. Is
this a real-life problem for you? Or is this just for NVMe “correctness”?

Javier
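
The clamping the patch performs is just a min() between the geometry-derived
transfer size and the device-reported MDTS, with 0 meaning "no limit
reported". A standalone model (the constant 64 stands in for `NVM_MAX_VLBA`
in the real code; all values are in 512-byte sectors):

```c
#include <stdint.h>

#define NVM_MAX_VLBA 64 /* max vectored LBAs per request */

/*
 * Max transfer size in 512-byte sectors: the geometry limit
 * (sector size * max vector length), optionally capped by the
 * device-reported mdts (0 == device reports no limit).
 */
static uint32_t tgt_max_hw_sectors(uint32_t csecs, uint32_t mdts)
{
	uint32_t geo_max = (csecs >> 9) * NVM_MAX_VLBA;

	return (mdts && mdts < geo_max) ? mdts : geo_max;
}
```

For 4K sectors the geometry limit is 8 * 64 = 512 sectors (256K); a device
reporting a smaller MDTS wins, a larger or absent one leaves the geometry
limit in place.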




[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04  7:48   ` Javier González
@ 2019-03-04  9:05     ` Hans Holmberg
  2019-03-04  9:11       ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:05 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

I strongly disagree with adding code that would mask implementation errors.

If we want more internal checks, we could add an if statement that
would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.


On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > In the current pblk implementation of the erase command,
> > there is a chance that the sector bits are set to some
> > random values for the erase PPA. This is an unexpected
> > situation, since an erase shall always be chunk
> > aligned. This patch fixes that issue.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c | 1 +
> > drivers/lightnvm/pblk-map.c  | 2 ++
> > 2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index a98b2255f963..78b1eea4ab67 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
> >
> >               ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >               ppa.a.blk = line->id;
> > +             ppa.a.reserved = 0;
> >
> >               atomic_dec(&line->left_eblks);
> >               WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> > diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> > index 79df583ea709..aea46b4ec40f 100644
> > --- a/drivers/lightnvm/pblk-map.c
> > +++ b/drivers/lightnvm/pblk-map.c
> > @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >
> >                       *erase_ppa = ppa_list[i];
> >                       erase_ppa->a.blk = e_line->id;
> > +                     erase_ppa->a.reserved = 0;
> >
> >                       spin_unlock(&e_line->lock);
> >
> > @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >               atomic_dec(&e_line->left_eblks);
> >               *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >               erase_ppa->a.blk = e_line->id;
> > +             erase_ppa->a.reserved = 0;
> >       }
> >
> >       return 0;
> > --
> > 2.17.1
>
> I’m fine with adding this, but note that there is actually no
> requirement for the erase to be chunk aligned - the only bits that
> should be looked at are group, PU and chunk.
>
> Reviewed-by: Javier González <javier@javigon.com>
>
>
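
Both positions in this sub-thread can coexist: always clear the sector bits
before issuing the erase, and (per Hans's suggestion) only warn about them in
debug builds. A userspace model of the idea — the bitfield widths here are
illustrative, not the OCSSD 2.0 address layout:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Simplified physical page address; the real struct ppa_addr packs
 * group/PU/chunk/sector into device-specific bit positions.
 */
struct ppa {
	uint64_t reserved : 16; /* sector bits: meaningless for erase */
	uint64_t blk      : 16;
	uint64_t lun      : 16;
	uint64_t ch       : 16;
};

static struct ppa erase_ppa(struct ppa p)
{
#ifdef DEBUG_PBLK /* debug-only check, analogous to CONFIG_NVM_PBLK_DEBUG */
	if (p.reserved)
		fprintf(stderr, "pblk: erase ppa had sector bits set\n");
#endif
	p.reserved = 0; /* erase addresses a whole chunk */
	return p;
}
```

The unconditional clear keeps the device-facing command well-formed; the
guarded warning still surfaces the internal bug during development.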


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04  9:05     ` Hans Holmberg
@ 2019-03-04  9:11       ` Javier González
  2019-03-04 11:43         ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  9:11 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2873 bytes --]

> On 4 Mar 2019, at 10.05, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> 
> I strongly disagree with adding code that would mask implementation errors.
> 
> If we want more internal checks, we could add an if statement that
> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> 

Not sure who this is for - better not to top post.

In any case, this is a spec grey zone. I’m ok with cleaning the bits as
they mean nothing for the reset command. If you feel that strongly about
this, you can take it up with Igor.

> 
> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> wrote:
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> In the current pblk implementation of the erase command,
>>> there is a chance that the sector bits are set to some
>>> random values for the erase PPA. This is an unexpected
>>> situation, since an erase shall always be chunk
>>> aligned. This patch fixes that issue.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 1 +
>>> drivers/lightnvm/pblk-map.c  | 2 ++
>>> 2 files changed, 3 insertions(+)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index a98b2255f963..78b1eea4ab67 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
>>> 
>>>              ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>              ppa.a.blk = line->id;
>>> +             ppa.a.reserved = 0;
>>> 
>>>              atomic_dec(&line->left_eblks);
>>>              WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>> index 79df583ea709..aea46b4ec40f 100644
>>> --- a/drivers/lightnvm/pblk-map.c
>>> +++ b/drivers/lightnvm/pblk-map.c
>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 
>>>                      *erase_ppa = ppa_list[i];
>>>                      erase_ppa->a.blk = e_line->id;
>>> +                     erase_ppa->a.reserved = 0;
>>> 
>>>                      spin_unlock(&e_line->lock);
>>> 
>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>              atomic_dec(&e_line->left_eblks);
>>>              *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>              erase_ppa->a.blk = e_line->id;
>>> +             erase_ppa->a.reserved = 0;
>>>      }
>>> 
>>>      return 0;
>>> --
>>> 2.17.1
>> 
>> I’m fine with adding this, but note that there is actually no
>> requirement for the erase to be chunk aligned - the only bits that
>> should be looked at are group, PU and chunk.
>> 
>> Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04  9:02     ` Hans Holmberg
@ 2019-03-04  9:23       ` Javier González
  2019-03-04 11:41         ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04  9:23 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2361 bytes --]

> On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> 
> Igor: Have you seen this happening in real life?
> 
> I think it would be better to count all expected errors and put them
> in the right bucket (without spamming dmesg). If we need a new bucket
> for, e.g., vendor-specific errors, let's do that instead.
> 
> Someone wiser than me told me that every error print in the log is a
> potential customer call.
> 
> Javier: Yeah, I think S.M.A.R.T is the way to deliver this
> information. Why can't we let the drives expose this info and remove
> this from pblk? What's blocking that?

Until now, the spec. We added some new log information in Denali exactly
for this. But since pblk supports OCSSD 1.2 and 2.0, I think we need to
have it here, at least for debugging.

> 
> Thanks,
> Hans
> 
> On Mon, Mar 4, 2019 at 8:42 AM Javier González <javier@javigon.com> wrote:
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> Currently, when an unknown error occurs on the read
>>> path there is only a dmesg message about it, but it
>>> is not counted in the sysfs statistics. Since this is
>>> still an error, we should count it there as well.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index eabcbc119681..a98b2255f963 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd)
>>>              atomic_long_inc(&pblk->read_failed);
>>>              break;
>>>      default:
>>> +             atomic_long_inc(&pblk->read_failed);
>>>              pblk_err(pblk, "unknown read error:%d\n", rqd->error);
>>>      }
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>> --
>>> 2.17.1
>> 
>> I left this out intentionally so that we could correlate the logs from
>> the controller and the errors in the read path. Since we do not have a
>> standard way to correlate this on SMART yet, let’s add this now (I
>> assume that you are using it for something?) and we can separate the
>> error stats in the future.
>> 
>> Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory
  2019-03-04  7:53   ` Javier González
@ 2019-03-04  9:24     ` Hans Holmberg
  2019-03-04 12:46       ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:24 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

Hi Igor,

I think you need to motivate (and document) why each of the new flags
are needed.

Thanks,
Hans

On Mon, Mar 4, 2019 at 8:53 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > The L2P table can be huge in many cases, since
> > it typically requires 1GB of DRAM for 1TB
> > of drive. When there is not enough memory
> > available, the OOM killer kicks in and kills
> > random processes, which can be very annoying
> > for users. This patch changes the flags for
> > the L2P table allocation in order to handle this
> > situation in a more user-friendly way.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-init.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> > index 8b643d0bffae..e553105b7ba1 100644
> > --- a/drivers/lightnvm/pblk-init.c
> > +++ b/drivers/lightnvm/pblk-init.c
> > @@ -164,9 +164,14 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
> >       int ret = 0;
> >
> >       map_size = pblk_trans_map_size(pblk);
> > -     pblk->trans_map = vmalloc(map_size);
> > -     if (!pblk->trans_map)
> > +     pblk->trans_map = __vmalloc(map_size, GFP_KERNEL | __GFP_NOWARN
> > +                                     | __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
> > +                                     PAGE_KERNEL);
> > +     if (!pblk->trans_map) {
> > +             pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
> > +                             map_size);
> >               return -ENOMEM;
> > +     }
> >
> >       pblk_ppa_set_empty(&ppa);
> >
> > --
> > 2.17.1
>
> Is there any extra consideration we should take when enabling high
> memory for the L2P table? If not, looks good to me.
>
> Reviewed-by: Javier González <javier@javigon.com>
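
The point of the flag change is to hand the failure back to the caller
instead of letting the OOM killer pick victims: `__GFP_RETRY_MAYFAIL` asks
the kernel to retry but ultimately return NULL, and `__GFP_NOWARN`
suppresses the allocation-failure splat. In userspace terms the pattern
reduces to "try, report, and return -ENOMEM" (sketch only; the GFP flags
have no userspace equivalent):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate the L2P table, failing cleanly instead of crashing later. */
static int l2p_alloc(void **map, size_t map_size)
{
	*map = malloc(map_size);
	if (!*map) {
		fprintf(stderr, "failed to allocate L2P (need %zu bytes)\n",
			map_size);
		return -ENOMEM;
	}
	return 0;
}
```

The caller can then tear down the instance gracefully, which is exactly what
the -ENOMEM return in pblk_l2p_init() enables.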


* Re: [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint
  2019-03-04  8:17   ` Javier González
@ 2019-03-04  9:29     ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:29 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

Great!

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

On Mon, Mar 4, 2019 at 9:17 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > Currently the L2P map size is calculated based on
> > the total number of available sectors, which is
> > wasteful, since it includes the over-provisioned
> > sectors (11% by default).
> >
> > The goal of this patch is to change this size
> > to the real capacity and thus reduce the DRAM
> > footprint significantly - with the default OP value
> > this is approx. 110MB of DRAM less for every 1TB
> > of pblk drive.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c     | 8 ++++----
> > drivers/lightnvm/pblk-init.c     | 7 +++----
> > drivers/lightnvm/pblk-read.c     | 2 +-
> > drivers/lightnvm/pblk-recovery.c | 2 +-
> > drivers/lightnvm/pblk.h          | 1 -
> > 5 files changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index f48f2e77f770..2e424c0275c1 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -2024,7 +2024,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
> >       struct ppa_addr ppa_l2p;
> >
> >       /* logic error: lba out-of-bounds. Ignore update */
> > -     if (!(lba < pblk->rl.nr_secs)) {
> > +     if (!(lba < pblk->capacity)) {
> >               WARN(1, "pblk: corrupted L2P map request\n");
> >               return;
> >       }
> > @@ -2064,7 +2064,7 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
> > #endif
> >
> >       /* logic error: lba out-of-bounds. Ignore update */
> > -     if (!(lba < pblk->rl.nr_secs)) {
> > +     if (!(lba < pblk->capacity)) {
> >               WARN(1, "pblk: corrupted L2P map request\n");
> >               return 0;
> >       }
> > @@ -2110,7 +2110,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
> >       }
> >
> >       /* logic error: lba out-of-bounds. Ignore update */
> > -     if (!(lba < pblk->rl.nr_secs)) {
> > +     if (!(lba < pblk->capacity)) {
> >               WARN(1, "pblk: corrupted L2P map request\n");
> >               return;
> >       }
> > @@ -2168,7 +2168,7 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
> >               lba = lba_list[i];
> >               if (lba != ADDR_EMPTY) {
> >                       /* logic error: lba out-of-bounds. Ignore update */
> > -                     if (!(lba < pblk->rl.nr_secs)) {
> > +                     if (!(lba < pblk->capacity)) {
> >                               WARN(1, "pblk: corrupted L2P map request\n");
> >                               continue;
> >                       }
> > diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> > index e553105b7ba1..9913a4514eb6 100644
> > --- a/drivers/lightnvm/pblk-init.c
> > +++ b/drivers/lightnvm/pblk-init.c
> > @@ -105,7 +105,7 @@ static size_t pblk_trans_map_size(struct pblk *pblk)
> >       if (pblk->addrf_len < 32)
> >               entry_size = 4;
> >
> > -     return entry_size * pblk->rl.nr_secs;
> > +     return entry_size * pblk->capacity;
> > }
> >
> > #ifdef CONFIG_NVM_PBLK_DEBUG
> > @@ -175,7 +175,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
> >
> >       pblk_ppa_set_empty(&ppa);
> >
> > -     for (i = 0; i < pblk->rl.nr_secs; i++)
> > +     for (i = 0; i < pblk->capacity; i++)
> >               pblk_trans_map_set(pblk, i, ppa);
> >
> >       ret = pblk_l2p_recover(pblk, factory_init);
> > @@ -706,7 +706,6 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
> >        * on user capacity consider only provisioned blocks
> >        */
> >       pblk->rl.total_blocks = nr_free_chks;
> > -     pblk->rl.nr_secs = nr_free_chks * geo->clba;
> >
> >       /* Consider sectors used for metadata */
> >       sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
> > @@ -1289,7 +1288,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> >
> >       pblk_info(pblk, "luns:%u, lines:%d, secs:%llu, buf entries:%u\n",
> >                       geo->all_luns, pblk->l_mg.nr_lines,
> > -                     (unsigned long long)pblk->rl.nr_secs,
> > +                     (unsigned long long)pblk->capacity,
> >                       pblk->rwb.nr_entries);
> >
> >       wake_up_process(pblk->writer_ts);
> > diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> > index 39c1d6ccaedb..65697463def8 100644
> > --- a/drivers/lightnvm/pblk-read.c
> > +++ b/drivers/lightnvm/pblk-read.c
> > @@ -561,7 +561,7 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> >               goto out;
> >
> >       /* logic error: lba out-of-bounds */
> > -     if (lba >= pblk->rl.nr_secs) {
> > +     if (lba >= pblk->capacity) {
> >               WARN(1, "pblk: read lba out of bounds\n");
> >               goto out;
> >       }
> > diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> > index d86f580036d3..83b467b5edc7 100644
> > --- a/drivers/lightnvm/pblk-recovery.c
> > +++ b/drivers/lightnvm/pblk-recovery.c
> > @@ -474,7 +474,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> >
> >               lba_list[paddr++] = cpu_to_le64(lba);
> >
> > -             if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> > +             if (lba == ADDR_EMPTY || lba >= pblk->capacity)
> >                       continue;
> >
> >               line->nr_valid_lbas++;
> > diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> > index a6386d5acd73..a92377530930 100644
> > --- a/drivers/lightnvm/pblk.h
> > +++ b/drivers/lightnvm/pblk.h
> > @@ -305,7 +305,6 @@ struct pblk_rl {
> >
> >       struct timer_list u_timer;
> >
> > -     unsigned long long nr_secs;
> >       unsigned long total_blocks;
> >
> >       atomic_t free_blocks;           /* Total number of free blocks (+ OP) */
> > --
> > 2.17.1
>
> Looks good to me.
>
> Reviewed-by: Javier González <javier@javigon.com>
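
The ~110MB-per-TB figure in the commit message follows directly from the
entry size and the over-provisioning ratio. A quick check, assuming 4-byte
L2P entries, 4K sectors, and the default 11% OP (parameter names are
illustrative):

```c
#include <stdint.h>

/*
 * Bytes of L2P DRAM saved by sizing the map to user-visible capacity
 * instead of the total number of provisioned sectors.
 */
static uint64_t l2p_savings(uint64_t drive_bytes, uint32_t csecs,
			    uint32_t entry_size, uint32_t op_pct)
{
	uint64_t nr_secs  = drive_bytes / csecs;            /* incl. OP */
	uint64_t capacity = nr_secs * (100 - op_pct) / 100; /* user-visible */

	return (nr_secs - capacity) * entry_size;
}
```

For a 1TB drive this works out to about 112MiB saved, matching the "approx.
110MB" claimed in the commit message.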


* Re: [PATCH 08/13] lightnvm: pblk: Set proper read status in bio
  2019-03-04  8:03   ` Javier González
@ 2019-03-04  9:35     ` Hans Holmberg
  2019-03-04  9:48       ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:35 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > Currently, in case of read errors, bi_status is not
> > set properly, which leads to returning improper data
> > to the higher layers. This patch fixes that by setting the
> > proper status in case of read errors.
> >
> > The patch also removes an unnecessary warn_once(), which does
> > not make sense in that place, since the user bio is not used
> > for interaction with the drive and thus bi_status will not be
> > set here.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-read.c | 11 +++++------
> > 1 file changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> > index 3789185144da..39c1d6ccaedb 100644
> > --- a/drivers/lightnvm/pblk-read.c
> > +++ b/drivers/lightnvm/pblk-read.c
> > @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
> >       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
> > }
> >
> > -static void pblk_end_user_read(struct bio *bio)
> > +static void pblk_end_user_read(struct bio *bio, int error)
> > {
> > -#ifdef CONFIG_NVM_PBLK_DEBUG
> > -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
> > -#endif
> > +     if (error && error != NVM_RSP_WARN_HIGHECC)
> > +             bio_io_error(bio);
> >       bio_endio(bio);
> > }
> >
> > @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
> >       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> >       struct bio *bio = (struct bio *)r_ctx->private;
> >
> > -     pblk_end_user_read(bio);
> > +     pblk_end_user_read(bio, rqd->error);
> >       __pblk_end_io_read(pblk, rqd, true);
> > }
> >
> > @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> >       rqd->bio = NULL;
> >       rqd->nr_ppas = nr_secs;
> >
> > -     bio_endio(bio);
> > +     pblk_end_user_read(bio, rqd->error);
> >       __pblk_end_io_read(pblk, rqd, false);
> > }
> >
> > --
> > 2.17.1
>
> This is by design. We do not report the read errors as in any other
> block device - this is why we clone the read bio.

Could you elaborate on why not reporting read errors is a good thing in pblk?

>
> If you want to remove the WARN_ONCE, it is fine by me - it helped in the
> past to debug the read path as when one read failed we go a storm of
> them. Now we have better controller tools to debug this, so if nobody
> else wants it there, let’s remove it.
>
> Javier


* Re: [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests
  2019-03-04  8:08   ` Javier González
@ 2019-03-04  9:39     ` Hans Holmberg
  2019-03-04 12:52       ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:39 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 9:08 AM Javier González <javier@javigon.com> wrote:
>
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > In the case when there are not enough sectors available in the rwb
> > and a flush request is sent, we should kick the write thread,
> > which is not the case in the current implementation. This patch
> > fixes that issue.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index 78b1eea4ab67..f48f2e77f770 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -375,8 +375,9 @@ void pblk_write_timer_fn(struct timer_list *t)
> > void pblk_write_should_kick(struct pblk *pblk)
> > {
> >       unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
> > +     unsigned int secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
> >
> > -     if (secs_avail >= pblk->min_write_pgs_data)
> > +     if (secs_avail >= pblk->min_write_pgs_data || secs_to_flush)
> >               pblk_write_kick(pblk);
> > }
> >
> > --
> > 2.17.1
>
> We already kick the write thread in case of REQ_PREFLUSH in
> pblk_write_cache(), so no need to kick again.

Yeah, I thought I fixed this issue in:

cc9c9a00b10e ("lightnvm: pblk: kick writer on new flush points")

That commit brought down the test time of some of the xfs sync tests
by a factor of 20 or so.

Igor: Have you seen any case of delayed syncs?

>
> Javier
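
Whichever path ends up doing the kick, the condition after the patch is a
simple disjunction: a full minimum write is buffered, or a flush point is
pending and must be drained regardless of fill level. A minimal model
(parameter names follow the pblk fields, but this is a sketch, not the
kernel code):

```c
#include <stdbool.h>

/*
 * Wake the writer if enough sectors are buffered for a minimum write,
 * or if a flush point is waiting and cannot wait for the buffer to fill.
 */
static bool write_should_kick(unsigned int secs_avail,
			      unsigned int min_write_pgs_data,
			      unsigned int secs_to_flush)
{
	return secs_avail >= min_write_pgs_data || secs_to_flush > 0;
}
```

Without the second clause, a flush issued while the ring buffer holds fewer
than `min_write_pgs_data` sectors would sit until unrelated writes arrive —
the delayed-sync behaviour Hans asks about.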


* Re: [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field
  2019-03-04  8:21   ` Javier González
@ 2019-03-04  9:40     ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04  9:40 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

Good riddance!

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

On Mon, Mar 4, 2019 at 9:21 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > The smeta_ssec field in pblk_line is set once and
> > never used, since it was replaced by the function
> > pblk_line_smeta_start(). This patch removes
> > this no-longer-needed field.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c | 1 -
> > drivers/lightnvm/pblk.h      | 1 -
> > 2 files changed, 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index 2e424c0275c1..fa4dc05608ff 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -1165,7 +1165,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
> >       off = bit * geo->ws_opt;
> >       bitmap_set(line->map_bitmap, off, lm->smeta_sec);
> >       line->sec_in_line -= lm->smeta_sec;
> > -     line->smeta_ssec = off;
> >       line->cur_sec = off + lm->smeta_sec;
> >
> >       if (init && pblk_line_smeta_write(pblk, line, off)) {
> > diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> > index a92377530930..b266563508e6 100644
> > --- a/drivers/lightnvm/pblk.h
> > +++ b/drivers/lightnvm/pblk.h
> > @@ -464,7 +464,6 @@ struct pblk_line {
> >       int meta_line;                  /* Metadata line id */
> >       int meta_distance;              /* Distance between data and metadata */
> >
> > -     u64 smeta_ssec;                 /* Sector where smeta starts */
> >       u64 emeta_ssec;                 /* Sector where emeta starts */
> >
> >       unsigned int sec_in_line;       /* Number of usable secs in line */
> > --
> > 2.17.1
>
> Looks good to me.
>
> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 08/13] lightnvm: pblk: Set proper read status in bio
  2019-03-04  9:35     ` Hans Holmberg
@ 2019-03-04  9:48       ` Javier González
  2019-03-04 12:14         ` Hans Holmberg
  2019-03-04 13:04         ` Matias Bjørling
  0 siblings, 2 replies; 91+ messages in thread
From: Javier González @ 2019-03-04  9:48 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3025 bytes --]



> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> 
> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> Currently, in case of read errors, bi_status is not
>>> set properly, which leads to returning improper data
>>> to the higher layers. This patch fixes that by setting the
>>> proper status in case of read errors.
>>> 
>>> The patch also removes an unnecessary warn_once(), which does
>>> not make sense in that place, since the user bio is not used
>>> for interaction with the drive and thus bi_status will not be
>>> set here.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>> index 3789185144da..39c1d6ccaedb 100644
>>> --- a/drivers/lightnvm/pblk-read.c
>>> +++ b/drivers/lightnvm/pblk-read.c
>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>      WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>> }
>>> 
>>> -static void pblk_end_user_read(struct bio *bio)
>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>> {
>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>> -#endif
>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>> +             bio_io_error(bio);
>>>      bio_endio(bio);
>>> }
>>> 
>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>      struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>      struct bio *bio = (struct bio *)r_ctx->private;
>>> 
>>> -     pblk_end_user_read(bio);
>>> +     pblk_end_user_read(bio, rqd->error);
>>>      __pblk_end_io_read(pblk, rqd, true);
>>> }
>>> 
>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>      rqd->bio = NULL;
>>>      rqd->nr_ppas = nr_secs;
>>> 
>>> -     bio_endio(bio);
>>> +     pblk_end_user_read(bio, rqd->error);
>>>      __pblk_end_io_read(pblk, rqd, false);
>>> }
>>> 
>>> --
>>> 2.17.1
>> 
>> This is by design. We do not report the read errors as in any other
>> block device - this is why we clone the read bio.
> 
> Could you elaborate on why not reporting read errors is a good thing in pblk?
> 

Normal block devices do not report read errors on the completion path
unless it is a fatal error. This is actually not well understood by the
upper layers, which tend to assume that the device is completely broken.

This is a challenge for OCSSD / Denali / Zone devices as there are cases
where reads can fail. Unfortunately at this point, we need to mask these
errors and deal with them in the different layers.

For OCSSD currently, we do this in pblk, which I think fits the model
well, as we expose a normal block device.
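Concretely, the tension in this sub-thread is between pblk's current behaviour (mask every read error on the user bio) and the patch's proposal (surface everything except the high-ECC warning). The proposed policy can be restated as a small user-space sketch; the status values below are illustrative placeholders, not the kernel's actual NVM_RSP_* definitions:

```c
#include <assert.h>

/* Illustrative status codes; the real NVM_RSP_* values are defined in
 * include/linux/lightnvm.h and may differ. */
enum {
	RSP_OK		 = 0x0,
	RSP_WARN_HIGHECC = 0x1,	/* data recovered, but ECC was stressed */
	RSP_ERR_FATAL	 = 0x2,	/* read genuinely failed */
};

/*
 * Policy proposed by the patch: only a real device error is surfaced on
 * the user bio; the high-ECC warning is masked because the returned
 * data is still valid.
 */
static int bio_should_fail(int dev_status)
{
	return dev_status && dev_status != RSP_WARN_HIGHECC;
}
```

Javier's position amounts to making bio_should_fail() return 0 unconditionally (mask everything); Hans argues below that only the high-ECC warning can safely be masked.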

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04  8:27   ` Javier González
@ 2019-03-04 10:05     ` Hans Holmberg
  2019-03-04 12:56       ` Igor Konopko
  2019-03-04 13:19       ` Matias Bjørling
  2019-03-04 13:18     ` Matias Bjørling
  1 sibling, 2 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 10:05 UTC (permalink / raw)
  To: Javier González, Matias Bjorling
  Cc: Konopko, Igor J, Hans Holmberg, linux-block, Klaus Jensen

On Mon, Mar 4, 2019 at 9:46 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > When creating a pblk instance with the
> > factory flag, there is a possibility that
> > some chunks are in the open state, which does
> > not allow issuing erase requests to them
> > directly. Such a chunk should be filled with
> > some empty data in order to reach the closed
> > state. Without that we risk that some erase
> > operations will be rejected by the drive due
> > to an improper chunk state.
> >
> > This patch implements chunk-closing logic in
> > pblk for that case, when creating an instance
> > with the factory flag, in order to fix that
> > issue.
> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
> > drivers/lightnvm/pblk-init.c |  14 ++++-
> > drivers/lightnvm/pblk.h      |   2 +
> > 3 files changed, 128 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> > index fa4dc05608ff..d3c45393f093 100644
> > --- a/drivers/lightnvm/pblk-core.c
> > +++ b/drivers/lightnvm/pblk-core.c
> > @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> >       return meta + ch_off + lun_off + chk_off;
> > }
> >
> > +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
> > +{
> > +     struct nvm_tgt_dev *dev = pblk->dev;
> > +     struct nvm_geo *geo = &dev->geo;
> > +     struct bio *bio;
> > +     struct ppa_addr *ppa_list;
> > +     struct nvm_rq rqd;
> > +     void *meta_list, *data;
> > +     dma_addr_t dma_meta_list, dma_ppa_list;
> > +     int i, rq_ppas, rq_len, ret;
> > +
> > +     meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> > +     if (!meta_list)
> > +             return;
> > +
> > +     ppa_list = meta_list + pblk_dma_meta_size(pblk);
> > +     dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
> > +
> > +     rq_ppas = pblk_calc_secs(pblk, count, 0, false);
> > +     if (!rq_ppas)
> > +             rq_ppas = pblk->min_write_pgs;
> > +     rq_len = rq_ppas * geo->csecs;
> > +
> > +     data = kzalloc(rq_len, GFP_KERNEL);
> > +     if (!data)
> > +             goto free_meta_list;
> > +
> > +     memset(&rqd, 0, sizeof(struct nvm_rq));
> > +     rqd.opcode = NVM_OP_PWRITE;
> > +     rqd.nr_ppas = rq_ppas;
> > +     rqd.meta_list = meta_list;
> > +     rqd.ppa_list = ppa_list;
> > +     rqd.dma_ppa_list = dma_ppa_list;
> > +     rqd.dma_meta_list = dma_meta_list;
> > +
> > +next_rq:
> > +     bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
> > +     if (IS_ERR(bio))
> > +             goto out_next;
> > +
> > +     bio->bi_iter.bi_sector = 0; /* artificial bio */
> > +     bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> > +
> > +     rqd.bio = bio;
> > +     for (i = 0; i < rqd.nr_ppas; i++) {
> > +             rqd.ppa_list[i] = ppa;
> > +             rqd.ppa_list[i].m.sec += i;
> > +             pblk_get_meta(pblk, meta_list, i)->lba =
> > +                                     cpu_to_le64(ADDR_EMPTY);
> > +     }
> > +
> > +     ret = nvm_submit_io_sync(dev, &rqd);
> > +     if (ret) {
> > +             bio_put(bio);
> > +             goto out_next;
> > +     }
> > +
> > +     if (rqd.error)
> > +             goto free_data;
> > +
> > +out_next:
> > +     count -= rqd.nr_ppas;
> > +     ppa.m.sec += rqd.nr_ppas;
> > +     if (count > 0)
> > +             goto next_rq;
> > +
> > +free_data:
> > +     kfree(data);
> > +free_meta_list:
> > +     nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> > +}
> > +
> > +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> > +{
> > +     struct nvm_tgt_dev *dev = pblk->dev;
> > +     struct nvm_geo *geo = &dev->geo;
> > +     struct nvm_chk_meta *chunk_meta;
> > +     struct ppa_addr ppa;
> > +     int i, j, k, count;
> > +
> > +     for (i = 0; i < geo->num_chk; i++) {
> > +             for (j = 0; j < geo->num_lun; j++) {
> > +                     for (k = 0; k < geo->num_ch; k++) {
> > +                             ppa.ppa = 0;
> > +                             ppa.m.grp = k;
> > +                             ppa.m.pu = j;
> > +                             ppa.m.chk = i;
> > +
> > +                             chunk_meta = pblk_chunk_get_off(pblk,
> > +                                                             meta, ppa);
> > +                             if (chunk_meta->state == NVM_CHK_ST_OPEN) {
> > +                                     ppa.m.sec = chunk_meta->wp;
> > +                                     count = geo->clba - chunk_meta->wp;
> > +                                     pblk_close_chunk(pblk, ppa, count);
> > +                             }
> > +                     }
> > +             }
> > +     }
> > +}
> > +
> > +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> > +{
> > +     struct nvm_tgt_dev *dev = pblk->dev;
> > +     struct nvm_geo *geo = &dev->geo;
> > +     int i;
> > +
> > +     for (i = 0; i < geo->all_luns; i++) {
> > +             if (meta[i].state == NVM_CHK_ST_OPEN)
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
> >                          u64 paddr)
> > {
> > diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> > index 9913a4514eb6..83abe6960b46 100644
> > --- a/drivers/lightnvm/pblk-init.c
> > +++ b/drivers/lightnvm/pblk-init.c
> > @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
> >       return 0;
> > }
> >
> > -static int pblk_lines_init(struct pblk *pblk)
> > +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
> > {
> >       struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> >       struct pblk_line *line;
> >       void *chunk_meta;
> >       int nr_free_chks = 0;
> >       int i, ret;
> > +     bool retry = false;
> >
> >       ret = pblk_line_meta_init(pblk);
> >       if (ret)
> > @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
> >       if (ret)
> >               goto fail_free_meta;
> >
> > +get_chunk_meta:
> >       chunk_meta = pblk_get_chunk_meta(pblk);
> >       if (IS_ERR(chunk_meta)) {
> >               ret = PTR_ERR(chunk_meta);
> >               goto fail_free_luns;
> >       }
> >
> > +     if (factory_init && !retry &&
> > +         pblk_are_opened_chunks(pblk, chunk_meta)) {
> > +             pblk_close_opened_chunks(pblk, chunk_meta);
> > +             retry = true;
> > +             vfree(chunk_meta);
> > +             goto get_chunk_meta;
> > +     }
> > +
> >       pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
> >                                                               GFP_KERNEL);
> >       if (!pblk->lines) {
> > @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> >               goto fail;
> >       }
> >
> > -     ret = pblk_lines_init(pblk);
> > +     ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
> >       if (ret) {
> >               pblk_err(pblk, "could not initialize lines\n");
> >               goto fail_free_core;
> > diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> > index b266563508e6..b248642c4dfb 100644
> > --- a/drivers/lightnvm/pblk.h
> > +++ b/drivers/lightnvm/pblk.h
> > @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
> > struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> >                                             struct nvm_chk_meta *lp,
> >                                             struct ppa_addr ppa);
> > +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> > +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> > void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
> > void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
> > int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
> > --
> > 2.17.1
>
> I know that the OCSSD 2.0 spec does not allow transitioning from open
> to free, but to me this is a spec bug, as there is no underlying issue
> in reading an open block. Note that all controllers I know of support
> this, and the upcoming Denali spec fixes this too.

+ Klaus, whom I discussed this with.
Yeah, I think that "early reset" is a nice feature. It would be nice
to extend the OCSSD spec with a new capability bit indicating whether
this is indeed supported.
Matias: what do you think?

>
> Besides, the factory flag is intended to start a pblk instance
> immediately, without having to pay the price of padding any past device
> state.  If you still want to do this, I think this belongs in a user space tool.
>

Hear, hear!

Serially padding any open chunks during the -f create call would be a
terrible user experience.
Let's say that padding a chunk takes one second: in a worst-case
scenario on an example disk, we would be stuck in a syscall for
1*64*1000 seconds ~ 17 hours.
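A quick sanity check of that estimate (the one-second-per-chunk padding time and the 64-LUN / 1000-chunk geometry are only the example figures quoted above, not a real drive):

```c
#include <assert.h>

/* Worst-case serial padding time for the example geometry: padding is
 * done one chunk at a time, so the total is simply the product. */
static long pad_seconds(long secs_per_chunk, long luns, long chunks_per_lun)
{
	return secs_per_chunk * luns * chunks_per_lun;
}
```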

A tool, like dm-zoned's dmzadm would be the right approach, see
Documentation/device-mapper/dm-zoned.txt
All new pblk instances would then have to be pre-formatted with "pblkadm"

A new physical storage format containing a superblock would also be a good idea.

/ Hans


* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04  9:05   ` Javier González
@ 2019-03-04 11:30     ` Hans Holmberg
  2019-03-04 11:44       ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 11:30 UTC (permalink / raw)
  To: Javier González, Igor Konopko, Matias Bjorling
  Cc: Hans Holmberg, linux-block, Simon Lund, Klaus Jensen

On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>
> > On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > The current lightnvm and pblk implementation does
> > not care about the NVMe max data transfer size,
> > which can be smaller than 64 * 4K = 256K. This
> > patch fixes issues related to that.

Could you describe *what* issues you are fixing?

> >
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > ---
> > drivers/lightnvm/core.c      | 9 +++++++--
> > drivers/nvme/host/lightnvm.c | 1 +
> > include/linux/lightnvm.h     | 1 +
> > 3 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> > index 5f82036fe322..c01f83b8fbaf 100644
> > --- a/drivers/lightnvm/core.c
> > +++ b/drivers/lightnvm/core.c
> > @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >       struct nvm_target *t;
> >       struct nvm_tgt_dev *tgt_dev;
> >       void *targetdata;
> > +     unsigned int mdts;
> >       int ret;
> >
> >       switch (create->conf.type) {
> > @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >       tdisk->private_data = targetdata;
> >       tqueue->queuedata = targetdata;
> >
> > -     blk_queue_max_hw_sectors(tqueue,
> > -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> > +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> > +     if (dev->geo.mdts) {
> > +             mdts = min_t(u32, dev->geo.mdts,
> > +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> > +     }
> > +     blk_queue_max_hw_sectors(tqueue, mdts);
> >
> >       set_capacity(tdisk, tt->capacity(targetdata));
> >       add_disk(tdisk);
> > diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> > index b759c25c89c8..b88a39a3cbd1 100644
> > --- a/drivers/nvme/host/lightnvm.c
> > +++ b/drivers/nvme/host/lightnvm.c
> > @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >       geo->csecs = 1 << ns->lba_shift;
> >       geo->sos = ns->ms;
> >       geo->ext = ns->ext;
> > +     geo->mdts = ns->ctrl->max_hw_sectors;
> >
> >       dev->q = q;
> >       memcpy(dev->name, disk_name, DISK_NAME_LEN);
> > diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> > index 5d865a5d5cdc..d3b02708e5f0 100644
> > --- a/include/linux/lightnvm.h
> > +++ b/include/linux/lightnvm.h
> > @@ -358,6 +358,7 @@ struct nvm_geo {
> >       u16     csecs;          /* sector size */
> >       u16     sos;            /* out-of-band area size */
> >       bool    ext;            /* metadata in extended data buffer */
> > +     u32     mdts;           /* Max data transfer size*/
> >
> >       /* device write constrains */
> >       u32     ws_min;         /* minimum write size */
> > --
> > 2.17.1
>
> I see where you are going with this and I partially agree, but none of
> the OCSSD specs define a way to define this parameter. Thus, adding this
> behavior taken from NVMe in Linux can break current implementations. Is
> this a real life problem for you? Or this is just for NVMe “correctness”?
>
> Javier

Hmm. Looking into the 2.0 spec, here is what it says about vector reads:

(figure 28):"The number of Logical Blocks (NLB): This field indicates
the number of logical blocks to be read. This is a 0’s based value.
Maximum of 64 LBAs is supported."

You got the max limit covered, and the spec does not say anything
about the minimum number of LBAs to support.

Matias: any thoughts on this?

Javier: How would this patch break current implementations?

Igor: how does this patch fix the mdts restriction? There are no
checks on i.e. the gc read path that ensures that a lower limit than
NVM_MAX_VLBA is enforced.
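For reference, the clamp the patch applies in nvm_create_tgt() boils down to the following (restated here as a standalone user-space function; the geometry values in the checks are only examples):

```c
#include <assert.h>

#define NVM_MAX_VLBA 64	/* max LBAs in one OCSSD vector command */

/*
 * max_hw_sectors for the target queue, in 512-byte units: the geometry
 * limit (csecs * NVM_MAX_VLBA), further clamped by the controller's
 * mdts when the device reports one (mdts == 0 means "not reported").
 */
static unsigned int target_max_hw_sectors(unsigned int csecs,
					  unsigned int mdts)
{
	unsigned int geo_max = (csecs >> 9) * NVM_MAX_VLBA;

	if (mdts && mdts < geo_max)
		return mdts;
	return geo_max;
}
```

As Hans notes, this only caps the block-layer queue; internal paths (e.g. GC reads) that build vector requests up to NVM_MAX_VLBA would still need their own check.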

Thanks,
Hans


* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04  9:23       ` Javier González
@ 2019-03-04 11:41         ` Hans Holmberg
  2019-03-04 11:45           ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 11:41 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@javigon.com> wrote:
>
> > On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> >
> > Igor: Have you seen this happening in real life?
> >
> > I think it would be better to count all expected errors and put them
> > in the right bucket (without spamming dmesg). If we need a new bucket
> > for e.g. vendor-specific errors, let's do that instead.
> >
> > Someone wiser than me told me that every error print in the log is a
> > potential customer call.
> >
> > Javier: Yeah, I think S.M.A.R.T is the way to deliver this
> > information. Why can't we let the drives expose this info and remove
> > this from pblk? What's blocking that?
>
Until now, the spec. We added some new log information in Denali exactly
for this. But since pblk supports OCSSD 1.2 and 2.0, I think it is needed
here, at least for debugging.

Why add it to the spec? Why not use whatever everyone else is using?

https://en.wikipedia.org/wiki/S.M.A.R.T. :
"S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
written as SMART) is a monitoring system included in computer hard
disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
primary function is to detect and report various indicators of drive
reliability with the intent of anticipating imminent hardware
failures."
Sounds like what we want here.

For debugging, a trace point or something (e.g. BPF) would be a better
solution that would not impact hot-path performance.

>
> >
> > Thanks,
> > Hans
> >
> > On Mon, Mar 4, 2019 at 8:42 AM Javier González <javier@javigon.com> wrote:
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> Currently, when an unknown error occurs on the read
> >>> path, there is only dmesg information about it, but
> >>> it is not counted in the sysfs statistics. Since this
> >>> is still an error, we should also count it there.
> >>>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/pblk-core.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >>> index eabcbc119681..a98b2255f963 100644
> >>> --- a/drivers/lightnvm/pblk-core.c
> >>> +++ b/drivers/lightnvm/pblk-core.c
> >>> @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd)
> >>>              atomic_long_inc(&pblk->read_failed);
> >>>              break;
> >>>      default:
> >>> +             atomic_long_inc(&pblk->read_failed);
> >>>              pblk_err(pblk, "unknown read error:%d\n", rqd->error);
> >>>      }
> >>> #ifdef CONFIG_NVM_PBLK_DEBUG
> >>> --
> >>> 2.17.1
> >>
> >> I left this out intentionally so that we could correlate the logs from
> >> the controller and the errors in the read path. Since we do not have a
> >> standard way to correlate this in SMART yet, let's add this now (I
> >> assume that you are using it for something?) and we can separate the
> >> error stats in the future.
> >>
> >> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04  9:11       ` Javier González
@ 2019-03-04 11:43         ` Hans Holmberg
  2019-03-04 12:44           ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 11:43 UTC (permalink / raw)
  To: Javier González, Igor Konopko
  Cc: Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com> wrote:
>
> > On 4 Mar 2019, at 10.05, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> >
> > I strongly disagree with adding code that would mask implementation errors.
> >
> > If we want more internal checks, we could add an if statement that
> > would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> >
>
> Not sure who this is for - better not to top post.
>
> In any case, this is a spec grey zone. I’m ok with cleaning the bits, as
> they mean nothing for the reset command. If you feel that strongly about
> this, you can take it up with Igor.

Pardon the top-post. It was meant for both you and Igor.

>
> >
> > On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> wrote:
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> In the current pblk implementation of the erase
> >>> command, there is a chance that the sector bits are
> >>> set to some random values for the erase PPA. This is
> >>> an unexpected situation, since an erase should always
> >>> be chunk aligned. This patch fixes that issue.
> >>>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/pblk-core.c | 1 +
> >>> drivers/lightnvm/pblk-map.c  | 2 ++
> >>> 2 files changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >>> index a98b2255f963..78b1eea4ab67 100644
> >>> --- a/drivers/lightnvm/pblk-core.c
> >>> +++ b/drivers/lightnvm/pblk-core.c
> >>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
> >>>
> >>>              ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>              ppa.a.blk = line->id;
> >>> +             ppa.a.reserved = 0;
> >>>
> >>>              atomic_dec(&line->left_eblks);
> >>>              WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> >>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> >>> index 79df583ea709..aea46b4ec40f 100644
> >>> --- a/drivers/lightnvm/pblk-map.c
> >>> +++ b/drivers/lightnvm/pblk-map.c
> >>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >>>
> >>>                      *erase_ppa = ppa_list[i];
> >>>                      erase_ppa->a.blk = e_line->id;
> >>> +                     erase_ppa->a.reserved = 0;
> >>>
> >>>                      spin_unlock(&e_line->lock);
> >>>
> >>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >>>              atomic_dec(&e_line->left_eblks);
> >>>              *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>              erase_ppa->a.blk = e_line->id;
> >>> +             erase_ppa->a.reserved = 0;
> >>>      }
> >>>
> >>>      return 0;
> >>> --
> >>> 2.17.1
> >>
> >> I’m fine with adding this, but note that there is actually no
> >> requirement for the erase to be chunk aligned - the only bits that
> >> should be looked at are group, PU and chunk.
> >>
> >> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 11:30     ` Hans Holmberg
@ 2019-03-04 11:44       ` Javier González
  2019-03-04 12:22         ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04 11:44 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

[-- Attachment #1: Type: text/plain, Size: 4241 bytes --]



> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
> 
> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> The current lightnvm and pblk implementation does not
>>> care about the NVMe max data transfer size, which can
>>> be smaller than 64 * 4K = 256K. This patch fixes issues
>>> related to that.
> 
> Could you describe *what* issues you are fixing?
> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/core.c      | 9 +++++++--
>>> drivers/nvme/host/lightnvm.c | 1 +
>>> include/linux/lightnvm.h     | 1 +
>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>> index 5f82036fe322..c01f83b8fbaf 100644
>>> --- a/drivers/lightnvm/core.c
>>> +++ b/drivers/lightnvm/core.c
>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>      struct nvm_target *t;
>>>      struct nvm_tgt_dev *tgt_dev;
>>>      void *targetdata;
>>> +     unsigned int mdts;
>>>      int ret;
>>> 
>>>      switch (create->conf.type) {
>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>      tdisk->private_data = targetdata;
>>>      tqueue->queuedata = targetdata;
>>> 
>>> -     blk_queue_max_hw_sectors(tqueue,
>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>> +     if (dev->geo.mdts) {
>>> +             mdts = min_t(u32, dev->geo.mdts,
>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>> +     }
>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>> 
>>>      set_capacity(tdisk, tt->capacity(targetdata));
>>>      add_disk(tdisk);
>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>> index b759c25c89c8..b88a39a3cbd1 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>      geo->csecs = 1 << ns->lba_shift;
>>>      geo->sos = ns->ms;
>>>      geo->ext = ns->ext;
>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>> 
>>>      dev->q = q;
>>>      memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>> --- a/include/linux/lightnvm.h
>>> +++ b/include/linux/lightnvm.h
>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>      u16     csecs;          /* sector size */
>>>      u16     sos;            /* out-of-band area size */
>>>      bool    ext;            /* metadata in extended data buffer */
>>> +     u32     mdts;           /* Max data transfer size*/
>>> 
>>>      /* device write constrains */
>>>      u32     ws_min;         /* minimum write size */
>>> --
>>> 2.17.1
>> 
>> I see where you are going with this and I partially agree, but none of
>> the OCSSD specs define a way to define this parameter. Thus, adding this
>> behavior taken from NVMe in Linux can break current implementations. Is
>> this a real life problem for you? Or this is just for NVMe “correctness”?
>> 
>> Javier
> 
> Hmm. Looking into the 2.0 spec, here is what it says about vector reads:
> 
> (figure 28):"The number of Logical Blocks (NLB): This field indicates
> the number of logical blocks to be read. This is a 0’s based value.
> Maximum of 64 LBAs is supported."
> 
> You got the max limit covered, and the spec does not say anything
> about the minimum number of LBAs to support.
> 
> Matias: any thoughts on this?
> 
> Javier: How would this patch break current implementations?

Say an OCSSD controller that sets mdts to a value under 64 or does not
set it at all (maybe garbage). Think you can get to one pretty quickly...

> 
> Igor: how does this patch fix the mdts restriction? There are no
> checks on i.e. the gc read path that ensures that a lower limit than
> NVM_MAX_VLBA is enforced.

This is the other part where the implementation breaks.

Javier



* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04 11:41         ` Hans Holmberg
@ 2019-03-04 11:45           ` Javier González
  2019-03-04 12:42             ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04 11:45 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]


> On 4 Mar 2019, at 12.41, Hans Holmberg <hans@owltronix.com> wrote:
> 
> On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@javigon.com> wrote:
>>> On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>> 
>>> Igor: Have you seen this happening in real life?
>>> 
>>> I think it would be better to count all expected errors and put them
>>> in the right bucket (without spamming dmesg). If we need a new bucket
> >>> for e.g. vendor-specific errors, let's do that instead.
>>> 
>>> Someone wiser than me told me that every error print in the log is a
>>> potential customer call.
>>> 
>>> Javier: Yeah, I think S.M.A.R.T is the way to deliver this
>>> information. Why can't we let the drives expose this info and remove
>>> this from pblk? What's blocking that?
>> 
>> Until now the spec. We added some new log information in Denali exactly
>> for this. But since pblk supports OCSSD 1.2 and 2.0 I think it is needed to
>> have it here, at least for debugging.
> 
> Why add it to the spec? Why not use whatever everyone else is using?
> 
> https://en.wikipedia.org/wiki/S.M.A.R.T. :
> "S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
> written as SMART) is a monitoring system included in computer hard
> disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
> primary function is to detect and report various indicators of drive
> reliability with the intent of anticipating imminent hardware
> failures."
> Sounds like what we want here.

I know what SMART is… You need to define the fields. Maybe you want to
read Denali again - the extensions are coupled with SMART.

> For debugging, a trace point or something (e.g. BPF) would be a better
> solution that would not impact hot-path performance.

Cool. Look forward to the patches ;)

Javier



* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04  9:48       ` Javier González
@ 2019-03-04 12:14         ` Hans Holmberg
  2019-03-04 12:51           ` Igor Konopko
  2019-03-04 13:04         ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 12:14 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>
>
>
> > On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> >
> > On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> Currently, in case of read errors, bi_status is not
> >>> set properly, which leads to returning improper data
> >>> to the higher layer. This patch fixes that by setting
> >>> the proper status in case of read errors.
> >>>
> >>> The patch also removes an unnecessary warn_once(),
> >>> which does not make sense in that place, since the
> >>> user bio is not used for interaction with the drive
> >>> and thus bi_status will not be set here.
> >>>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/pblk-read.c | 11 +++++------
> >>> 1 file changed, 5 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> >>> index 3789185144da..39c1d6ccaedb 100644
> >>> --- a/drivers/lightnvm/pblk-read.c
> >>> +++ b/drivers/lightnvm/pblk-read.c
> >>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
> >>>      WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
> >>> }
> >>>
> >>> -static void pblk_end_user_read(struct bio *bio)
> >>> +static void pblk_end_user_read(struct bio *bio, int error)
> >>> {
> >>> -#ifdef CONFIG_NVM_PBLK_DEBUG
> >>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
> >>> -#endif
> >>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
> >>> +             bio_io_error(bio);
> >>>      bio_endio(bio);
> >>> }
> >>>
> >>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
> >>>      struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> >>>      struct bio *bio = (struct bio *)r_ctx->private;
> >>>
> >>> -     pblk_end_user_read(bio);
> >>> +     pblk_end_user_read(bio, rqd->error);
> >>>      __pblk_end_io_read(pblk, rqd, true);
> >>> }
> >>>
> >>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> >>>      rqd->bio = NULL;
> >>>      rqd->nr_ppas = nr_secs;
> >>>
> >>> -     bio_endio(bio);
> >>> +     pblk_end_user_read(bio, rqd->error);
> >>>      __pblk_end_io_read(pblk, rqd, false);
> >>> }
> >>>
> >>> --
> >>> 2.17.1
> >>
> >> This is by design. We do not report read errors, just like any other
> >> block device - this is why we clone the read bio.
> >
> > Could you elaborate on why not reporting read errors is a good thing in pblk?
> >
>
> Normal block devices do not report read errors on the completion path
> unless it is a fatal error. This is actually not well understood by the
> upper layers, which tend to assume that the device is completely broken.

So returning bogus data without even a warning is a preferred
solution? You want to force "the upper layers" to do checksumming?

It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
warning that OCSSD 2.0 adds. The data should still be good.
All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that
the command did not complete (as far as I can tell).
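
The masking rule being argued for here can be sketched outside the kernel. The status names below mirror the definitions in include/linux/lightnvm.h, but the values are illustrative stand-ins, and the function is a stand-in for the check in the patched pblk_end_user_read(), not driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Status codes named after the lightnvm.h definitions; the values here
 * are illustrative stand-ins, not authoritative. */
#define NVM_RSP_SUCCESS      0x0000
#define NVM_RSP_WARN_HIGHECC 0x4700 /* data recovered, but near the ECC limit */
#define NVM_RSP_ERR_FAILECC  0x4281 /* unrecoverable read error */

/* Mirrors the decision in the patched pblk_end_user_read(): only the
 * high-ECC warning is masked; any other non-zero status fails the bio. */
static bool read_data_is_valid(int status)
{
	return status == NVM_RSP_SUCCESS || status == NVM_RSP_WARN_HIGHECC;
}
```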


>
> This is a challenge for OCSSD / Denali / Zone devices as there are cases
> where reads can fail. Unfortunately at this point, we need to mask these
> errors and deal with them in the different layers.
>
> For OCSSD currently, we do this in pblk, which I think fits well the
> model as we exposed a normal block device.
>
> Javier

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 11:44       ` Javier González
@ 2019-03-04 12:22         ` Hans Holmberg
  2019-03-04 13:04           ` Igor Konopko
  2019-03-04 13:19           ` Javier González
  0 siblings, 2 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 12:22 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>
>
>
> > On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
> >
> > On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> The current lightnvm and pblk implementation does not care
> >>> about the NVMe max data transfer size, which can be smaller
> >>> than 64 * 4K = 256K. This patch fixes issues related to that.
> >
> > Could you describe *what* issues you are fixing?
> >
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/core.c      | 9 +++++++--
> >>> drivers/nvme/host/lightnvm.c | 1 +
> >>> include/linux/lightnvm.h     | 1 +
> >>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>> index 5f82036fe322..c01f83b8fbaf 100644
> >>> --- a/drivers/lightnvm/core.c
> >>> +++ b/drivers/lightnvm/core.c
> >>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>      struct nvm_target *t;
> >>>      struct nvm_tgt_dev *tgt_dev;
> >>>      void *targetdata;
> >>> +     unsigned int mdts;
> >>>      int ret;
> >>>
> >>>      switch (create->conf.type) {
> >>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>      tdisk->private_data = targetdata;
> >>>      tqueue->queuedata = targetdata;
> >>>
> >>> -     blk_queue_max_hw_sectors(tqueue,
> >>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>> +     if (dev->geo.mdts) {
> >>> +             mdts = min_t(u32, dev->geo.mdts,
> >>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>> +     }
> >>> +     blk_queue_max_hw_sectors(tqueue, mdts);
> >>>
> >>>      set_capacity(tdisk, tt->capacity(targetdata));
> >>>      add_disk(tdisk);
> >>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>> index b759c25c89c8..b88a39a3cbd1 100644
> >>> --- a/drivers/nvme/host/lightnvm.c
> >>> +++ b/drivers/nvme/host/lightnvm.c
> >>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>      geo->csecs = 1 << ns->lba_shift;
> >>>      geo->sos = ns->ms;
> >>>      geo->ext = ns->ext;
> >>> +     geo->mdts = ns->ctrl->max_hw_sectors;
> >>>
> >>>      dev->q = q;
> >>>      memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>> --- a/include/linux/lightnvm.h
> >>> +++ b/include/linux/lightnvm.h
> >>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>      u16     csecs;          /* sector size */
> >>>      u16     sos;            /* out-of-band area size */
> >>>      bool    ext;            /* metadata in extended data buffer */
> >>> +     u32     mdts;           /* Max data transfer size*/
> >>>
> >>>      /* device write constrains */
> >>>      u32     ws_min;         /* minimum write size */
> >>> --
> >>> 2.17.1
> >>
> >> I see where you are going with this and I partially agree, but none of
> >> the OCSSD specs define a way to define this parameter. Thus, adding this
> >> behavior taken from NVMe in Linux can break current implementations. Is
> >> this a real-life problem for you? Or is this just for NVMe “correctness”?
> >>
> >> Javier
> >
> > Hmm. Looking into what the 2.0 spec says about vector reads:
> >
> > (figure 28):"The number of Logical Blocks (NLB): This field indicates
> > the number of logical blocks to be read. This is a 0’s based value.
> > Maximum of 64 LBAs is supported."
> >
> > You got the max limit covered, and the spec does not say anything
> > about the minimum number of LBAs to support.
> >
> > Matias: any thoughts on this?
> >
> > Javier: How would this patch break current implementations?
>
> Say an OCSSD controller that sets mdts to a value under 64 or does not
> set it at all (maybe garbage). Think you can get to one pretty quickly...

So we can't make use of a perfectly good, standardized parameter
because some hypothetical non-compliant device out there might not
provide a sane value?
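
The clamping in the patch reduces to taking the smaller of the two limits whenever the device reports one; a user-space sketch of the logic added to nvm_create_tgt() (units are 512-byte sectors, matching NVMe's max_hw_sectors; this is illustrative, not the driver itself):

```c
#include <assert.h>

#define NVM_MAX_VLBA 64 /* max LBAs per vector command (OCSSD 2.0) */

/* csecs is the sector size in bytes, mdts the controller-reported limit
 * in 512-byte sectors (0 = not reported). Returns max_hw_sectors. */
static unsigned int clamp_max_hw_sectors(unsigned int csecs, unsigned int mdts)
{
	unsigned int vlba_limit = (csecs >> 9) * NVM_MAX_VLBA;

	if (mdts && mdts < vlba_limit)
		return mdts;
	return vlba_limit;
}
```

With 4K sectors the vector limit is 512 sectors (256K), and a smaller reported mdts takes precedence.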

>
> >
> > Igor: how does this patch fix the mdts restriction? There are no
> > checks on e.g. the GC read path that ensure that a lower limit than
> > NVM_MAX_VLBA is enforced.
>
> This is the other part where the implementation breaks.

No, it just does not fix it.

over-and-out,
Hans
>
> Javier

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail
  2019-03-01 12:50     ` Hans Holmberg
@ 2019-03-04 12:38       ` Igor Konopko
  0 siblings, 0 replies; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:38 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjørling, Hans Holmberg, linux-block



On 01.03.2019 13:50, Hans Holmberg wrote:
> On Thu, Feb 28, 2019 at 6:09 PM Javier González <javier@javigon.com> wrote:
>>
>>
>>
>>> On 27 Feb 2019, at 12.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> Currently when we fail on gc rq data allocation
>>> we simply skip the data which we wanted to move,
>>> finally move the line to the free state, and lose
>>> that data as a result. This patch moves the data
>>> allocation to an earlier phase of GC, where we
>>> can still fail gracefully by moving the line back
>>> to the closed state.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-gc.c | 19 +++++++++----------
>>> 1 file changed, 9 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
>>> index 3feadfd9418d..31fc1339faa8 100644
>>> --- a/drivers/lightnvm/pblk-gc.c
>>> +++ b/drivers/lightnvm/pblk-gc.c
>>> @@ -84,8 +84,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
>>>        struct pblk_line_ws *gc_rq_ws = container_of(work,
>>>                                                struct pblk_line_ws, ws);
>>>        struct pblk *pblk = gc_rq_ws->pblk;
>>> -     struct nvm_tgt_dev *dev = pblk->dev;
>>> -     struct nvm_geo *geo = &dev->geo;
>>>        struct pblk_gc *gc = &pblk->gc;
>>>        struct pblk_line *line = gc_rq_ws->line;
>>>        struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
>>> @@ -93,13 +91,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
>>>
>>>        up(&gc->gc_sem);
>>>
>>> -     gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
>>> -     if (!gc_rq->data) {
>>> -             pblk_err(pblk, "could not GC line:%d (%d/%d)\n",
>>> -                                     line->id, *line->vsc, gc_rq->nr_secs);
>>> -             goto out;
>>> -     }
>>> -
>>>        /* Read from GC victim block */
>>>        ret = pblk_submit_read_gc(pblk, gc_rq);
>>>        if (ret) {
>>> @@ -189,6 +180,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
>>>        struct pblk_line *line = line_ws->line;
>>>        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>        struct pblk_line_meta *lm = &pblk->lm;
>>> +     struct nvm_tgt_dev *dev = pblk->dev;
>>> +     struct nvm_geo *geo = &dev->geo;
>>>        struct pblk_gc *gc = &pblk->gc;
>>>        struct pblk_line_ws *gc_rq_ws;
>>>        struct pblk_gc_rq *gc_rq;
>>> @@ -247,9 +240,13 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
>>>        gc_rq->nr_secs = nr_secs;
>>>        gc_rq->line = line;
>>>
>>> +     gc_rq->data = vmalloc(gc_rq->nr_secs * geo->csecs);
> 
> Why not use array_size to do the size calculation as before? It checks
> for overflows.
> Apart from this, the patch looks good to me.

Sure, you are right array_size should be used here in the same way as it 
was before. Will fix that in v2.
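
The overflow hazard that array_size() guards against can be shown with a small user-space analogue of the helper (array_size_checked is a hypothetical name; the real kernel helper likewise returns SIZE_MAX on overflow so the subsequent vmalloc() fails cleanly instead of returning a short buffer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space sketch of the kernel's array_size() overflow check:
 * n * size can wrap around size_t, silently yielding a short
 * allocation. Returning SIZE_MAX forces the allocation to fail. */
static size_t array_size_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return SIZE_MAX; /* overflow detected */
	return n * size;
}
```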

> 
>>> +     if (!gc_rq->data)
>>> +             goto fail_free_gc_rq;
>>> +
>>>        gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
>>>        if (!gc_rq_ws)
>>> -             goto fail_free_gc_rq;
>>> +             goto fail_free_gc_data;
>>>
>>>        gc_rq_ws->pblk = pblk;
>>>        gc_rq_ws->line = line;
>>> @@ -281,6 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
>>>
>>>        return;
>>>
>>> +fail_free_gc_data:
>>> +     vfree(gc_rq->data);
>>> fail_free_gc_rq:
>>>        kfree(gc_rq);
>>> fail_free_lba_list:
>>> --
>>> 2.17.1
>>
>> Looks good to me.
>>
>> Reviewed-by: Javier González <javier@javigon.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-03-04  8:44     ` Hans Holmberg
@ 2019-03-04 12:39       ` Igor Konopko
  2019-03-04 12:42         ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:39 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjorling, Hans Holmberg, linux-block



On 04.03.2019 09:44, Hans Holmberg wrote:
> Did you ever see this in the wild?
> The only time pblk_gc_line returns an error is if
> kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL); fails, and then
> we're in real trouble :)

I never saw this in real life, but it is still probably good to have
"proper" handling, since we already have "some" handling.

> 
> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
> 
> On Mon, Mar 4, 2019 at 8:38 AM Javier González <javier@javigon.com> wrote:
>>
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> Currently in case of error returned by pblk_gc_line
>>> to pblk_gc_read we leave current line unassigned
>>> from all the lists. This patch fixes that issue.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-gc.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
>>> index 511ed0d5333c..533da6ea3e15 100644
>>> --- a/drivers/lightnvm/pblk-gc.c
>>> +++ b/drivers/lightnvm/pblk-gc.c
>>> @@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
>>>
>>>        pblk_gc_kick(pblk);
>>>
>>> -     if (pblk_gc_line(pblk, line))
>>> +     if (pblk_gc_line(pblk, line)) {
>>>                pblk_err(pblk, "failed to GC line %d\n", line->id);
>>> +             /* rollback */
>>> +             spin_lock(&gc->r_lock);
>>> +             list_add_tail(&line->list, &gc->r_list);
>>> +             spin_unlock(&gc->r_lock);
>>> +     }
>>>
>>>        return 0;
>>> }
>>> --
>>> 2.17.1
>>
>> Looks good to me.
>>
>> Reviewed-by: Javier González <javier@javigon.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/13] lightnvm: pblk: Line reference fix in GC
  2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
  2019-03-01 12:20   ` Hans Holmberg
  2019-03-04  7:18   ` Javier González
@ 2019-03-04 12:40   ` Matias Bjørling
  2 siblings, 0 replies; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 12:40 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 2/27/19 6:14 PM, Igor Konopko wrote:
> This patch fixes the error case in GC where
> we both move the line back to the closed state and
> release the additional reference, which causes an illegal
> transition from closed to free on pblk_line_put,
> when only the gc-to-free line state transition is
> allowed using that path.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-gc.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 2fa118c8eb71..3feadfd9418d 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -290,8 +290,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
>   fail_free_ws:
>   	kfree(line_ws);
>   
> +	/* Line goes back to closed state, so we cannot release additional
> +	 * reference for line, since we do that only when we want to do
> +	 * gc to free line state transition.
> +	 */
>   	pblk_put_line_back(pblk, line);
> -	kref_put(&line->ref, pblk_line_put);
>   	atomic_dec(&gc->read_inflight_gc);
>   
>   	pblk_err(pblk, "failed to GC line %d\n", line->id);
> 

Thanks Igor. Applied for 5.2.
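
The invariant behind this fix — the last reference may only be dropped while the line is still in the GC state, because the release path performs the gc-to-free transition — can be sketched with a plain counter (a user-space stand-in for kref_put() and pblk_line_put(), not pblk code):

```c
#include <assert.h>

enum line_state { LINE_CLOSED, LINE_GC, LINE_FREE };

struct line {
	int refs;
	enum line_state state;
};

/* Stand-in for kref_put(&line->ref, pblk_line_put): the release step
 * moves the line gc -> free, so it must only run while the line is in
 * GC. Dropping the reference after the line was already put back to
 * closed (the bug fixed by patch 01) would violate the assertion. */
static void line_put(struct line *line)
{
	if (--line->refs == 0) {
		assert(line->state == LINE_GC); /* closed -> free is illegal */
		line->state = LINE_FREE;
	}
}
```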

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-03-04 12:39       ` Igor Konopko
@ 2019-03-04 12:42         ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 12:42 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Matias Bjorling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 1:39 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 04.03.2019 09:44, Hans Holmberg wrote:
> > Did you ever see this in the wild?
> > The only time pblk_gc_line returns an error is if
> > kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL); fails, and then
> > we're in real trouble :)
>
> I never saw this in real life, but still probably good to have a
> "proper" handling - since we already have "some" handling.

Yes!


>
> >
> > Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
> >
> > On Mon, Mar 4, 2019 at 8:38 AM Javier González <javier@javigon.com> wrote:
> >>
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> Currently in case of error returned by pblk_gc_line
> >>> to pblk_gc_read we leave current line unassigned
> >>> from all the lists. This patch fixes that issue.
> >>>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/pblk-gc.c | 7 ++++++-
> >>> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> >>> index 511ed0d5333c..533da6ea3e15 100644
> >>> --- a/drivers/lightnvm/pblk-gc.c
> >>> +++ b/drivers/lightnvm/pblk-gc.c
> >>> @@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
> >>>
> >>>        pblk_gc_kick(pblk);
> >>>
> >>> -     if (pblk_gc_line(pblk, line))
> >>> +     if (pblk_gc_line(pblk, line)) {
> >>>                pblk_err(pblk, "failed to GC line %d\n", line->id);
> >>> +             /* rollback */
> >>> +             spin_lock(&gc->r_lock);
> >>> +             list_add_tail(&line->list, &gc->r_list);
> >>> +             spin_unlock(&gc->r_lock);
> >>> +     }
> >>>
> >>>        return 0;
> >>> }
> >>> --
> >>> 2.17.1
> >>
> >> Looks good to me.
> >>
> >> Reviewed-by: Javier González <javier@javigon.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04 11:45           ` Javier González
@ 2019-03-04 12:42             ` Igor Konopko
  2019-03-04 12:48               ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:42 UTC (permalink / raw)
  To: Javier González, Hans Holmberg
  Cc: Matias Bjørling, Hans Holmberg, linux-block



On 04.03.2019 12:45, Javier González wrote:
> 
>> On 4 Mar 2019, at 12.41, Hans Holmberg <hans@owltronix.com> wrote:
>>
>> On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@javigon.com> wrote:
>>>> On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>>
>>>> Igor: Have you seen this happening in real life?
>>>>
>>>> I think it would be better to count all expected errors and put them
>>>> in the right bucket (without spamming dmesg). If we need a new bucket
>>>> for i.e. vendor-specific-errors, let's do that instead.

Generally I'm seeing different types of errors (which are typically, as
Javier mentions, controller errors) in cases such as hot drive removal, etc.

We can skip that patch, since these are corner cases. I can also
create a new type of pblk stats, sth. like "controller errors", which
would collect all the other unexpected errors in one place instead of
mixing them with real read/write errors as I did.

>>>>
>>>> Someone wiser than me told me that every error print in the log is a
>>>> potential customer call.
>>>>
>>>> Javier: Yeah, I think S.M.A.R.T is the way to deliver this
>>>> information. Why can't we let the drives expose this info and remove
>>>> this from pblk? What's blocking that?
>>>
>>> Until now the spec. We added some new log information in Denali exactly
>>> for this. But since pblk supports OCSSD 1.2 and 2.0 I think it is needed to
>>> have it here, at least for debugging.
>>
>> Why add it to the spec? Why not use whatever everyone else is using?
>>
>> https://en.wikipedia.org/wiki/S.M.A.R.T. :
>> "S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
>> written as SMART) is a monitoring system included in computer hard
>> disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
>> primary function is to detect and report various indicators of drive
>> reliability with the intent of anticipating imminent hardware
>> failures."
>> Sounds like what we want here.
> 
> I know what smart is… You need to define the fields. Maybe you want to
> read Denali again - the extensions are coupled with SMART.
> 
>> For debugging, a trace point or something(i.e. BPF) would be a better
>> solution that would not impact hot-path performance.
> 
> Cool. Look forward to the patches ;)
> 
> Javier
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04 11:43         ` Hans Holmberg
@ 2019-03-04 12:44           ` Igor Konopko
  2019-03-04 12:57             ` Hans Holmberg
  2019-03-04 13:00             ` Matias Bjørling
  0 siblings, 2 replies; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:44 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjørling, Hans Holmberg, linux-block



On 04.03.2019 12:43, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com> wrote:
>>
>>> On 4 Mar 2019, at 10.05, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>
>>> I strongly disagree with adding code that would mask implementation errors.
>>>
>>> If we want more internal checks, we could add an if statement that
>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
>>>
>>
>> Not sure who this is for - better not to top post.
>>
>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
>> they mean nothing for the reset command. If you feel that strongly about
>> this, you can take it up with Igor.
> 
> Pardon the top-post. It was meant for both you and Igor.
> 

OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says 
"The addresses in the LBA list shall be the first logical block address 
of each chunk to be reset." So in my understanding we are supposed to clear
the sector bits of the PPA address in order to be spec compliant.
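
Clearing the sector bits amounts to masking the low bits of the PPA so the address points at the first logical block of the chunk; a sketch with an illustrative field width (the real width comes from the device-reported geometry, so treat SECTOR_BITS as an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative OCSSD 2.0-style layout: the low SECTOR_BITS address a
 * sector within the chunk and must be zero in a vector reset, per
 * section 2.2.2 of the spec. The width here is made up for the sketch. */
#define SECTOR_BITS 12u

static uint64_t chunk_align_ppa(uint64_t ppa)
{
	return ppa & ~((1ull << SECTOR_BITS) - 1);
}
```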

>>
>>>
>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> wrote:
>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>
>>>>> In the current pblk implementation of the erase command
>>>>> there is a chance that the sector bits are set to some
>>>>> random values in the erase PPA. This is an unexpected
>>>>> situation, since an erase shall always be chunk
>>>>> aligned. This patch fixes that issue.
>>>>>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/pblk-core.c | 1 +
>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
>>>>> 2 files changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>> index a98b2255f963..78b1eea4ab67 100644
>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
>>>>>
>>>>>               ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>               ppa.a.blk = line->id;
>>>>> +             ppa.a.reserved = 0;
>>>>>
>>>>>               atomic_dec(&line->left_eblks);
>>>>>               WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>>> index 79df583ea709..aea46b4ec40f 100644
>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>
>>>>>                       *erase_ppa = ppa_list[i];
>>>>>                       erase_ppa->a.blk = e_line->id;
>>>>> +                     erase_ppa->a.reserved = 0;
>>>>>
>>>>>                       spin_unlock(&e_line->lock);
>>>>>
>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>               atomic_dec(&e_line->left_eblks);
>>>>>               *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>               erase_ppa->a.blk = e_line->id;
>>>>> +             erase_ppa->a.reserved = 0;
>>>>>       }
>>>>>
>>>>>       return 0;
>>>>> --
>>>>> 2.17.1
>>>>
>>>> I’m fine with adding this, but note that there is actually no
>>>> requirement for the erase to be chunk aligned - the only bits that
>>>> should be looked at are group, PU and chunk.
>>>>
>>>> Reviewed-by: Javier González <javier@javigon.com>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory
  2019-03-04  9:24     ` Hans Holmberg
@ 2019-03-04 12:46       ` Igor Konopko
  0 siblings, 0 replies; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:46 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjorling, Hans Holmberg, linux-block



On 04.03.2019 10:24, Hans Holmberg wrote:
> Hi Igor,
> 
> I think you need to motivate (and document) why each of the new flags
> is needed.

Sure, will add such a description to v2 of the patch.
> 
> Thanks,
> Hans
> 
> On Mon, Mar 4, 2019 at 8:53 AM Javier González <javier@javigon.com> wrote:
>>
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> The L2P table can be huge in many cases, since
>>> it typically requires 1GB of DRAM for 1TB
>>> of drive. When there is not enough memory
>>> available, the OOM killer kicks in and kills
>>> random processes, which can be very annoying
>>> for users. This patch changes the flags for
>>> the L2P table allocation in order to handle this
>>> situation in a more user-friendly way.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 9 +++++++--
>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index 8b643d0bffae..e553105b7ba1 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -164,9 +164,14 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
>>>        int ret = 0;
>>>
>>>        map_size = pblk_trans_map_size(pblk);
>>> -     pblk->trans_map = vmalloc(map_size);
>>> -     if (!pblk->trans_map)
>>> +     pblk->trans_map = __vmalloc(map_size, GFP_KERNEL | __GFP_NOWARN
>>> +                                     | __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
>>> +                                     PAGE_KERNEL);
>>> +     if (!pblk->trans_map) {
>>> +             pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
>>> +                             map_size);
>>>                return -ENOMEM;
>>> +     }
>>>
>>>        pblk_ppa_set_empty(&ppa);
>>>
>>> --
>>> 2.17.1
>>
>> Is there any extra consideration we should take when enabling high
>> memory for the L2P table? If not, looks good to me.
>>
>> Reviewed-by: Javier González <javier@javigon.com>
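
The behavioural change can be sketched in user space: with __GFP_RETRY_MAYFAIL | __GFP_NOWARN the allocation gives up and returns NULL instead of invoking the OOM killer, so the target can fail creation with a clean -ENOMEM. The allocator is injectable here purely so the failure path can be exercised; the names are hypothetical stand-ins, not pblk code:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Sketch of the failure mode the patch addresses: report -ENOMEM to
 * the caller rather than letting a huge L2P allocation trigger the
 * OOM killer. alloc_fn stands in for __vmalloc() with RETRY_MAYFAIL. */
static int l2p_init(void **map, size_t map_size, void *(*alloc_fn)(size_t))
{
	*map = alloc_fn(map_size);
	if (!*map)
		return -ENOMEM; /* caller logs the size and bails out */
	return 0;
}

static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL; /* simulate __GFP_RETRY_MAYFAIL giving up */
}
```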

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats
  2019-03-04 12:42             ` Igor Konopko
@ 2019-03-04 12:48               ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 12:48 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 1:42 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 04.03.2019 12:45, Javier González wrote:
> >
> >> On 4 Mar 2019, at 12.41, Hans Holmberg <hans@owltronix.com> wrote:
> >>
> >> On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@javigon.com> wrote:
> >>>> On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> >>>>
> >>>> Igor: Have you seen this happening in real life?
> >>>>
> >>>> I think it would be better to count all expected errors and put them
> >>>> in the right bucket (without spamming dmesg). If we need a new bucket
> >>>> for i.e. vendor-specific-errors, let's do that instead.
>
> Generally I'm seeing different types of errors (which are typically as
> Javier mention controller errors) in cases such as hot drive removal, etc.
>
> We can skip that patch, since this are kind of corner cases. I can also
> create new type of pblk stats, sth. like "controller errors", which
> would collect all the other unexpected errors in one place instead of
> mixing them with real read/write errors as I did.

Yes, that makes sense.

Thanks,
Hans

>
> >>>>
> >>>> Someone wiser than me told me that every error print in the log is a
> >>>> potential customer call.
> >>>>
> >>>> Javier: Yeah, I think S.M.A.R.T is the way to deliver this
> >>>> information. Why can't we let the drives expose this info and remove
> >>>> this from pblk? What's blocking that?
> >>>
> >>> Until now the spec. We added some new log information in Denali exactly
> >>> for this. But since pblk supports OCSSD 1.2 and 2.0 I think it is needed to
> >>> have it here, at least for debugging.
> >>
> >> Why add it to the spec? Why not use whatever everyone else is using?
> >>
> >> https://en.wikipedia.org/wiki/S.M.A.R.T. :
> >> "S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
> >> written as SMART) is a monitoring system included in computer hard
> >> disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
> >> primary function is to detect and report various indicators of drive
> >> reliability with the intent of anticipating imminent hardware
> >> failures."
> >> Sounds like what we want here.
> >
> > I know what smart is… You need to define the fields. Maybe you want to
> > read Denali again - the extensions are coupled with SMART.
> >
> >> For debugging, a trace point or something(i.e. BPF) would be a better
> >> solution that would not impact hot-path performance.
> >
> > Cool. Look forward to the patches ;)
> >
> > Javier
> >

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/13] lightnvm: pblk: Rollback in gc read
  2019-02-27 17:14 ` [PATCH 04/13] lightnvm: pblk: Rollback in gc read Igor Konopko
  2019-03-04  7:38   ` Javier González
@ 2019-03-04 12:49   ` Matias Bjørling
  1 sibling, 0 replies; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 12:49 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 2/27/19 6:14 PM, Igor Konopko wrote:
> Currently in case of error returned by pblk_gc_line
> to pblk_gc_read we leave current line unassigned
> from all the lists. This patch fixes that issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-gc.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index 511ed0d5333c..533da6ea3e15 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -361,8 +361,13 @@ static int pblk_gc_read(struct pblk *pblk)
>   
>   	pblk_gc_kick(pblk);
>   
> -	if (pblk_gc_line(pblk, line))
> +	if (pblk_gc_line(pblk, line)) {
>   		pblk_err(pblk, "failed to GC line %d\n", line->id);
> +		/* rollback */
> +		spin_lock(&gc->r_lock);
> +		list_add_tail(&line->list, &gc->r_list);
> +		spin_unlock(&gc->r_lock);
> +	}
>   
>   	return 0;
>   }
> 

Thanks Igor. I've reworded your description a bit. Applied for 5.2.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04 12:14         ` Hans Holmberg
@ 2019-03-04 12:51           ` Igor Konopko
  2019-03-04 13:08             ` Matias Bjørling
  0 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:51 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjørling, Hans Holmberg, linux-block



On 04.03.2019 13:14, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>>
>>
>>
>>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>
>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>
>>>>> Currently in case of read errors, bi_status is not
>>>>> set properly, which leads to returning improper data
>>>>> to the higher layer. This patch fixes that by setting the
>>>>> proper status in case of read errors.
>>>>>
>>>>> The patch also removes an unnecessary warn_once(), which does
>>>>> not make sense in that place, since the user bio is not used
>>>>> for interaction with the drive and thus bi_status will not be
>>>>> set here.
>>>>>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>>> }
>>>>>
>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>> {
>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>> -#endif
>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>> +             bio_io_error(bio);
>>>>>       bio_endio(bio);
>>>>> }
>>>>>
>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>       struct bio *bio = (struct bio *)r_ctx->private;
>>>>>
>>>>> -     pblk_end_user_read(bio);
>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>       __pblk_end_io_read(pblk, rqd, true);
>>>>> }
>>>>>
>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>       rqd->bio = NULL;
>>>>>       rqd->nr_ppas = nr_secs;
>>>>>
>>>>> -     bio_endio(bio);
>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>       __pblk_end_io_read(pblk, rqd, false);
>>>>> }
>>>>>
>>>>> --
>>>>> 2.17.1
>>>>
>>>> This is by design. We do not report the read errors as in any other
>>>> block device - this is why we clone the read bio.
>>>
>>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>>>
>>
>> Normal block devices do not report read errors on the completion path
>> unless it is a fatal error. This is actually not well understood by the
>> upper layers, which tend to assume that the device is completely broken.
> 
> So returning bogus data without even a warning is a preferred
> solution? You want to force "the upper layers" to do checksumming?
> 
> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
> warning that OCSSD 2.0 adds. The data should still be good.
> All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec), indicates that
> the command did not complete (As far as I can tell)
> 

My approach was exactly that. In all cases other than WARN_HIGHECC we 
don't have valid data. Without calling bio_io_error() we give the upper 
layers the impression that the data was read correctly, which is not 
the case.

This patch is also not the only user of the bio_io_error() API; other 
drivers such as md use it commonly.
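The error-masking logic under discussion can be modeled in userspace. This is a simplified sketch, not the kernel code: the bio struct is reduced to its status field, the NVM_RSP_WARN_HIGHECC value is a stand-in, and bio_io_error()/bio_endio() are modeled by a plain assignment.

```c
#include <assert.h>

/* Stand-in for the OCSSD 2.0 high-ECC warning status code. */
#define NVM_RSP_WARN_HIGHECC 0x4700

/* Minimal model of a bio: only the completion status matters here. */
struct bio {
	int bi_status; /* 0 = OK, non-zero = I/O error */
};

/*
 * Mirrors the logic of the patch: flag the bio as failed for every
 * device error except the high-ECC warning, where the data is still
 * considered good and only a warning was raised.
 */
static void pblk_end_user_read(struct bio *bio, int error)
{
	if (error && error != NVM_RSP_WARN_HIGHECC)
		bio->bi_status = 1; /* models bio_io_error() */
	/* bio_endio() would complete the bio here */
}
```

With this model, a clean completion and a high-ECC warning both leave the bio successful, while any other error marks it failed.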

> 
>>
>> This is a challenge for OCSSD / Denali / Zone devices as there are cases
>> where reads can fail. Unfortunately at this point, we need to mask these
>> errors and deal with them in the different layers.
>>
>> For OCSSD currently, we do this in pblk, which I think fits well the
>> model as we exposed a normal block device.
>>
>> Javier


* Re: [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests
  2019-03-04  9:39     ` Hans Holmberg
@ 2019-03-04 12:52       ` Igor Konopko
  0 siblings, 0 replies; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:52 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjorling, Hans Holmberg, linux-block



On 04.03.2019 10:39, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 9:08 AM Javier González <javier@javigon.com> wrote:
>>
>>
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> In case when there is no enough sector available in rwb
>>> and there is flush request send we should kick write thread
>>> which is not a case in current implementation. This patch
>>> fixes that issue.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 78b1eea4ab67..f48f2e77f770 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -375,8 +375,9 @@ void pblk_write_timer_fn(struct timer_list *t)
>>> void pblk_write_should_kick(struct pblk *pblk)
>>> {
>>>        unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>>> +     unsigned int secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
>>>
>>> -     if (secs_avail >= pblk->min_write_pgs_data)
>>> +     if (secs_avail >= pblk->min_write_pgs_data || secs_to_flush)
>>>                pblk_write_kick(pblk);
>>> }
>>>
>>> --
>>> 2.17.1
>>
>> We already kick the write thread in case of REQ_PREFLUSH in
>> pblk_write_cache(), so no need to kick again.
> 
> Yeah, I thought i fixed this issue in:
> 
> cc9c9a00b10e ("lightnvm: pblk: kick writer on new flush points")
> 
> That commit brought down the test time of some of the xfs sync tests
> with a factor of 20 or so.
> 
> Igor: Have you seen any case of delayed syncs?

I didn't notice that in the code. My mistake. My commit definitely does 
not make sense then, and we can just forget about it.

> 
>>
>> Javier


* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04 10:05     ` Hans Holmberg
@ 2019-03-04 12:56       ` Igor Konopko
  2019-03-04 13:03         ` Hans Holmberg
  2019-03-04 13:19       ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 12:56 UTC (permalink / raw)
  To: Hans Holmberg, Javier González, Matias Bjorling
  Cc: Hans Holmberg, linux-block, Klaus Jensen



On 04.03.2019 11:05, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 9:46 AM Javier González <javier@javigon.com> wrote:
>>
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> When we creating pblk instance with factory
>>> flag, there is a possibility that some chunks
>>> are in open state, which does not allow to
>>> issue erase request to them directly. Such a
>>> chunk should be filled with some empty data in
>>> order to achieve close state. Without that we
>>> are risking that some erase operation will be
>>> rejected by the drive due to inproper chunk
>>> state.
>>>
>>> This patch implements closing chunk logic in pblk
>>> for that case, when creating instance with factory
>>> flag in order to fix that issue
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
>>> drivers/lightnvm/pblk-init.c |  14 ++++-
>>> drivers/lightnvm/pblk.h      |   2 +
>>> 3 files changed, 128 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index fa4dc05608ff..d3c45393f093 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>>>        return meta + ch_off + lun_off + chk_off;
>>> }
>>>
>>> +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
>>> +{
>>> +     struct nvm_tgt_dev *dev = pblk->dev;
>>> +     struct nvm_geo *geo = &dev->geo;
>>> +     struct bio *bio;
>>> +     struct ppa_addr *ppa_list;
>>> +     struct nvm_rq rqd;
>>> +     void *meta_list, *data;
>>> +     dma_addr_t dma_meta_list, dma_ppa_list;
>>> +     int i, rq_ppas, rq_len, ret;
>>> +
>>> +     meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
>>> +     if (!meta_list)
>>> +             return;
>>> +
>>> +     ppa_list = meta_list + pblk_dma_meta_size(pblk);
>>> +     dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>>> +
>>> +     rq_ppas = pblk_calc_secs(pblk, count, 0, false);
>>> +     if (!rq_ppas)
>>> +             rq_ppas = pblk->min_write_pgs;
>>> +     rq_len = rq_ppas * geo->csecs;
>>> +
>>> +     data = kzalloc(rq_len, GFP_KERNEL);
>>> +     if (!data)
>>> +             goto free_meta_list;
>>> +
>>> +     memset(&rqd, 0, sizeof(struct nvm_rq));
>>> +     rqd.opcode = NVM_OP_PWRITE;
>>> +     rqd.nr_ppas = rq_ppas;
>>> +     rqd.meta_list = meta_list;
>>> +     rqd.ppa_list = ppa_list;
>>> +     rqd.dma_ppa_list = dma_ppa_list;
>>> +     rqd.dma_meta_list = dma_meta_list;
>>> +
>>> +next_rq:
>>> +     bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
>>> +     if (IS_ERR(bio))
>>> +             goto out_next;
>>> +
>>> +     bio->bi_iter.bi_sector = 0; /* artificial bio */
>>> +     bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>>> +
>>> +     rqd.bio = bio;
>>> +     for (i = 0; i < rqd.nr_ppas; i++) {
>>> +             rqd.ppa_list[i] = ppa;
>>> +             rqd.ppa_list[i].m.sec += i;
>>> +             pblk_get_meta(pblk, meta_list, i)->lba =
>>> +                                     cpu_to_le64(ADDR_EMPTY);
>>> +     }
>>> +
>>> +     ret = nvm_submit_io_sync(dev, &rqd);
>>> +     if (ret) {
>>> +             bio_put(bio);
>>> +             goto out_next;
>>> +     }
>>> +
>>> +     if (rqd.error)
>>> +             goto free_data;
>>> +
>>> +out_next:
>>> +     count -= rqd.nr_ppas;
>>> +     ppa.m.sec += rqd.nr_ppas;
>>> +     if (count > 0)
>>> +             goto next_rq;
>>> +
>>> +free_data:
>>> +     kfree(data);
>>> +free_meta_list:
>>> +     nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>> +}
>>> +
>>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>>> +{
>>> +     struct nvm_tgt_dev *dev = pblk->dev;
>>> +     struct nvm_geo *geo = &dev->geo;
>>> +     struct nvm_chk_meta *chunk_meta;
>>> +     struct ppa_addr ppa;
>>> +     int i, j, k, count;
>>> +
>>> +     for (i = 0; i < geo->num_chk; i++) {
>>> +             for (j = 0; j < geo->num_lun; j++) {
>>> +                     for (k = 0; k < geo->num_ch; k++) {
>>> +                             ppa.ppa = 0;
>>> +                             ppa.m.grp = k;
>>> +                             ppa.m.pu = j;
>>> +                             ppa.m.chk = i;
>>> +
>>> +                             chunk_meta = pblk_chunk_get_off(pblk,
>>> +                                                             meta, ppa);
>>> +                             if (chunk_meta->state == NVM_CHK_ST_OPEN) {
>>> +                                     ppa.m.sec = chunk_meta->wp;
>>> +                                     count = geo->clba - chunk_meta->wp;
>>> +                                     pblk_close_chunk(pblk, ppa, count);
>>> +                             }
>>> +                     }
>>> +             }
>>> +     }
>>> +}
>>> +
>>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>>> +{
>>> +     struct nvm_tgt_dev *dev = pblk->dev;
>>> +     struct nvm_geo *geo = &dev->geo;
>>> +     int i;
>>> +
>>> +     for (i = 0; i < geo->all_luns; i++) {
>>> +             if (meta[i].state == NVM_CHK_ST_OPEN)
>>> +                     return true;
>>> +     }
>>> +
>>> +     return false;
>>> +}
>>> +
>>> void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
>>>                           u64 paddr)
>>> {
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index 9913a4514eb6..83abe6960b46 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
>>>        return 0;
>>> }
>>>
>>> -static int pblk_lines_init(struct pblk *pblk)
>>> +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
>>> {
>>>        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>        struct pblk_line *line;
>>>        void *chunk_meta;
>>>        int nr_free_chks = 0;
>>>        int i, ret;
>>> +     bool retry = false;
>>>
>>>        ret = pblk_line_meta_init(pblk);
>>>        if (ret)
>>> @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
>>>        if (ret)
>>>                goto fail_free_meta;
>>>
>>> +get_chunk_meta:
>>>        chunk_meta = pblk_get_chunk_meta(pblk);
>>>        if (IS_ERR(chunk_meta)) {
>>>                ret = PTR_ERR(chunk_meta);
>>>                goto fail_free_luns;
>>>        }
>>>
>>> +     if (factory_init && !retry &&
>>> +         pblk_are_opened_chunks(pblk, chunk_meta)) {
>>> +             pblk_close_opened_chunks(pblk, chunk_meta);
>>> +             retry = true;
>>> +             vfree(chunk_meta);
>>> +             goto get_chunk_meta;
>>> +     }
>>> +
>>>        pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
>>>                                                                GFP_KERNEL);
>>>        if (!pblk->lines) {
>>> @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>>                goto fail;
>>>        }
>>>
>>> -     ret = pblk_lines_init(pblk);
>>> +     ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
>>>        if (ret) {
>>>                pblk_err(pblk, "could not initialize lines\n");
>>>                goto fail_free_core;
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index b266563508e6..b248642c4dfb 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
>>> struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>>>                                              struct nvm_chk_meta *lp,
>>>                                              struct ppa_addr ppa);
>>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>>> void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
>>> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
>>> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
>>> --
>>> 2.17.1
>>
>> I know that the OCSSD 2.0 spec does not allow to transition from open to
>> free, but to me this is a spec bug as there is no underlying issue in
>> reading an open block. Note that all controllers I know support this,
>> and the upcoming Denali spec. fixes this too.
> 
> + Klaus whom I discussed this with.
> Yeah, i think that "early reset" is a nice feature. It would be nice
> to extend the OCSSD spec with a new capabilities bit indicating if
> this is indeed supported or not.
> Matias: what do you think?
> 
>>
>> Besides, the factory flag is intended to start a pblk instance
>> immediately, without having to pay the price of padding any past device
>> state.  If you still want to do this, I think this belongs in a user space tool.
>>
> 
> Hear, hear!
> 
> Serially padding any open chunks during -f create call would be
> terrible user experience.
> Lets say that padding a chunk takes one second, we would, in a
> worst-case scenario in an example disk, be stuck in a syscall for
> 1*64*1000 seconds ~ 17 hours.
> 

You are both right - this can be time consuming. So we can drop the 
"padding" part and let the user do that.

What about changing this patch to at least print a warning to dmesg
when there are open chunks in such a case?
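A minimal userspace model of that check: scan the chunk metadata and warn instead of padding. The struct, state value, and function name here are simplified stand-ins for the kernel's nvm_chk_meta / NVM_CHK_ST_OPEN, not the real definitions.

```c
#include <assert.h>
#include <stdio.h>

#define NVM_CHK_ST_OPEN (1 << 2) /* stand-in; see include/linux/lightnvm.h */

/* Minimal stand-in for nvm_chk_meta: only the state is inspected. */
struct chk_meta {
	int state;
};

/*
 * Count open chunks and emit a single warning instead of padding them
 * in the create path. Returns the number of open chunks found.
 */
static int warn_on_open_chunks(const struct chk_meta *meta, int nr_chks)
{
	int i, open = 0;

	for (i = 0; i < nr_chks; i++)
		if (meta[i].state == NVM_CHK_ST_OPEN)
			open++;

	if (open)
		fprintf(stderr,
			"pblk: %d open chunk(s) found; reset may be rejected, pad them with a user-space tool\n",
			open);
	return open;
}
```

In-kernel this would be a pr_warn()/pblk_warn() in the factory-init path rather than fprintf.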

> A tool, like dm-zoned's dmzadm would be the right approach, see
> Documentation/device-mapper/dm-zoned.txt
> All new pblk instances would then have to be pre-formatted with "pblkadm"
> 
> A new physical storage format containing a superblock would also be a good idea.
> 
> / Hans
> 


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04 12:44           ` Igor Konopko
@ 2019-03-04 12:57             ` Hans Holmberg
  2019-03-04 13:00             ` Matias Bjørling
  1 sibling, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 12:57 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Matias Bjørling, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 1:44 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 04.03.2019 12:43, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com> wrote:
> >>
> >>> On 4 Mar 2019, at 10.05, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
> >>>
> >>> I strongly disagree with adding code that would mask implantation errors.
> >>>
> >>> If we want more internal checks, we could add an if statement that
> >>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> >>>
> >>
> >> Not sure who this is for - better not to top post.
> >>
> >> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
> >> they mean nothing for the reset command. If you feel that strongly about
> >> this, you can take if with Igor.
> >
> > Pardon the top-post. It was meant for both you and Igor.
> >
>
> OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says
> "The addresses in the LBA list shall be the first logical block address
> of each chunk to be reset.". So in my understanding we suppose to clear
> the sectors bits of the PPA address in order to be spec compliant.

Hmm.. but my point is that if pblk->luns[bit].bppa is set to a valid
chunk address, there is no need to clear a.reserved.
That address originates from core.c:196, and that's valid.

>
> >>
> >>>
> >>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> wrote:
> >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>>>
> >>>>> In current pblk implementation of erase command
> >>>>> there is a chance tha sector bits are set to some
> >>>>> random values for erase PPA. This is unexpected
> >>>>> situation, since erase shall be always chunk
> >>>>> aligned. This patch fixes that issue
> >>>>>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>> ---
> >>>>> drivers/lightnvm/pblk-core.c | 1 +
> >>>>> drivers/lightnvm/pblk-map.c  | 2 ++
> >>>>> 2 files changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >>>>> index a98b2255f963..78b1eea4ab67 100644
> >>>>> --- a/drivers/lightnvm/pblk-core.c
> >>>>> +++ b/drivers/lightnvm/pblk-core.c
> >>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct pblk_line *line)
> >>>>>
> >>>>>               ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>               ppa.a.blk = line->id;
> >>>>> +             ppa.a.reserved = 0;
> >>>>>
> >>>>>               atomic_dec(&line->left_eblks);
> >>>>>               WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> >>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> >>>>> index 79df583ea709..aea46b4ec40f 100644
> >>>>> --- a/drivers/lightnvm/pblk-map.c
> >>>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >>>>>
> >>>>>                       *erase_ppa = ppa_list[i];
> >>>>>                       erase_ppa->a.blk = e_line->id;
> >>>>> +                     erase_ppa->a.reserved = 0;
> >>>>>
> >>>>>                       spin_unlock(&e_line->lock);
> >>>>>
> >>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> >>>>>               atomic_dec(&e_line->left_eblks);
> >>>>>               *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>               erase_ppa->a.blk = e_line->id;
> >>>>> +             erase_ppa->a.reserved = 0;
> >>>>>       }
> >>>>>
> >>>>>       return 0;
> >>>>> --
> >>>>> 2.17.1
> >>>>
> >>>> I’m fine with adding this, but note that there is actually no
> >>>> requirement for the erase to be chunk aligned - the only bits that
> >>>> should be looked at are group, PU and chunk.
> >>>>
> >>>> Reviewed-by: Javier González <javier@javigon.com>


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04 12:44           ` Igor Konopko
  2019-03-04 12:57             ` Hans Holmberg
@ 2019-03-04 13:00             ` Matias Bjørling
  2019-03-05  8:20               ` Hans Holmberg
  1 sibling, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:00 UTC (permalink / raw)
  To: Igor Konopko, Hans Holmberg, Javier González
  Cc: Hans Holmberg, linux-block

On 3/4/19 1:44 PM, Igor Konopko wrote:
> 
> 
> On 04.03.2019 12:43, Hans Holmberg wrote:
>> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com> 
>> wrote:
>>>
>>>> On 4 Mar 2019, at 10.05, Hans Holmberg 
>>>> <hans.ml.holmberg@owltronix.com> wrote:
>>>>
>>>> I strongly disagree with adding code that would mask implantation 
>>>> errors.
>>>>
>>>> If we want more internal checks, we could add an if statement that
>>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
>>>>
>>>
>>> Not sure who this is for - better not to top post.
>>>
>>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
>>> they mean nothing for the reset command. If you feel that strongly about
>>> this, you can take if with Igor.
>>
>> Pardon the top-post. It was meant for both you and Igor.
>>
> 
> OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says 
> "The addresses in the LBA list shall be the first logical block address 
> of each chunk to be reset.". So in my understanding we suppose to clear 
> the sectors bits of the PPA address in order to be spec compliant.
> 

Agreed. And since ppa_addr is allocated on the stack, it should either 
be memset or have the remaining fields set to 0. Maybe it would be 
better to zero-initialize it in the declaration?
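The difference between copying an existing address and zero-initializing in the declaration can be shown with a simplified stand-in for the kernel's ppa_addr union. The field widths below are illustrative, not the real OCSSD geometry.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for struct ppa_addr: an address union whose low
 * "sec" bits must be zero for a chunk-aligned reset.
 */
union ppa_addr {
	struct {
		uint64_t sec : 12; /* sector within chunk */
		uint64_t chk : 16;
		uint64_t pu  : 16;
		uint64_t grp : 20;
	} m;
	uint64_t ppa;
};

/*
 * Zero-initialize in the declaration, then set only the fields an
 * erase needs: sec (and anything else) is guaranteed to stay 0.
 */
static union ppa_addr make_erase_ppa(union ppa_addr src, int chk)
{
	union ppa_addr ppa = { .ppa = 0 };

	ppa.m.grp = src.m.grp;
	ppa.m.pu  = src.m.pu;
	ppa.m.chk = chk;
	return ppa;
}
```

A plain struct copy, by contrast, carries any stale sector bits from the source address along, which is exactly what the patch under review is clearing by hand.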

>>>
>>>>
>>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com> 
>>>> wrote:
>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> 
>>>>>> wrote:
>>>>>>
>>>>>> In current pblk implementation of erase command
>>>>>> there is a chance tha sector bits are set to some
>>>>>> random values for erase PPA. This is unexpected
>>>>>> situation, since erase shall be always chunk
>>>>>> aligned. This patch fixes that issue
>>>>>>
>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>> ---
>>>>>> drivers/lightnvm/pblk-core.c | 1 +
>>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
>>>>>> 2 files changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/lightnvm/pblk-core.c 
>>>>>> b/drivers/lightnvm/pblk-core.c
>>>>>> index a98b2255f963..78b1eea4ab67 100644
>>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct 
>>>>>> pblk_line *line)
>>>>>>
>>>>>>               ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>>               ppa.a.blk = line->id;
>>>>>> +             ppa.a.reserved = 0;
>>>>>>
>>>>>>               atomic_dec(&line->left_eblks);
>>>>>>               WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
>>>>>> diff --git a/drivers/lightnvm/pblk-map.c 
>>>>>> b/drivers/lightnvm/pblk-map.c
>>>>>> index 79df583ea709..aea46b4ec40f 100644
>>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk, 
>>>>>> struct nvm_rq *rqd,
>>>>>>
>>>>>>                       *erase_ppa = ppa_list[i];
>>>>>>                       erase_ppa->a.blk = e_line->id;
>>>>>> +                     erase_ppa->a.reserved = 0;
>>>>>>
>>>>>>                       spin_unlock(&e_line->lock);
>>>>>>
>>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk, 
>>>>>> struct nvm_rq *rqd,
>>>>>>               atomic_dec(&e_line->left_eblks);
>>>>>>               *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>>               erase_ppa->a.blk = e_line->id;
>>>>>> +             erase_ppa->a.reserved = 0;
>>>>>>       }
>>>>>>
>>>>>>       return 0;
>>>>>> -- 
>>>>>> 2.17.1
>>>>>
>>>>> I’m fine with adding this, but note that there is actually no
>>>>> requirement for the erase to be chunk aligned - the only bits that
>>>>> should be looked at are group, PU and chunk.
>>>>>
>>>>> Reviewed-by: Javier González <javier@javigon.com>



* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04 12:56       ` Igor Konopko
@ 2019-03-04 13:03         ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 13:03 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Matias Bjorling, Hans Holmberg,
	linux-block, Klaus Jensen

On Mon, Mar 4, 2019 at 1:56 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 04.03.2019 11:05, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 9:46 AM Javier González <javier@javigon.com> wrote:
> >>
> >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>
> >>> When we creating pblk instance with factory
> >>> flag, there is a possibility that some chunks
> >>> are in open state, which does not allow to
> >>> issue erase request to them directly. Such a
> >>> chunk should be filled with some empty data in
> >>> order to achieve close state. Without that we
> >>> are risking that some erase operation will be
> >>> rejected by the drive due to inproper chunk
> >>> state.
> >>>
> >>> This patch implements closing chunk logic in pblk
> >>> for that case, when creating instance with factory
> >>> flag in order to fix that issue
> >>>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> ---
> >>> drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
> >>> drivers/lightnvm/pblk-init.c |  14 ++++-
> >>> drivers/lightnvm/pblk.h      |   2 +
> >>> 3 files changed, 128 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >>> index fa4dc05608ff..d3c45393f093 100644
> >>> --- a/drivers/lightnvm/pblk-core.c
> >>> +++ b/drivers/lightnvm/pblk-core.c
> >>> @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> >>>        return meta + ch_off + lun_off + chk_off;
> >>> }
> >>>
> >>> +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
> >>> +{
> >>> +     struct nvm_tgt_dev *dev = pblk->dev;
> >>> +     struct nvm_geo *geo = &dev->geo;
> >>> +     struct bio *bio;
> >>> +     struct ppa_addr *ppa_list;
> >>> +     struct nvm_rq rqd;
> >>> +     void *meta_list, *data;
> >>> +     dma_addr_t dma_meta_list, dma_ppa_list;
> >>> +     int i, rq_ppas, rq_len, ret;
> >>> +
> >>> +     meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> >>> +     if (!meta_list)
> >>> +             return;
> >>> +
> >>> +     ppa_list = meta_list + pblk_dma_meta_size(pblk);
> >>> +     dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
> >>> +
> >>> +     rq_ppas = pblk_calc_secs(pblk, count, 0, false);
> >>> +     if (!rq_ppas)
> >>> +             rq_ppas = pblk->min_write_pgs;
> >>> +     rq_len = rq_ppas * geo->csecs;
> >>> +
> >>> +     data = kzalloc(rq_len, GFP_KERNEL);
> >>> +     if (!data)
> >>> +             goto free_meta_list;
> >>> +
> >>> +     memset(&rqd, 0, sizeof(struct nvm_rq));
> >>> +     rqd.opcode = NVM_OP_PWRITE;
> >>> +     rqd.nr_ppas = rq_ppas;
> >>> +     rqd.meta_list = meta_list;
> >>> +     rqd.ppa_list = ppa_list;
> >>> +     rqd.dma_ppa_list = dma_ppa_list;
> >>> +     rqd.dma_meta_list = dma_meta_list;
> >>> +
> >>> +next_rq:
> >>> +     bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
> >>> +     if (IS_ERR(bio))
> >>> +             goto out_next;
> >>> +
> >>> +     bio->bi_iter.bi_sector = 0; /* artificial bio */
> >>> +     bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> >>> +
> >>> +     rqd.bio = bio;
> >>> +     for (i = 0; i < rqd.nr_ppas; i++) {
> >>> +             rqd.ppa_list[i] = ppa;
> >>> +             rqd.ppa_list[i].m.sec += i;
> >>> +             pblk_get_meta(pblk, meta_list, i)->lba =
> >>> +                                     cpu_to_le64(ADDR_EMPTY);
> >>> +     }
> >>> +
> >>> +     ret = nvm_submit_io_sync(dev, &rqd);
> >>> +     if (ret) {
> >>> +             bio_put(bio);
> >>> +             goto out_next;
> >>> +     }
> >>> +
> >>> +     if (rqd.error)
> >>> +             goto free_data;
> >>> +
> >>> +out_next:
> >>> +     count -= rqd.nr_ppas;
> >>> +     ppa.m.sec += rqd.nr_ppas;
> >>> +     if (count > 0)
> >>> +             goto next_rq;
> >>> +
> >>> +free_data:
> >>> +     kfree(data);
> >>> +free_meta_list:
> >>> +     nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> >>> +}
> >>> +
> >>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> >>> +{
> >>> +     struct nvm_tgt_dev *dev = pblk->dev;
> >>> +     struct nvm_geo *geo = &dev->geo;
> >>> +     struct nvm_chk_meta *chunk_meta;
> >>> +     struct ppa_addr ppa;
> >>> +     int i, j, k, count;
> >>> +
> >>> +     for (i = 0; i < geo->num_chk; i++) {
> >>> +             for (j = 0; j < geo->num_lun; j++) {
> >>> +                     for (k = 0; k < geo->num_ch; k++) {
> >>> +                             ppa.ppa = 0;
> >>> +                             ppa.m.grp = k;
> >>> +                             ppa.m.pu = j;
> >>> +                             ppa.m.chk = i;
> >>> +
> >>> +                             chunk_meta = pblk_chunk_get_off(pblk,
> >>> +                                                             meta, ppa);
> >>> +                             if (chunk_meta->state == NVM_CHK_ST_OPEN) {
> >>> +                                     ppa.m.sec = chunk_meta->wp;
> >>> +                                     count = geo->clba - chunk_meta->wp;
> >>> +                                     pblk_close_chunk(pblk, ppa, count);
> >>> +                             }
> >>> +                     }
> >>> +             }
> >>> +     }
> >>> +}
> >>> +
> >>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
> >>> +{
> >>> +     struct nvm_tgt_dev *dev = pblk->dev;
> >>> +     struct nvm_geo *geo = &dev->geo;
> >>> +     int i;
> >>> +
> >>> +     for (i = 0; i < geo->all_luns; i++) {
> >>> +             if (meta[i].state == NVM_CHK_ST_OPEN)
> >>> +                     return true;
> >>> +     }
> >>> +
> >>> +     return false;
> >>> +}
> >>> +
> >>> void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
> >>>                           u64 paddr)
> >>> {
> >>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> >>> index 9913a4514eb6..83abe6960b46 100644
> >>> --- a/drivers/lightnvm/pblk-init.c
> >>> +++ b/drivers/lightnvm/pblk-init.c
> >>> @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
> >>>        return 0;
> >>> }
> >>>
> >>> -static int pblk_lines_init(struct pblk *pblk)
> >>> +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
> >>> {
> >>>        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> >>>        struct pblk_line *line;
> >>>        void *chunk_meta;
> >>>        int nr_free_chks = 0;
> >>>        int i, ret;
> >>> +     bool retry = false;
> >>>
> >>>        ret = pblk_line_meta_init(pblk);
> >>>        if (ret)
> >>> @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
> >>>        if (ret)
> >>>                goto fail_free_meta;
> >>>
> >>> +get_chunk_meta:
> >>>        chunk_meta = pblk_get_chunk_meta(pblk);
> >>>        if (IS_ERR(chunk_meta)) {
> >>>                ret = PTR_ERR(chunk_meta);
> >>>                goto fail_free_luns;
> >>>        }
> >>>
> >>> +     if (factory_init && !retry &&
> >>> +         pblk_are_opened_chunks(pblk, chunk_meta)) {
> >>> +             pblk_close_opened_chunks(pblk, chunk_meta);
> >>> +             retry = true;
> >>> +             vfree(chunk_meta);
> >>> +             goto get_chunk_meta;
> >>> +     }
> >>> +
> >>>        pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
> >>>                                                                GFP_KERNEL);
> >>>        if (!pblk->lines) {
> >>> @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> >>>                goto fail;
> >>>        }
> >>>
> >>> -     ret = pblk_lines_init(pblk);
> >>> +     ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
> >>>        if (ret) {
> >>>                pblk_err(pblk, "could not initialize lines\n");
> >>>                goto fail_free_core;
> >>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> >>> index b266563508e6..b248642c4dfb 100644
> >>> --- a/drivers/lightnvm/pblk.h
> >>> +++ b/drivers/lightnvm/pblk.h
> >>> @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
> >>> struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
> >>>                                              struct nvm_chk_meta *lp,
> >>>                                              struct ppa_addr ppa);
> >>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> >>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
> >>> void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
> >>> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
> >>> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
> >>> --
> >>> 2.17.1
> >>
> >> I know that the OCSSD 2.0 spec does not allow a transition from open to
> >> free, but to me this is a spec bug, as there is no underlying issue in
> >> reading an open block. Note that all controllers I know of support this,
> >> and the upcoming Denali spec fixes this too.
> >
> > + Klaus whom I discussed this with.
> > Yeah, I think that "early reset" is a nice feature. It would be nice
> > to extend the OCSSD spec with a new capabilities bit indicating whether
> > this is indeed supported or not.
> > Matias: what do you think?
> >
> >>
> >> Besides, the factory flag is intended to start a pblk instance
> >> immediately, without having to pay the price of padding any past device
> >> state.  If you still want to do this, I think this belongs in a user space tool.
> >>
> >
> > Hear, hear!
> >
> > Serially padding any open chunks during a -f create call would be a
> > terrible user experience.
> > Let's say that padding a chunk takes one second; we would, in a
> > worst-case scenario on an example disk, be stuck in a syscall for
> > 1*64*1000 seconds ~ 17 hours.
> >
>
> You are both right - this can be time consuming. So we can drop the
> "padding" part and let the user do that.
>
> What about changing this patch to at least print a warning in dmesg
> when we have some open chunks in such a case?

Sounds good to me.

>
> > A tool, like dm-zoned's dmzadm would be the right approach, see
> > Documentation/device-mapper/dm-zoned.txt
> > All new pblk instances would then have to be pre-formatted with "pblkadm"
> >
> > A new physical storage format containing a superblock would also be a good idea.
> >
> > / Hans
> >

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04  9:48       ` Javier González
  2019-03-04 12:14         ` Hans Holmberg
@ 2019-03-04 13:04         ` Matias Bjørling
  2019-03-04 13:21           ` Javier González
  1 sibling, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:04 UTC (permalink / raw)
  To: Javier González, Hans Holmberg
  Cc: Konopko, Igor J, Hans Holmberg, linux-block

On 3/4/19 10:48 AM, Javier González wrote:
> 
> 
>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>
>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>
>>>> Currently, in case of read errors, bi_status is not
>>>> set properly, which leads to returning improper data
>>>> to the higher layers. This patch fixes that by setting the
>>>> proper status in case of read errors.
>>>>
>>>> The patch also removes an unnecessary warn_once(), which does
>>>> not make sense in that place, since the user bio is not used
>>>> for interaction with the drive and thus bi_status will not be
>>>> set here.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>> index 3789185144da..39c1d6ccaedb 100644
>>>> --- a/drivers/lightnvm/pblk-read.c
>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>> }
>>>>
>>>> -static void pblk_end_user_read(struct bio *bio)
>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>> {
>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>> -#endif
>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>> +             bio_io_error(bio);
>>>>       bio_endio(bio);
>>>> }
>>>>
>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>       struct bio *bio = (struct bio *)r_ctx->private;
>>>>
>>>> -     pblk_end_user_read(bio);
>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>       __pblk_end_io_read(pblk, rqd, true);
>>>> }
>>>>
>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>       rqd->bio = NULL;
>>>>       rqd->nr_ppas = nr_secs;
>>>>
>>>> -     bio_endio(bio);
>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>       __pblk_end_io_read(pblk, rqd, false);
>>>> }
>>>>
>>>> --
>>>> 2.17.1
>>>
>>> This is by design. We do not report the read errors as in any other
>>> block device - this is why we clone the read bio.
>>
>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>>
> 
> Normal block devices do not report read errors on the completion path
> unless it is a fatal error. This is actually not well understood by the
> upper layers, which tend to assume that the device is completely broken.
> 
> This is a challenge for OCSSD / Denali / Zone devices as there are cases
> where reads can fail. Unfortunately at this point, we need to mask these
> errors and deal with them in the different layers.

Please don't include zone devices in that list. ZAC/ZBC are 
well-behaved, and an error is a real error.

> 
> For OCSSD currently, we do this in pblk, which I think fits the
> model well, as we expose a normal block device.
> 
> Javier
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 12:22         ` Hans Holmberg
@ 2019-03-04 13:04           ` Igor Konopko
  2019-03-04 13:16             ` Hans Holmberg
  2019-03-04 14:06             ` Javier González
  2019-03-04 13:19           ` Javier González
  1 sibling, 2 replies; 91+ messages in thread
From: Igor Konopko @ 2019-03-04 13:04 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Matias Bjørling, Hans Holmberg, linux-block,
	Simon Andreas Frimann Lund, Klaus Birkelund Jensen



On 04.03.2019 13:22, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>
>>
>>
>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>
>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>
>>>>> The current lightnvm and pblk implementation does not take
>>>>> the NVMe max data transfer size into account, which can be smaller
>>>>> than 64 * 4K = 256K. This patch fixes issues related to that.
>>>
>>> Could you describe *what* issues you are fixing?

I'm fixing an issue with controllers whose NVMe max data transfer 
size is lower than 256K (for example 128K, which happens for existing 
NVMe controllers and is NVMe spec compliant). Such controllers are not 
able to handle a command which contains 64 PPAs, since the size of the 
DMAed buffer will be above the capabilities of such a controller.

>>>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>> include/linux/lightnvm.h     | 1 +
>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>> --- a/drivers/lightnvm/core.c
>>>>> +++ b/drivers/lightnvm/core.c
>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>       struct nvm_target *t;
>>>>>       struct nvm_tgt_dev *tgt_dev;
>>>>>       void *targetdata;
>>>>> +     unsigned int mdts;
>>>>>       int ret;
>>>>>
>>>>>       switch (create->conf.type) {
>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>       tdisk->private_data = targetdata;
>>>>>       tqueue->queuedata = targetdata;
>>>>>
>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>> +     if (dev->geo.mdts) {
>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>> +     }
>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>>
>>>>>       set_capacity(tdisk, tt->capacity(targetdata));
>>>>>       add_disk(tdisk);
>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>       geo->csecs = 1 << ns->lba_shift;
>>>>>       geo->sos = ns->ms;
>>>>>       geo->ext = ns->ext;
>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>>
>>>>>       dev->q = q;
>>>>>       memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>> --- a/include/linux/lightnvm.h
>>>>> +++ b/include/linux/lightnvm.h
>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>       u16     csecs;          /* sector size */
>>>>>       u16     sos;            /* out-of-band area size */
>>>>>       bool    ext;            /* metadata in extended data buffer */
>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>>
>>>>>       /* device write constrains */
>>>>>       u32     ws_min;         /* minimum write size */
>>>>> --
>>>>> 2.17.1
>>>>
>>>> I see where you are going with this and I partially agree, but none of
>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
>>>>
>>>> Javier
>>>
>>> Hmm. Looking into what the 2.0 spec says about vector reads:
>>>
>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>> the number of logical blocks to be read. This is a 0’s based value.
>>> Maximum of 64 LBAs is supported."
>>>
>>> You've got the max limit covered, and the spec does not say anything
>>> about the minimum number of LBAs to support.
>>>
>>> Matias: any thoughts on this?
>>>
>>> Javier: How would this patch break current implementations?
>>
>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>> set it at all (maybe garbage). Think you can get to one pretty quickly...
> 
> > So we can't make use of a perfectly good, standardized parameter
> because some hypothetical non-compliant device out there might not
> provide a sane value?
> 
>>
>>>
>>> Igor: how does this patch fix the mdts restriction? There are no
>>> checks on i.e. the gc read path that ensures that a lower limit than
>>> NVM_MAX_VLBA is enforced.
>>
>> This is the other part where the implementation breaks.
> 
> No, it just does not fix it.
> 
> over-and-out,
> Hans

IO requests issued from both the garbage collector and the writer thread 
are upper bounded by the pblk->max_write_pgs variable. This variable is 
calculated based on queue_max_hw_sectors() already in pblk.

User reads, on the other hand, are split by the block layer based on 
queue_max_hw_sectors() too.

Is there any other path which I'm missing that would still try to 
issue IOs with a higher IO size? Or am I missing something else in my logic?

>>
>> Javier

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04 12:51           ` Igor Konopko
@ 2019-03-04 13:08             ` Matias Bjørling
  2019-03-04 13:45               ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:08 UTC (permalink / raw)
  To: Igor Konopko, Hans Holmberg, Javier González
  Cc: Hans Holmberg, linux-block

On 3/4/19 1:51 PM, Igor Konopko wrote:
> 
> 
> On 04.03.2019 13:14, Hans Holmberg wrote:
>> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> 
>> wrote:
>>>
>>>
>>>
>>>> On 4 Mar 2019, at 10.35, Hans Holmberg 
>>>> <hans.ml.holmberg@owltronix.com> wrote:
>>>>
>>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> 
>>>> wrote:
>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> 
>>>>>> wrote:
>>>>>>
>>>>>> Currently, in case of read errors, bi_status is not
>>>>>> set properly, which leads to returning improper data
>>>>>> to the higher layers. This patch fixes that by setting the
>>>>>> proper status in case of read errors.
>>>>>>
>>>>>> The patch also removes an unnecessary warn_once(), which does
>>>>>> not make sense in that place, since the user bio is not used
>>>>>> for interaction with the drive and thus bi_status will not be
>>>>>> set here.
>>>>>>
>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>> ---
>>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/lightnvm/pblk-read.c 
>>>>>> b/drivers/lightnvm/pblk-read.c
>>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk 
>>>>>> *pblk, struct nvm_rq *rqd,
>>>>>>       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random 
>>>>>> request\n");
>>>>>> }
>>>>>>
>>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>>> {
>>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>>> -#endif
>>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>>> +             bio_io_error(bio);
>>>>>>       bio_endio(bio);
>>>>>> }
>>>>>>
>>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>>       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>       struct bio *bio = (struct bio *)r_ctx->private;
>>>>>>
>>>>>> -     pblk_end_user_read(bio);
>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>       __pblk_end_io_read(pblk, rqd, true);
>>>>>> }
>>>>>>
>>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct 
>>>>>> nvm_rq *rqd)
>>>>>>       rqd->bio = NULL;
>>>>>>       rqd->nr_ppas = nr_secs;
>>>>>>
>>>>>> -     bio_endio(bio);
>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>       __pblk_end_io_read(pblk, rqd, false);
>>>>>> }
>>>>>>
>>>>>> -- 
>>>>>> 2.17.1
>>>>>
>>>>> This is by design. We do not report the read errors as in any other
>>>>> block device - this is why we clone the read bio.
>>>>
>>>> Could you elaborate on why not reporting read errors is a good thing 
>>>> in pblk?
>>>>
>>>
>>> Normal block devices do not report read errors on the completion path
>>> unless it is a fatal error. This is actually not well understood by the
>>> upper layers, which tend to assume that the device is completely broken.
>>
>> So returning bogus data without even a warning is a preferred
>> solution? You want to force "the upper layers" to do checksumming?
>>
>> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
>> warning that OCSSD 2.0 adds. The data should still be good.
>> All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that
>> the command did not complete (as far as I can tell)
>>
> 
> My approach was exactly like that. In all cases other than WARN_HIGHECC 
> we don't have valid data. Without setting bio_io_error() we are 
> creating the impression for the other layers that we read the data 
> correctly, which is not the case.
> 
> I'm also seeing that this patch is not the only user of the bio_io_error() 
> API; other drivers such as md use it commonly.

Yes, agreed. This is an actual error in pblk that lets it return bogus data.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint
  2019-02-27 17:14 ` [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint Igor Konopko
  2019-03-04  8:17   ` Javier González
@ 2019-03-04 13:11   ` Matias Bjørling
  1 sibling, 0 replies; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:11 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 2/27/19 6:14 PM, Igor Konopko wrote:
> Currently the L2P map size is calculated based on
> the total number of available sectors, which is
> redundant, since that count includes the
> over-provisioned sectors (11% by default).
> 
> The goal of this patch is to change this size
> to the real capacity and thus reduce the DRAM
> footprint significantly - with the default op value
> it is approx. 110MB of DRAM less for every 1TB
> of pblk drive.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-core.c     | 8 ++++----
>   drivers/lightnvm/pblk-init.c     | 7 +++----
>   drivers/lightnvm/pblk-read.c     | 2 +-
>   drivers/lightnvm/pblk-recovery.c | 2 +-
>   drivers/lightnvm/pblk.h          | 1 -
>   5 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index f48f2e77f770..2e424c0275c1 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -2024,7 +2024,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
>   	struct ppa_addr ppa_l2p;
>   
>   	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
>   		WARN(1, "pblk: corrupted L2P map request\n");
>   		return;
>   	}
> @@ -2064,7 +2064,7 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
>   #endif
>   
>   	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
>   		WARN(1, "pblk: corrupted L2P map request\n");
>   		return 0;
>   	}
> @@ -2110,7 +2110,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
>   	}
>   
>   	/* logic error: lba out-of-bounds. Ignore update */
> -	if (!(lba < pblk->rl.nr_secs)) {
> +	if (!(lba < pblk->capacity)) {
>   		WARN(1, "pblk: corrupted L2P map request\n");
>   		return;
>   	}
> @@ -2168,7 +2168,7 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>   		lba = lba_list[i];
>   		if (lba != ADDR_EMPTY) {
>   			/* logic error: lba out-of-bounds. Ignore update */
> -			if (!(lba < pblk->rl.nr_secs)) {
> +			if (!(lba < pblk->capacity)) {
>   				WARN(1, "pblk: corrupted L2P map request\n");
>   				continue;
>   			}
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index e553105b7ba1..9913a4514eb6 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -105,7 +105,7 @@ static size_t pblk_trans_map_size(struct pblk *pblk)
>   	if (pblk->addrf_len < 32)
>   		entry_size = 4;
>   
> -	return entry_size * pblk->rl.nr_secs;
> +	return entry_size * pblk->capacity;
>   }
>   
>   #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -175,7 +175,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
>   
>   	pblk_ppa_set_empty(&ppa);
>   
> -	for (i = 0; i < pblk->rl.nr_secs; i++)
> +	for (i = 0; i < pblk->capacity; i++)
>   		pblk_trans_map_set(pblk, i, ppa);
>   
>   	ret = pblk_l2p_recover(pblk, factory_init);
> @@ -706,7 +706,6 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
>   	 * on user capacity consider only provisioned blocks
>   	 */
>   	pblk->rl.total_blocks = nr_free_chks;
> -	pblk->rl.nr_secs = nr_free_chks * geo->clba;
>   
>   	/* Consider sectors used for metadata */
>   	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
> @@ -1289,7 +1288,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>   
>   	pblk_info(pblk, "luns:%u, lines:%d, secs:%llu, buf entries:%u\n",
>   			geo->all_luns, pblk->l_mg.nr_lines,
> -			(unsigned long long)pblk->rl.nr_secs,
> +			(unsigned long long)pblk->capacity,
>   			pblk->rwb.nr_entries);
>   
>   	wake_up_process(pblk->writer_ts);
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 39c1d6ccaedb..65697463def8 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -561,7 +561,7 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
>   		goto out;
>   
>   	/* logic error: lba out-of-bounds */
> -	if (lba >= pblk->rl.nr_secs) {
> +	if (lba >= pblk->capacity) {
>   		WARN(1, "pblk: read lba out of bounds\n");
>   		goto out;
>   	}
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index d86f580036d3..83b467b5edc7 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -474,7 +474,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   
>   		lba_list[paddr++] = cpu_to_le64(lba);
>   
> -		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> +		if (lba == ADDR_EMPTY || lba >= pblk->capacity)
>   			continue;
>   
>   		line->nr_valid_lbas++;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index a6386d5acd73..a92377530930 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -305,7 +305,6 @@ struct pblk_rl {
>   
>   	struct timer_list u_timer;
>   
> -	unsigned long long nr_secs;
>   	unsigned long total_blocks;
>   
>   	atomic_t free_blocks;		/* Total number of free blocks (+ OP) */
> 

Thanks Igor. Applied for 5.2. I've also updated the text a bit.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:04           ` Igor Konopko
@ 2019-03-04 13:16             ` Hans Holmberg
  2019-03-04 14:06             ` Javier González
  1 sibling, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 13:16 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Matias Bjørling, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

On Mon, Mar 4, 2019 at 2:04 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 04.03.2019 13:22, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
> >>
> >>
> >>
> >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
> >>>
> >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
> >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>>>
> >>>>> The current lightnvm and pblk implementation does not take
> >>>>> the NVMe max data transfer size into account, which can be smaller
> >>>>> than 64 * 4K = 256K. This patch fixes issues related to that.
> >>>
> >>> Could you describe *what* issues you are fixing?
>
> I'm fixing an issue with controllers whose NVMe max data transfer
> size is lower than 256K (for example 128K, which happens for existing
> NVMe controllers and is NVMe spec compliant). Such controllers are not
> able to handle a command which contains 64 PPAs, since the size of the
> DMAed buffer will be above the capabilities of such a controller.

Thanks for the explanation, that would be great to have in the commit message.

>
> >>>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>> ---
> >>>>> drivers/lightnvm/core.c      | 9 +++++++--
> >>>>> drivers/nvme/host/lightnvm.c | 1 +
> >>>>> include/linux/lightnvm.h     | 1 +
> >>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>>>> index 5f82036fe322..c01f83b8fbaf 100644
> >>>>> --- a/drivers/lightnvm/core.c
> >>>>> +++ b/drivers/lightnvm/core.c
> >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>       struct nvm_target *t;
> >>>>>       struct nvm_tgt_dev *tgt_dev;
> >>>>>       void *targetdata;
> >>>>> +     unsigned int mdts;
> >>>>>       int ret;
> >>>>>
> >>>>>       switch (create->conf.type) {
> >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>       tdisk->private_data = targetdata;
> >>>>>       tqueue->queuedata = targetdata;
> >>>>>
> >>>>> -     blk_queue_max_hw_sectors(tqueue,
> >>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>>>> +     if (dev->geo.mdts) {
> >>>>> +             mdts = min_t(u32, dev->geo.mdts,
> >>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> +     }
> >>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
> >>>>>
> >>>>>       set_capacity(tdisk, tt->capacity(targetdata));
> >>>>>       add_disk(tdisk);
> >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index b759c25c89c8..b88a39a3cbd1 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>>>       geo->csecs = 1 << ns->lba_shift;
> >>>>>       geo->sos = ns->ms;
> >>>>>       geo->ext = ns->ext;
> >>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
> >>>>>
> >>>>>       dev->q = q;
> >>>>>       memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>>>> --- a/include/linux/lightnvm.h
> >>>>> +++ b/include/linux/lightnvm.h
> >>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>>>       u16     csecs;          /* sector size */
> >>>>>       u16     sos;            /* out-of-band area size */
> >>>>>       bool    ext;            /* metadata in extended data buffer */
> >>>>> +     u32     mdts;           /* Max data transfer size*/
> >>>>>
> >>>>>       /* device write constrains */
> >>>>>       u32     ws_min;         /* minimum write size */
> >>>>> --
> >>>>> 2.17.1
> >>>>
> >>>> I see where you are going with this and I partially agree, but none of
> >>>> the OCSSD specs define a way to define this parameter. Thus, adding this
> >>>> behavior taken from NVMe in Linux can break current implementations. Is
> >>>> this a real life problem for you? Or this is just for NVMe “correctness”?
> >>>>
> >>>> Javier
> >>>
> >>> Hmm. Looking into what the 2.0 spec says about vector reads:
> >>>
> >>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
> >>> the number of logical blocks to be read. This is a 0’s based value.
> >>> Maximum of 64 LBAs is supported."
> >>>
> >>> You've got the max limit covered, and the spec does not say anything
> >>> about the minimum number of LBAs to support.
> >>>
> >>> Matias: any thoughts on this?
> >>>
> >>> Javier: How would this patch break current implementations?
> >>
> >> Say an OCSSD controller that sets mdts to a value under 64 or does not
> >> set it at all (maybe garbage). Think you can get to one pretty quickly...
> >
> > > So we can't make use of a perfectly good, standardized parameter
> > because some hypothetical non-compliant device out there might not
> > provide a sane value?
> >
> >>
> >>>
> >>> Igor: how does this patch fix the mdts restriction? There are no
> >>> checks on i.e. the gc read path that ensures that a lower limit than
> >>> NVM_MAX_VLBA is enforced.
> >>
> >> This is the other part where the implementation breaks.
> >
> > No, it just does not fix it.
> >
> > over-and-out,
> > Hans
>
> IO requests issued from both the garbage collector and the writer thread
> are upper bounded by the pblk->max_write_pgs variable. This variable is
> calculated based on queue_max_hw_sectors() already in pblk.
>
> User reads, on the other hand, are split by the block layer based on
> queue_max_hw_sectors() too.

Mea culpa, right you are!

>
> Is there any other path which I'm missing that would still try to
> issue IOs with a higher IO size? Or am I missing something else in my logic?

I missed the adjustments of max_write_pgs in pblk_core_init. We should be good.

Thanks,
Hans

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04  8:27   ` Javier González
  2019-03-04 10:05     ` Hans Holmberg
@ 2019-03-04 13:18     ` Matias Bjørling
  2019-03-04 13:47       ` Javier González
  1 sibling, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:18 UTC (permalink / raw)
  To: Javier González, Konopko, Igor J; +Cc: Hans Holmberg, linux-block

On 3/4/19 9:27 AM, Javier González wrote:
>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> When creating a pblk instance with the factory
>> flag, there is a possibility that some chunks
>> are in the open state, which does not allow
>> issuing an erase request to them directly. Such a
>> chunk should be filled with some empty data in
>> order to reach the closed state. Without that we
>> risk that some erase operations will be
>> rejected by the drive due to improper chunk
>> state.
>>
>> This patch implements chunk-closing logic in pblk
>> for that case, when creating an instance with the
>> factory flag, in order to fix that issue.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
>> drivers/lightnvm/pblk-init.c |  14 ++++-
>> drivers/lightnvm/pblk.h      |   2 +
>> 3 files changed, 128 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index fa4dc05608ff..d3c45393f093 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>> 	return meta + ch_off + lun_off + chk_off;
>> }
>>
>> +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
>> +{
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +	struct bio *bio;
>> +	struct ppa_addr *ppa_list;
>> +	struct nvm_rq rqd;
>> +	void *meta_list, *data;
>> +	dma_addr_t dma_meta_list, dma_ppa_list;
>> +	int i, rq_ppas, rq_len, ret;
>> +
>> +	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
>> +	if (!meta_list)
>> +		return;
>> +
>> +	ppa_list = meta_list + pblk_dma_meta_size(pblk);
>> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>> +
>> +	rq_ppas = pblk_calc_secs(pblk, count, 0, false);
>> +	if (!rq_ppas)
>> +		rq_ppas = pblk->min_write_pgs;
>> +	rq_len = rq_ppas * geo->csecs;
>> +
>> +	data = kzalloc(rq_len, GFP_KERNEL);
>> +	if (!data)
>> +		goto free_meta_list;
>> +
>> +	memset(&rqd, 0, sizeof(struct nvm_rq));
>> +	rqd.opcode = NVM_OP_PWRITE;
>> +	rqd.nr_ppas = rq_ppas;
>> +	rqd.meta_list = meta_list;
>> +	rqd.ppa_list = ppa_list;
>> +	rqd.dma_ppa_list = dma_ppa_list;
>> +	rqd.dma_meta_list = dma_meta_list;
>> +
>> +next_rq:
>> +	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
>> +	if (IS_ERR(bio))
>> +		goto out_next;
>> +
>> +	bio->bi_iter.bi_sector = 0; /* artificial bio */
>> +	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>> +
>> +	rqd.bio = bio;
>> +	for (i = 0; i < rqd.nr_ppas; i++) {
>> +		rqd.ppa_list[i] = ppa;
>> +		rqd.ppa_list[i].m.sec += i;
>> +		pblk_get_meta(pblk, meta_list, i)->lba =
>> +					cpu_to_le64(ADDR_EMPTY);
>> +	}
>> +
>> +	ret = nvm_submit_io_sync(dev, &rqd);
>> +	if (ret) {
>> +		bio_put(bio);
>> +		goto out_next;
>> +	}
>> +
>> +	if (rqd.error)
>> +		goto free_data;
>> +
>> +out_next:
>> +	count -= rqd.nr_ppas;
>> +	ppa.m.sec += rqd.nr_ppas;
>> +	if (count > 0)
>> +		goto next_rq;
>> +
>> +free_data:
>> +	kfree(data);
>> +free_meta_list:
>> +	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>> +}
>> +
>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>> +{
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +	struct nvm_chk_meta *chunk_meta;
>> +	struct ppa_addr ppa;
>> +	int i, j, k, count;
>> +
>> +	for (i = 0; i < geo->num_chk; i++) {
>> +		for (j = 0; j < geo->num_lun; j++) {
>> +			for (k = 0; k < geo->num_ch; k++) {
>> +				ppa.ppa = 0;
>> +				ppa.m.grp = k;
>> +				ppa.m.pu = j;
>> +				ppa.m.chk = i;
>> +
>> +				chunk_meta = pblk_chunk_get_off(pblk,
>> +								meta, ppa);
>> +				if (chunk_meta->state == NVM_CHK_ST_OPEN) {
>> +					ppa.m.sec = chunk_meta->wp;
>> +					count = geo->clba - chunk_meta->wp;
>> +					pblk_close_chunk(pblk, ppa, count);
>> +				}
>> +			}
>> +		}
>> +	}
>> +}
>> +
>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>> +{
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +	int i;
>> +
>> +	for (i = 0; i < geo->all_luns; i++) {
>> +		if (meta[i].state == NVM_CHK_ST_OPEN)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
>> 			   u64 paddr)
>> {
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index 9913a4514eb6..83abe6960b46 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
>> 	return 0;
>> }
>>
>> -static int pblk_lines_init(struct pblk *pblk)
>> +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
>> {
>> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> 	struct pblk_line *line;
>> 	void *chunk_meta;
>> 	int nr_free_chks = 0;
>> 	int i, ret;
>> +	bool retry = false;
>>
>> 	ret = pblk_line_meta_init(pblk);
>> 	if (ret)
>> @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
>> 	if (ret)
>> 		goto fail_free_meta;
>>
>> +get_chunk_meta:
>> 	chunk_meta = pblk_get_chunk_meta(pblk);
>> 	if (IS_ERR(chunk_meta)) {
>> 		ret = PTR_ERR(chunk_meta);
>> 		goto fail_free_luns;
>> 	}
>>
>> +	if (factory_init && !retry &&
>> +	    pblk_are_opened_chunks(pblk, chunk_meta)) {
>> +		pblk_close_opened_chunks(pblk, chunk_meta);
>> +		retry = true;
>> +		vfree(chunk_meta);
>> +		goto get_chunk_meta;
>> +	}
>> +
>> 	pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
>> 								GFP_KERNEL);
>> 	if (!pblk->lines) {
>> @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>> 		goto fail;
>> 	}
>>
>> -	ret = pblk_lines_init(pblk);
>> +	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
>> 	if (ret) {
>> 		pblk_err(pblk, "could not initialize lines\n");
>> 		goto fail_free_core;
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index b266563508e6..b248642c4dfb 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
>> struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>> 					      struct nvm_chk_meta *lp,
>> 					      struct ppa_addr ppa);
>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>> void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
>> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
>> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
>> --
>> 2.17.1
> 
> I know that the OCSSD 2.0 spec does not allow to transition from open to
> free, but to me this is a spec bug as there is no underlying issue in
> reading an open block. Note that all controllers I know support this,
> and the upcoming Denali spec. fixes this too.

The issue is not whether the chunk can be read. It is that going from 
Open -> Empty -> Open causes an erase on some implementations, and 
causes the media to wear out prematurely.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04 10:05     ` Hans Holmberg
  2019-03-04 12:56       ` Igor Konopko
@ 2019-03-04 13:19       ` Matias Bjørling
  2019-03-04 13:48         ` Javier González
  1 sibling, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:19 UTC (permalink / raw)
  To: Hans Holmberg, Javier González
  Cc: Konopko, Igor J, Hans Holmberg, linux-block, Klaus Jensen

> 
> + Klaus whom I discussed this with.
> Yeah, I think that "early reset" is a nice feature. It would be nice
> to extend the OCSSD spec with a new capabilities bit indicating if
> this is indeed supported or not.
> Matias: what do you think?

I don't mind as long as it is gated by a feature bit. An ECN can be made 
to fix that up, and then software can be updated to read that bit and 
decide what to do.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 12:22         ` Hans Holmberg
  2019-03-04 13:04           ` Igor Konopko
@ 2019-03-04 13:19           ` Javier González
  2019-03-04 13:25             ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04 13:19 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Konopko, Igor J, Matias Bjørling, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

[-- Attachment #1: Type: text/plain, Size: 5419 bytes --]


> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
> 
> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>> 
>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>> 
>>>>> Current lightnvm and pblk implementation does not care
>>>>> about NVMe max data transfer size, which can be smaller
>>>>> than 64*K=256K. This patch fixes issues related to that.
>>> 
>>> Could you describe *what* issues you are fixing?
>>> 
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>> include/linux/lightnvm.h     | 1 +
>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>> --- a/drivers/lightnvm/core.c
>>>>> +++ b/drivers/lightnvm/core.c
>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>     struct nvm_target *t;
>>>>>     struct nvm_tgt_dev *tgt_dev;
>>>>>     void *targetdata;
>>>>> +     unsigned int mdts;
>>>>>     int ret;
>>>>> 
>>>>>     switch (create->conf.type) {
>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>     tdisk->private_data = targetdata;
>>>>>     tqueue->queuedata = targetdata;
>>>>> 
>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>> +     if (dev->geo.mdts) {
>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>> +     }
>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>> 
>>>>>     set_capacity(tdisk, tt->capacity(targetdata));
>>>>>     add_disk(tdisk);
>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>     geo->csecs = 1 << ns->lba_shift;
>>>>>     geo->sos = ns->ms;
>>>>>     geo->ext = ns->ext;
>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>> 
>>>>>     dev->q = q;
>>>>>     memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>> --- a/include/linux/lightnvm.h
>>>>> +++ b/include/linux/lightnvm.h
>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>     u16     csecs;          /* sector size */
>>>>>     u16     sos;            /* out-of-band area size */
>>>>>     bool    ext;            /* metadata in extended data buffer */
>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>> 
>>>>>     /* device write constrains */
>>>>>     u32     ws_min;         /* minimum write size */
>>>>> --
>>>>> 2.17.1
>>>> 
>>>> I see where you are going with this and I partially agree, but none of
>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
>>>> 
>>>> Javier
>>> 
>>> Hmm. Looking into what the 2.0 spec says about vector reads:
>>> 
>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>> the number of logical blocks to be read. This is a 0’s based value.
>>> Maximum of 64 LBAs is supported."
>>> 
>>> You got the max limit covered, and the spec  does not say anything
>>> about the minimum number of LBAs to support.
>>> 
>>> Matias: any thoughts on this?
>>> 
>>> Javier: How would this patch break current implementations?
>> 
>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>> set it at all (maybe garbage). Think you can get to one pretty quickly...
> 
> So we can't make use of a perfectly good, standardized, parameter
> because some hypothetical non-compliant device out there might not
> provide a sane value?

The OCSSD standard has never used NVMe parameters, so there is no
compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
get the sector and OOB sizes from the standard identify
command, we used to have them in the geometry.

If you did not catch it in the first reference, this concern is explicitly
related to OCSSD controllers already out there - some of which you
should be caring about.

If we are to use this information in the future, I would advocate to
first make changes in the spec and then in the code, not the other way
around.

> 
>>> Igor: how does this patch fix the mdts restriction? There are no
>>> checks on i.e. the gc read path that ensures that a lower limit than
>>> NVM_MAX_VLBA is enforced.
>> 
>> This is the other part where the implementation breaks.
> 
> No, it just does not fix it.

It is broken in _this_ implementation.

> 
> over-and-out,
> Hans
>> Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04 13:04         ` Matias Bjørling
@ 2019-03-04 13:21           ` Javier González
  0 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 13:21 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Hans Holmberg, Konopko, Igor J, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3418 bytes --]


> On 4 Mar 2019, at 14.04, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 3/4/19 10:48 AM, Javier González wrote:
>>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>> 
>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>> 
>>>>> Currently in case of read errors, bi_status is not
>>>>> set properly, which leads to returning improper data
>>>>> to the higher layer. This patch fixes that by setting the proper
>>>>> status in case of read errors.
>>>>> 
>>>>> Patch also removes unnecessary warn_once(), which does
>>>>> not make sense in that place, since user bio is not used
>>>>> for interaction with the drive and thus bi_status will not be
>>>>> set here.
>>>>> 
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>      WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>>> }
>>>>> 
>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>> {
>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>> -#endif
>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>> +             bio_io_error(bio);
>>>>>      bio_endio(bio);
>>>>> }
>>>>> 
>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>      struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>      struct bio *bio = (struct bio *)r_ctx->private;
>>>>> 
>>>>> -     pblk_end_user_read(bio);
>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>      __pblk_end_io_read(pblk, rqd, true);
>>>>> }
>>>>> 
>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>      rqd->bio = NULL;
>>>>>      rqd->nr_ppas = nr_secs;
>>>>> 
>>>>> -     bio_endio(bio);
>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>      __pblk_end_io_read(pblk, rqd, false);
>>>>> }
>>>>> 
>>>>> --
>>>>> 2.17.1
>>>> 
>>>> This is by design. We do not report the read errors as in any other
>>>> block device - this is why we clone the read bio.
>>> 
>>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>> Normal block devices do not report read errors on the completion path
>> unless it is a fatal error. This is actually not well understood by the
>> upper layers, which tend to assume that the device is completely broken.
>> This is a challenge for OCSSD / Denali / Zone devices as there are cases
>> where reads can fail. Unfortunately at this point, we need to mask these
>> errors and deal with them in the different layers.
> 
> Please don't include zone devices in that list. ZAC/ZBC are
> well-behaved, and an error is a real error.

They have worked around this for years. AFAIK the read path still
returns predefined data, not an error. So, as I see it, it is the same.
Javier


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:19           ` Javier González
@ 2019-03-04 13:25             ` Matias Bjørling
  2019-03-04 13:44               ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 13:25 UTC (permalink / raw)
  To: Javier González, Hans Holmberg
  Cc: Konopko, Igor J, Hans Holmberg, linux-block,
	Simon Andreas Frimann Lund, Klaus Birkelund Jensen

On 3/4/19 2:19 PM, Javier González wrote:
> 
>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
>>
>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>>
>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>
>>>>>> Current lightnvm and pblk implementation does not care
>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>>
>>>> Could you describe *what* issues you are fixing?
>>>>
>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>> ---
>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>>> include/linux/lightnvm.h     | 1 +
>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>>> --- a/drivers/lightnvm/core.c
>>>>>> +++ b/drivers/lightnvm/core.c
>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>      struct nvm_target *t;
>>>>>>      struct nvm_tgt_dev *tgt_dev;
>>>>>>      void *targetdata;
>>>>>> +     unsigned int mdts;
>>>>>>      int ret;
>>>>>>
>>>>>>      switch (create->conf.type) {
>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>      tdisk->private_data = targetdata;
>>>>>>      tqueue->queuedata = targetdata;
>>>>>>
>>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>>> +     if (dev->geo.mdts) {
>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>> +     }
>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>>>
>>>>>>      set_capacity(tdisk, tt->capacity(targetdata));
>>>>>>      add_disk(tdisk);
>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>>      geo->csecs = 1 << ns->lba_shift;
>>>>>>      geo->sos = ns->ms;
>>>>>>      geo->ext = ns->ext;
>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>>>
>>>>>>      dev->q = q;
>>>>>>      memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>>> --- a/include/linux/lightnvm.h
>>>>>> +++ b/include/linux/lightnvm.h
>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>>      u16     csecs;          /* sector size */
>>>>>>      u16     sos;            /* out-of-band area size */
>>>>>>      bool    ext;            /* metadata in extended data buffer */
>>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>>>
>>>>>>      /* device write constrains */
>>>>>>      u32     ws_min;         /* minimum write size */
>>>>>> --
>>>>>> 2.17.1
>>>>>
>>>>> I see where you are going with this and I partially agree, but none of
>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
>>>>>
>>>>> Javier
>>>>
>>>> Hmm. Looking into what the 2.0 spec says about vector reads:
>>>>
>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>>> the number of logical blocks to be read. This is a 0’s based value.
>>>> Maximum of 64 LBAs is supported."
>>>>
>>>> You got the max limit covered, and the spec  does not say anything
>>>> about the minimum number of LBAs to support.
>>>>
>>>> Matias: any thoughts on this?
>>>>
>>>> Javier: How would this patch break current implementations?
>>>
>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
>>
>> So we can't make use of a perfectly good, standardized, parameter
>> because some hypothetical non-compliant device out there might not
>> provide a sane value?
> 
> The OCSSD standard has never used NVMe parameters, so there is no
> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
> get the sector and OOB sizes from the standard identify
> command, we used to have them in the geometry.

What the hell? Yes it has. The whole OCSSD spec is dependent on the NVMe 
spec. It is using many commands from the NVMe specification, which are 
not defined in the OCSSD specification.

The MDTS field should be respected in all cases, similarly to how the 
block layer respects it. Since the lightnvm subsystem is hooking in on 
the side, this should also be honoured by pblk (or the lightnvm 
subsystem should fix it up).

> 
> If you did not catch it in the first reference, this concern is explicitly
> related to OCSSD controllers already out there - some of which you
> should be caring about.
> 
> If we are to use this information in the future, I would advocate to
> first make changes in the spec and then in the code, not the other way
> around.
> 
>>
>>>> Igor: how does this patch fix the mdts restriction? There are no
>>>> checks on i.e. the gc read path that ensures that a lower limit than
>>>> NVM_MAX_VLBA is enforced.
>>>
>>> This is the other part where the implementation breaks.
>>
>> No, it just does not fix it.
> 
> It is broken in _this_ implementation.
> 
>>
>> over-and-out,
>> Hans
>>> Javier


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:25             ` Matias Bjørling
@ 2019-03-04 13:44               ` Javier González
  2019-03-04 14:24                 ` Hans Holmberg
  2019-03-04 14:58                 ` Matias Bjørling
  0 siblings, 2 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 13:44 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Hans Holmberg, Konopko, Igor J, Hans Holmberg, linux-block,
	Simon Andreas Frimann Lund, Klaus Birkelund Jensen

[-- Attachment #1: Type: text/plain, Size: 6901 bytes --]


> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 3/4/19 2:19 PM, Javier González wrote:
>>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
>>> 
>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>>> 
>>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>> 
>>>>>>> Current lightnvm and pblk implementation does not care
>>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>>> 
>>>>> Could you describe *what* issues you are fixing?
>>>>> 
>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>> ---
>>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>>>> include/linux/lightnvm.h     | 1 +
>>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>>>> --- a/drivers/lightnvm/core.c
>>>>>>> +++ b/drivers/lightnvm/core.c
>>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>     struct nvm_target *t;
>>>>>>>     struct nvm_tgt_dev *tgt_dev;
>>>>>>>     void *targetdata;
>>>>>>> +     unsigned int mdts;
>>>>>>>     int ret;
>>>>>>> 
>>>>>>>     switch (create->conf.type) {
>>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>     tdisk->private_data = targetdata;
>>>>>>>     tqueue->queuedata = targetdata;
>>>>>>> 
>>>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>>>> +     if (dev->geo.mdts) {
>>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>> +     }
>>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>>>> 
>>>>>>>     set_capacity(tdisk, tt->capacity(targetdata));
>>>>>>>     add_disk(tdisk);
>>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>>>     geo->csecs = 1 << ns->lba_shift;
>>>>>>>     geo->sos = ns->ms;
>>>>>>>     geo->ext = ns->ext;
>>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>>>> 
>>>>>>>     dev->q = q;
>>>>>>>     memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>>>> --- a/include/linux/lightnvm.h
>>>>>>> +++ b/include/linux/lightnvm.h
>>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>>>     u16     csecs;          /* sector size */
>>>>>>>     u16     sos;            /* out-of-band area size */
>>>>>>>     bool    ext;            /* metadata in extended data buffer */
>>>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>>>> 
>>>>>>>     /* device write constrains */
>>>>>>>     u32     ws_min;         /* minimum write size */
>>>>>>> --
>>>>>>> 2.17.1
>>>>>> 
>>>>>> I see where you are going with this and I partially agree, but none of
>>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
>>>>>> 
>>>>>> Javier
>>>>> 
>>>>> Hmm. Looking into what the 2.0 spec says about vector reads:
>>>>> 
>>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>>>> the number of logical blocks to be read. This is a 0’s based value.
>>>>> Maximum of 64 LBAs is supported."
>>>>> 
>>>>> You got the max limit covered, and the spec  does not say anything
>>>>> about the minimum number of LBAs to support.
>>>>> 
>>>>> Matias: any thoughts on this?
>>>>> 
>>>>> Javier: How would this patch break current implementations?
>>>> 
>>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
>>> 
>>> So we can't make use of a perfectly good, standardized, parameter
>>> because some hypothetical non-compliant device out there might not
>>> provide a sane value?
>> The OCSSD standard has never used NVMe parameters, so there is no
>> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
>> get the sector and OOB sizes from the standard identify
>> command, we used to have them in the geometry.
> 
> What the hell? Yes it has. The whole OCSSD spec is dependent on the
> NVMe spec. It is using many commands from the NVMe specification,
> which are not defined in the OCSSD specification.
> 

First, lower the tone.

Second, no, it has not and never has, starting with all the write
constraints, continuing with the vector commands, etc. You cannot choose
what you want to be compliant with and what you do not. OCSSD uses the
NVMe protocol but it is self-sufficient with its geometry for all the
read / write / erase paths - it even depends on different PCIe class
codes to be identified… To do this in the way the rest of the spec is
defined, we should either add a field to the geometry or explicitly
mention that MDTS is used, as we do with the sector and metadata sizes.

Third, as a maintainer of this subsystem you should care about devices
in the field that might break due to such a change (supported by the
company you work for or not) - even if you can argue whether the change
is compliant or not.

And Hans, as a representative of a company that has such devices out
there, you should care too.

What if we add a quirk in the feature bits for this so that newer
devices can implement this and older devices can still function?

> The MDTS field should be respected in all cases, similarly to how the
> block layer respects it. Since the lightnvm subsystem is hooking in
> on the side, this should also be honoured by pblk (or the lightnvm
> subsystem should fix it up).
> 

This said, pblk does not care which value you give, it uses what the
subsystem tells it - this is not an argument against implementing the
change.

The only thing we should care about when implementing this is removing
the constant defining 64 ppas and making the allocations dynamic in the
partial read and GC paths.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio
  2019-03-04 13:08             ` Matias Bjørling
@ 2019-03-04 13:45               ` Javier González
  2019-03-04 15:12                 ` Matias Bjørling
  0 siblings, 1 reply; 91+ messages in thread
From: Javier González @ 2019-03-04 13:45 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Konopko, Igor J, Hans Holmberg, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 4787 bytes --]


> On 4 Mar 2019, at 14.08, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 3/4/19 1:51 PM, Igor Konopko wrote:
>> On 04.03.2019 13:14, Hans Holmberg wrote:
>>> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>>>>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>>> 
>>>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>> 
>>>>>>> Currently in case of read errors, bi_status is not
>>>>>>> set properly, which leads to returning improper data
>>>>>>> to the higher layer. This patch fixes that by setting the proper
>>>>>>> status in case of read errors.
>>>>>>> 
>>>>>>> Patch also removes unnecessary warn_once(), which does
>>>>>>> not make sense in that place, since user bio is not used
>>>>>>> for interaction with the drive and thus bi_status will not be
>>>>>>> set here.
>>>>>>> 
>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>> ---
>>>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>>>>> }
>>>>>>> 
>>>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>>>> {
>>>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>>>> -#endif
>>>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>>>> +             bio_io_error(bio);
>>>>>>>       bio_endio(bio);
>>>>>>> }
>>>>>>> 
>>>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>>>       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>>       struct bio *bio = (struct bio *)r_ctx->private;
>>>>>>> 
>>>>>>> -     pblk_end_user_read(bio);
>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>       __pblk_end_io_read(pblk, rqd, true);
>>>>>>> }
>>>>>>> 
>>>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>       rqd->bio = NULL;
>>>>>>>       rqd->nr_ppas = nr_secs;
>>>>>>> 
>>>>>>> -     bio_endio(bio);
>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>       __pblk_end_io_read(pblk, rqd, false);
>>>>>>> }
>>>>>>> 
>>>>>>> --
>>>>>>> 2.17.1
>>>>>> 
>>>>>> This is by design. We do not report the read errors as in any other
>>>>>> block device - this is why we clone the read bio.
>>>>> 
>>>>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>>>> 
>>>> Normal block devices do not report read errors on the completion path
>>>> unless it is a fatal error. This is actually not well understood by the
>>>> upper layers, which tend to assume that the device is completely broken.
>>> 
>>> So returning bogus data without even a warning is a preferred
>>> solution? You want to force "the upper layers" to do checksumming?
>>> 
>>> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
>>> warning that OCSSD 2.0 adds. The data should still be good.
>>> All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec) indicate that
>>> the command did not complete (as far as I can tell).
>> My approach was exactly like that. In all cases other than WARN_HIGHECC we don't have valid data. Without setting bio_io_error() we create the impression for the other layers that we read the data correctly, which is not the case.
>> I'm also seeing that this patch is not the only user of the bio_io_error() API; other drivers such as md use it commonly.
> 
> Yes agree. This is an actual error in pblk that lets it return bogus data.

I am not against returning an error, I am just saying that this is not
normal behavior on the read path.

The problem is that the upper layers might interpret that the device is
broken completely, which is not true for a single ppa failing. Think for
example of a host reading under mw_cunits - in reality this is not even
a device problem but a host bug that might result in a fatal error.

Matias: I am surprised to see you answer this way - when I tried to
define a sane read error path with meaningful errors starting in the
spec and all the way up the stack you were the first one to argue for
reads to always succeed no matter what. In fact, using ZBC/ZAC as an
example...

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04 13:18     ` Matias Bjørling
@ 2019-03-04 13:47       ` Javier González
  0 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 13:47 UTC (permalink / raw)
  To: Matias Bjørling; +Cc: Konopko, Igor J, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 7810 bytes --]

> On 4 Mar 2019, at 14.18, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 3/4/19 9:27 AM, Javier González wrote:
>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> When we create a pblk instance with the
>>> factory flag, there is a possibility that
>>> some chunks are in the open state, which
>>> does not allow us to issue erase requests
>>> to them directly. Such chunks should be
>>> filled with some empty data in order to
>>> reach the closed state. Without that, we
>>> risk that some erase operations will be
>>> rejected by the drive due to an improper
>>> chunk state.
>>> 
>>> This patch implements chunk-closing logic
>>> in pblk for that case, when creating an
>>> instance with the factory flag, in order
>>> to fix that issue.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 114 +++++++++++++++++++++++++++++++++++
>>> drivers/lightnvm/pblk-init.c |  14 ++++-
>>> drivers/lightnvm/pblk.h      |   2 +
>>> 3 files changed, 128 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index fa4dc05608ff..d3c45393f093 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -161,6 +161,120 @@ struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>>> 	return meta + ch_off + lun_off + chk_off;
>>> }
>>> 
>>> +static void pblk_close_chunk(struct pblk *pblk, struct ppa_addr ppa, int count)
>>> +{
>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>> +	struct nvm_geo *geo = &dev->geo;
>>> +	struct bio *bio;
>>> +	struct ppa_addr *ppa_list;
>>> +	struct nvm_rq rqd;
>>> +	void *meta_list, *data;
>>> +	dma_addr_t dma_meta_list, dma_ppa_list;
>>> +	int i, rq_ppas, rq_len, ret;
>>> +
>>> +	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
>>> +	if (!meta_list)
>>> +		return;
>>> +
>>> +	ppa_list = meta_list + pblk_dma_meta_size(pblk);
>>> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>>> +
>>> +	rq_ppas = pblk_calc_secs(pblk, count, 0, false);
>>> +	if (!rq_ppas)
>>> +		rq_ppas = pblk->min_write_pgs;
>>> +	rq_len = rq_ppas * geo->csecs;
>>> +
>>> +	data = kzalloc(rq_len, GFP_KERNEL);
>>> +	if (!data)
>>> +		goto free_meta_list;
>>> +
>>> +	memset(&rqd, 0, sizeof(struct nvm_rq));
>>> +	rqd.opcode = NVM_OP_PWRITE;
>>> +	rqd.nr_ppas = rq_ppas;
>>> +	rqd.meta_list = meta_list;
>>> +	rqd.ppa_list = ppa_list;
>>> +	rqd.dma_ppa_list = dma_ppa_list;
>>> +	rqd.dma_meta_list = dma_meta_list;
>>> +
>>> +next_rq:
>>> +	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
>>> +	if (IS_ERR(bio))
>>> +		goto out_next;
>>> +
>>> +	bio->bi_iter.bi_sector = 0; /* artificial bio */
>>> +	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>>> +
>>> +	rqd.bio = bio;
>>> +	for (i = 0; i < rqd.nr_ppas; i++) {
>>> +		rqd.ppa_list[i] = ppa;
>>> +		rqd.ppa_list[i].m.sec += i;
>>> +		pblk_get_meta(pblk, meta_list, i)->lba =
>>> +					cpu_to_le64(ADDR_EMPTY);
>>> +	}
>>> +
>>> +	ret = nvm_submit_io_sync(dev, &rqd);
>>> +	if (ret) {
>>> +		bio_put(bio);
>>> +		goto out_next;
>>> +	}
>>> +
>>> +	if (rqd.error)
>>> +		goto free_data;
>>> +
>>> +out_next:
>>> +	count -= rqd.nr_ppas;
>>> +	ppa.m.sec += rqd.nr_ppas;
>>> +	if (count > 0)
>>> +		goto next_rq;
>>> +
>>> +free_data:
>>> +	kfree(data);
>>> +free_meta_list:
>>> +	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>> +}
>>> +
>>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>>> +{
>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>> +	struct nvm_geo *geo = &dev->geo;
>>> +	struct nvm_chk_meta *chunk_meta;
>>> +	struct ppa_addr ppa;
>>> +	int i, j, k, count;
>>> +
>>> +	for (i = 0; i < geo->num_chk; i++) {
>>> +		for (j = 0; j < geo->num_lun; j++) {
>>> +			for (k = 0; k < geo->num_ch; k++) {
>>> +				ppa.ppa = 0;
>>> +				ppa.m.grp = k;
>>> +				ppa.m.pu = j;
>>> +				ppa.m.chk = i;
>>> +
>>> +				chunk_meta = pblk_chunk_get_off(pblk,
>>> +								meta, ppa);
>>> +				if (chunk_meta->state == NVM_CHK_ST_OPEN) {
>>> +					ppa.m.sec = chunk_meta->wp;
>>> +					count = geo->clba - chunk_meta->wp;
>>> +					pblk_close_chunk(pblk, ppa, count);
>>> +				}
>>> +			}
>>> +		}
>>> +	}
>>> +}
>>> +
>>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *meta)
>>> +{
>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>> +	struct nvm_geo *geo = &dev->geo;
>>> +	int i;
>>> +
>>> +	for (i = 0; i < geo->all_luns; i++) {
>>> +		if (meta[i].state == NVM_CHK_ST_OPEN)
>>> +			return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
>>> 			   u64 paddr)
>>> {
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index 9913a4514eb6..83abe6960b46 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -1028,13 +1028,14 @@ static int pblk_line_meta_init(struct pblk *pblk)
>>> 	return 0;
>>> }
>>> 
>>> -static int pblk_lines_init(struct pblk *pblk)
>>> +static int pblk_lines_init(struct pblk *pblk, bool factory_init)
>>> {
>>> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> 	struct pblk_line *line;
>>> 	void *chunk_meta;
>>> 	int nr_free_chks = 0;
>>> 	int i, ret;
>>> +	bool retry = false;
>>> 
>>> 	ret = pblk_line_meta_init(pblk);
>>> 	if (ret)
>>> @@ -1048,12 +1049,21 @@ static int pblk_lines_init(struct pblk *pblk)
>>> 	if (ret)
>>> 		goto fail_free_meta;
>>> 
>>> +get_chunk_meta:
>>> 	chunk_meta = pblk_get_chunk_meta(pblk);
>>> 	if (IS_ERR(chunk_meta)) {
>>> 		ret = PTR_ERR(chunk_meta);
>>> 		goto fail_free_luns;
>>> 	}
>>> 
>>> +	if (factory_init && !retry &&
>>> +	    pblk_are_opened_chunks(pblk, chunk_meta)) {
>>> +		pblk_close_opened_chunks(pblk, chunk_meta);
>>> +		retry = true;
>>> +		vfree(chunk_meta);
>>> +		goto get_chunk_meta;
>>> +	}
>>> +
>>> 	pblk->lines = kcalloc(l_mg->nr_lines, sizeof(struct pblk_line),
>>> 								GFP_KERNEL);
>>> 	if (!pblk->lines) {
>>> @@ -1244,7 +1254,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>> 		goto fail;
>>> 	}
>>> 
>>> -	ret = pblk_lines_init(pblk);
>>> +	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
>>> 	if (ret) {
>>> 		pblk_err(pblk, "could not initialize lines\n");
>>> 		goto fail_free_core;
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index b266563508e6..b248642c4dfb 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -793,6 +793,8 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk);
>>> struct nvm_chk_meta *pblk_chunk_get_off(struct pblk *pblk,
>>> 					      struct nvm_chk_meta *lp,
>>> 					      struct ppa_addr ppa);
>>> +void pblk_close_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>>> +bool pblk_are_opened_chunks(struct pblk *pblk, struct nvm_chk_meta *_meta);
>>> void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
>>> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
>>> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
>>> --
>>> 2.17.1
>> I know that the OCSSD 2.0 spec does not allow to transition from open to
>> free, but to me this is a spec bug as there is no underlying issue in
>> reading an open block. Note that all controllers I know support this,
>> and the upcoming Denali spec. fixes this too.
> 
> The issue is not whether the chunk can be read. It is that going from
> Open -> Empty -> Open causes an erase on some implementations, and
> causes the media to wear out prematurely.

If the host is padding data to be able to close, the effect on wear is the same.


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 12/13] lightnvm: pblk: close opened chunks
  2019-03-04 13:19       ` Matias Bjørling
@ 2019-03-04 13:48         ` Javier González
  0 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 13:48 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Hans Holmberg, Konopko, Igor J, Hans Holmberg, linux-block,
	Klaus Birkelund Jensen

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

> On 4 Mar 2019, at 14.19, Matias Bjørling <mb@lightnvm.io> wrote:
> 
>> + Klaus whom I discussed this with.
>> Yeah, I think that "early reset" is a nice feature. It would be nice
>> to extend the OCSSD spec with a new capabilities bit indicating if
>> this is indeed supported or not.
>> Matias: what do you think?
> 
> I don't mind as long as it is gated by a feature bit. An ECN can be
> made to fix that up and then software can be updated to read that bit
> for what to do.

Works for me.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:04           ` Igor Konopko
  2019-03-04 13:16             ` Hans Holmberg
@ 2019-03-04 14:06             ` Javier González
  1 sibling, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 14:06 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Hans Holmberg, Matias Bjørling, Hans Holmberg, linux-block,
	Simon Andreas Frimann Lund, Klaus Birkelund Jensen



> On 4 Mar 2019, at 14.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
>> On 04.03.2019 13:22, Hans Holmberg wrote:
>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>> 
>>> 
>>> 
>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>> 
>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>> 
>>>>>> Current lightnvm and pblk implementation does not care
>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>> 
>>>> Could you describe *what* issues you are fixing?
> 
> I'm fixing an issue related to controllers whose NVMe max data transfer size is lower than 256K (for example 128K, which happens for existing, NVMe-spec-compliant controllers). Such controllers are not able to handle a command which contains 64 PPAs, since the size of the DMAed buffer would be above the capabilities of such a controller.
> 

OK. Then let’s try to get support for this ASAP. If the rest agree on a feature bit then it should be straightforward. If not, let’s at least add a warning if MDTS < 64 so that people are able to identify where the missing performance went when updating the kernel if they are not using MDTS on their OCSSD implementations. 

Javier

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:44               ` Javier González
@ 2019-03-04 14:24                 ` Hans Holmberg
  2019-03-04 14:27                   ` Javier González
  2019-03-04 14:58                 ` Matias Bjørling
  1 sibling, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-04 14:24 UTC (permalink / raw)
  To: Javier González
  Cc: Matias Bjørling, Konopko, Igor J, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

On Mon, Mar 4, 2019 at 2:44 PM Javier González <javier@javigon.com> wrote:
>
>
> > On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote:
> >
> > On 3/4/19 2:19 PM, Javier González wrote:
> >>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
> >>>
> >>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
> >>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
> >>>>>
> >>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
> >>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>>>>>
> >>>>>>> Current lightnvm and pblk implementation does not care
> >>>>>>> about NVMe max data transfer size, which can be smaller
> >>>>>>> than 64*K=256K. This patch fixes issues related to that.
> >>>>>
> >>>>> Could you describe *what* issues you are fixing?
> >>>>>
> >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>> ---
> >>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
> >>>>>>> drivers/nvme/host/lightnvm.c | 1 +
> >>>>>>> include/linux/lightnvm.h     | 1 +
> >>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
> >>>>>>> --- a/drivers/lightnvm/core.c
> >>>>>>> +++ b/drivers/lightnvm/core.c
> >>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>>>     struct nvm_target *t;
> >>>>>>>     struct nvm_tgt_dev *tgt_dev;
> >>>>>>>     void *targetdata;
> >>>>>>> +     unsigned int mdts;
> >>>>>>>     int ret;
> >>>>>>>
> >>>>>>>     switch (create->conf.type) {
> >>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>>>     tdisk->private_data = targetdata;
> >>>>>>>     tqueue->queuedata = targetdata;
> >>>>>>>
> >>>>>>> -     blk_queue_max_hw_sectors(tqueue,
> >>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>>>>>> +     if (dev->geo.mdts) {
> >>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
> >>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>>>> +     }
> >>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
> >>>>>>>
> >>>>>>>     set_capacity(tdisk, tt->capacity(targetdata));
> >>>>>>>     add_disk(tdisk);
> >>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
> >>>>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>>>>>     geo->csecs = 1 << ns->lba_shift;
> >>>>>>>     geo->sos = ns->ms;
> >>>>>>>     geo->ext = ns->ext;
> >>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
> >>>>>>>
> >>>>>>>     dev->q = q;
> >>>>>>>     memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>>>>>> --- a/include/linux/lightnvm.h
> >>>>>>> +++ b/include/linux/lightnvm.h
> >>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>>>>>     u16     csecs;          /* sector size */
> >>>>>>>     u16     sos;            /* out-of-band area size */
> >>>>>>>     bool    ext;            /* metadata in extended data buffer */
> >>>>>>> +     u32     mdts;           /* Max data transfer size*/
> >>>>>>>
> >>>>>>>     /* device write constrains */
> >>>>>>>     u32     ws_min;         /* minimum write size */
> >>>>>>> --
> >>>>>>> 2.17.1
> >>>>>>
> >>>>>> I see where you are going with this and I partially agree, but none of
> >>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
> >>>>>> behavior taken from NVMe in Linux can break current implementations. Is
> >>>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
> >>>>>>
> >>>>>> Javier
> >>>>>
> >>>>> Hmm.Looking into the 2.0 spec what it says about vector reads:
> >>>>>
> >>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
> >>>>> the number of logical blocks to be read. This is a 0’s based value.
> >>>>> Maximum of 64 LBAs is supported."
> >>>>>
> >>>>> You got the max limit covered, and the spec  does not say anything
> >>>>> about the minimum number of LBAs to support.
> >>>>>
> >>>>> Matias: any thoughts on this?
> >>>>>
> >>>>> Javier: How would this patch break current implementations?
> >>>>
> >>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
> >>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
> >>>
> >>> So we cant make use of a perfectly good, standardized, parameter
> >>> because some hypothetical non-compliant device out there might not
> >>> provide a sane value?
> >> The OCSSD standard has never used NVMe parameters, so there is no
> >> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
> >> get the sector and OOB sizes from the standard identify
> >> command, we used to have them in the geometry.
> >
> > What the hell? Yes it has. The whole OCSSD spec is dependent on the
> > NVMe spec. It is using many commands from the NVMe specification,
> > which is not defined in the OCSSD specification.
> >
>
> First, lower the tone.
>
> Second, no, it has not and never has, starting with all the write
> constrains, continuing with the vector commands, etc. You cannot choose
> what you want to be compliant with and what you do not. OCSSD uses the
> NVMe protocol but it is self sufficient with its geometry for all the
> read / write / erase paths - it even depends on different PCIe class
> codes to be identified… To do this in the way the rest of the spec is
> defined, we either add a field to the geometry or explicitly mention
> that MDTS is used, as we do with the sector and metadata sizes.
>
> Third, as a maintainer of this subsystem you should care about devices
> in the field that might break due to such a change (supported by the
> company you work for or not) - even if you can argue whether the change
> is compliant or not.
>
> And Hans, as a representative of a company that has such devices out
> there, you should care too.

If you worry about me doing my job, you need not to.
I test. So far I have not found any regressions in this patchset.

Please keep your open source hat on Javier.

>
> What if we add a quirk in the feature bits for this so that newer
> devices can implement this and older devices can still function?
>
> > The MDTS field should be respected in all case, similarly to how the
> > block layer respects it. Since the lightnvm subsystem are hooking in
> > on the side, this also be honoured by pblk (or the lightnvm subsystem
> > should fix it up)
> >
>
> This said, pblk does not care which value you give, it uses what the
> subsystem tells it - this is not arguing for this change not to be
> implemented.
>
> The only thing we should care about if implementing this is removing the
> constant defining 64 ppas and making allocations dynamic in the partial
> read and GC paths.
>
> Javier

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 14:24                 ` Hans Holmberg
@ 2019-03-04 14:27                   ` Javier González
  0 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-04 14:27 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Matias Bjørling, Konopko, Igor J, Hans Holmberg,
	linux-block, Simon Andreas Frimann Lund, Klaus Birkelund Jensen

[-- Attachment #1: Type: text/plain, Size: 6888 bytes --]


> On 4 Mar 2019, at 15.24, Hans Holmberg <hans@owltronix.com> wrote:
> 
> On Mon, Mar 4, 2019 at 2:44 PM Javier González <javier@javigon.com> wrote:
>>> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote:
>>> 
>>> On 3/4/19 2:19 PM, Javier González wrote:
>>>>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
>>>>> 
>>>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>>>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>>>>> 
>>>>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>>>> 
>>>>>>>>> Current lightnvm and pblk implementation does not care
>>>>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>>>>> 
>>>>>>> Could you describe *what* issues you are fixing?
>>>>>>> 
>>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>>> ---
>>>>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>>>>>> include/linux/lightnvm.h     | 1 +
>>>>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>>>>>> --- a/drivers/lightnvm/core.c
>>>>>>>>> +++ b/drivers/lightnvm/core.c
>>>>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>>>    struct nvm_target *t;
>>>>>>>>>    struct nvm_tgt_dev *tgt_dev;
>>>>>>>>>    void *targetdata;
>>>>>>>>> +     unsigned int mdts;
>>>>>>>>>    int ret;
>>>>>>>>> 
>>>>>>>>>    switch (create->conf.type) {
>>>>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>>>    tdisk->private_data = targetdata;
>>>>>>>>>    tqueue->queuedata = targetdata;
>>>>>>>>> 
>>>>>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>>>>>> +     if (dev->geo.mdts) {
>>>>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>>>> +     }
>>>>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>>>>>> 
>>>>>>>>>    set_capacity(tdisk, tt->capacity(targetdata));
>>>>>>>>>    add_disk(tdisk);
>>>>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>>>>>    geo->csecs = 1 << ns->lba_shift;
>>>>>>>>>    geo->sos = ns->ms;
>>>>>>>>>    geo->ext = ns->ext;
>>>>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>>>>>> 
>>>>>>>>>    dev->q = q;
>>>>>>>>>    memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>>>>>> --- a/include/linux/lightnvm.h
>>>>>>>>> +++ b/include/linux/lightnvm.h
>>>>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>>>>>    u16     csecs;          /* sector size */
>>>>>>>>>    u16     sos;            /* out-of-band area size */
>>>>>>>>>    bool    ext;            /* metadata in extended data buffer */
>>>>>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>>>>>> 
>>>>>>>>>    /* device write constrains */
>>>>>>>>>    u32     ws_min;         /* minimum write size */
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>> 
>>>>>>>> I see where you are going with this and I partially agree, but none of
>>>>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>>>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>>>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
>>>>>>>> 
>>>>>>>> Javier
>>>>>>> 
>>>>>>> Hmm.Looking into the 2.0 spec what it says about vector reads:
>>>>>>> 
>>>>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>>>>>> the number of logical blocks to be read. This is a 0’s based value.
>>>>>>> Maximum of 64 LBAs is supported."
>>>>>>> 
>>>>>>> You got the max limit covered, and the spec  does not say anything
>>>>>>> about the minimum number of LBAs to support.
>>>>>>> 
>>>>>>> Matias: any thoughts on this?
>>>>>>> 
>>>>>>> Javier: How would this patch break current implementations?
>>>>>> 
>>>>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>>>>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
>>>>> 
>>>>> So we cant make use of a perfectly good, standardized, parameter
>>>>> because some hypothetical non-compliant device out there might not
>>>>> provide a sane value?
>>>> The OCSSD standard has never used NVMe parameters, so there is no
>>>> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
>>>> get the sector and OOB sizes from the standard identify
>>>> command, we used to have them in the geometry.
>>> 
>>> What the hell? Yes it has. The whole OCSSD spec is dependent on the
>>> NVMe spec. It is using many commands from the NVMe specification,
>>> which is not defined in the OCSSD specification.
>> 
>> First, lower the tone.
>> 
>> Second, no, it has not and never has, starting with all the write
>> constrains, continuing with the vector commands, etc. You cannot choose
>> what you want to be compliant with and what you do not. OCSSD uses the
>> NVMe protocol but it is self sufficient with its geometry for all the
>> read / write / erase paths - it even depends on different PCIe class
>> codes to be identified… To do this in the way the rest of the spec is
>> defined, we either add a field to the geometry or explicitly mention
>> that MDTS is used, as we do with the sector and metadata sizes.
>> 
>> Third, as a maintainer of this subsystem you should care about devices
>> in the field that might break due to such a change (supported by the
>> company you work for or not) - even if you can argue whether the change
>> is compliant or not.
>> 
>> And Hans, as a representative of a company that has such devices out
>> there, you should care too.
> 
> If you worry about me doing my job, you need not to.
> I test. So far I have not found any regressions in this patchset.
> 
> Please keep your open source hat on Javier.

Ok. You said it.

Then please apply the patch :)

Reviewed-by: Javier González <javier@javigon.com>

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device
  2019-03-04 13:44               ` Javier González
  2019-03-04 14:24                 ` Hans Holmberg
@ 2019-03-04 14:58                 ` Matias Bjørling
  1 sibling, 0 replies; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 14:58 UTC (permalink / raw)
  To: Javier González
  Cc: Hans Holmberg, Konopko, Igor J, Hans Holmberg, linux-block,
	Simon Andreas Frimann Lund, Klaus Birkelund Jensen

On 3/4/19 2:44 PM, Javier González wrote:
> 
>> On 4 Mar 2019, at 14.25, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 3/4/19 2:19 PM, Javier González wrote:
>>>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@owltronix.com> wrote:
>>>>
>>>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@javigon.com> wrote:
>>>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@owltronix.com> wrote:
>>>>>>
>>>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@javigon.com> wrote:
>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>>>
>>>>>>>> Current lightnvm and pblk implementation does not care
>>>>>>>> about NVMe max data transfer size, which can be smaller
>>>>>>>> than 64*K=256K. This patch fixes issues related to that.
>>>>>>
>>>>>> Could you describe *what* issues you are fixing?
>>>>>>
>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>> ---
>>>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
>>>>>>>> drivers/nvme/host/lightnvm.c | 1 +
>>>>>>>> include/linux/lightnvm.h     | 1 +
>>>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
>>>>>>>> --- a/drivers/lightnvm/core.c
>>>>>>>> +++ b/drivers/lightnvm/core.c
>>>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>>      struct nvm_target *t;
>>>>>>>>      struct nvm_tgt_dev *tgt_dev;
>>>>>>>>      void *targetdata;
>>>>>>>> +     unsigned int mdts;
>>>>>>>>      int ret;
>>>>>>>>
>>>>>>>>      switch (create->conf.type) {
>>>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
>>>>>>>>      tdisk->private_data = targetdata;
>>>>>>>>      tqueue->queuedata = targetdata;
>>>>>>>>
>>>>>>>> -     blk_queue_max_hw_sectors(tqueue,
>>>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
>>>>>>>> +     if (dev->geo.mdts) {
>>>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
>>>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
>>>>>>>> +     }
>>>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
>>>>>>>>
>>>>>>>>      set_capacity(tdisk, tt->capacity(targetdata));
>>>>>>>>      add_disk(tdisk);
>>>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
>>>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>>>>>>>      geo->csecs = 1 << ns->lba_shift;
>>>>>>>>      geo->sos = ns->ms;
>>>>>>>>      geo->ext = ns->ext;
>>>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
>>>>>>>>
>>>>>>>>      dev->q = q;
>>>>>>>>      memcpy(dev->name, disk_name, DISK_NAME_LEN);
>>>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
>>>>>>>> --- a/include/linux/lightnvm.h
>>>>>>>> +++ b/include/linux/lightnvm.h
>>>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
>>>>>>>>      u16     csecs;          /* sector size */
>>>>>>>>      u16     sos;            /* out-of-band area size */
>>>>>>>>      bool    ext;            /* metadata in extended data buffer */
>>>>>>>> +     u32     mdts;           /* Max data transfer size*/
>>>>>>>>
>>>>>>>>      /* device write constrains */
>>>>>>>>      u32     ws_min;         /* minimum write size */
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>
>>>>>>> I see where you are going with this and I partially agree, but none of
>>>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
>>>>>>> behavior taken from NVMe in Linux can break current implementations. Is
>>>>>>> this a real-life problem for you? Or is this just for NVMe “correctness”?
>>>>>>>
>>>>>>> Javier
>>>>>>
>>>>>> Hmm. Looking into what the 2.0 spec says about vector reads:
>>>>>>
>>>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
>>>>>> the number of logical blocks to be read. This is a 0’s based value.
>>>>>> Maximum of 64 LBAs is supported."
>>>>>>
>>>>>> You got the max limit covered, and the spec does not say anything
>>>>>> about the minimum number of LBAs to support.
>>>>>>
>>>>>> Matias: any thoughts on this?
>>>>>>
>>>>>> Javier: How would this patch break current implementations?
>>>>>
>>>>> Say an OCSSD controller that sets mdts to a value under 64 or does not
>>>>> set it at all (maybe garbage). Think you can get to one pretty quickly...
>>>>
>>>> So we can't make use of a perfectly good, standardized parameter
>>>> because some hypothetical non-compliant device out there might not
>>>> provide a sane value?
>>> The OCSSD standard has never used NVMe parameters, so there is no
>>> compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
>>> get the sector and OOB sizes from the standard identify
>>> command, we used to have them in the geometry.
>>
>> What the hell? Yes it has. The whole OCSSD spec is dependent on the
>> NVMe spec. It is using many commands from the NVMe specification,
>> which is not defined in the OCSSD specification.
>>
> 
> First, lower the tone.
> 
> Second, no, it has not and never has, starting with all the write
> constraints, continuing with the vector commands, etc.
> 
> You cannot choose what you want to be compliant with and what you do
> not. OCSSD uses the NVMe protocol but it is self-sufficient with its
> geometry for all the read / write / erase paths - it even depends on
> different PCIe class codes to be identified…

No. It does not.

> To do this in the way the rest of the spec is
> defined, we either add a field to the geometry or explicitly mention
> that MDTS is used, as we do with the sector and metadata sizes.
> 
> Third, as a maintainer of this subsystem you should care about devices
> in the field that might break due to such a change (supported by the
> company you work for or not) - even if you can argue whether the change
> is compliant or not.

Same as Hans. If you worry about me doing my job, you need not.

> 
> And Hans, as a representative of a company that has such devices out
> there, you should care too.
> 
> What if we add a quirk in the feature bits for this so that newer
> devices can implement this and older devices can still function?
> 
>> The MDTS field should be respected in all cases, similarly to how the
>> block layer respects it. Since the lightnvm subsystem is hooking in
>> on the side, this should also be honoured by pblk (or the lightnvm
>> subsystem should fix it up).
>>
> 
> This said, pblk does not care which value you give, it uses what the
> subsystem tells it - this is not arguing for this change not to be
> implemented.
> 
> The only thing we should care about if implementing this is removing the
> constant defining 64 ppas and making allocations dynamic in the partial
> read and GC paths.
> 
> Javier
> 
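As an aside for readers following the core.c hunk at the top of this thread: the clamping it introduces reduces to a default cap of 64 vector LBAs that is lowered to the controller's MDTS only when one is reported. A standalone sketch (the function name and free-standing form are illustrative, not the kernel code):

```c
#include <assert.h>

/* Illustrative sketch of the max_hw_sectors clamp from the patch above.
 * NVM_MAX_VLBA mirrors the kernel constant; csecs and mdts are the
 * geometry's sector size and the device-reported max transfer size. */
#define NVM_MAX_VLBA 64U

static unsigned int nvm_max_io_sectors(unsigned int csecs, unsigned int mdts)
{
	/* default cap: NVM_MAX_VLBA logical blocks, in 512-byte sectors */
	unsigned int cap = (csecs >> 9) * NVM_MAX_VLBA;

	/* honour the controller's MDTS only when it is reported (non-zero) */
	if (mdts && mdts < cap)
		cap = mdts;
	return cap;
}
```

With a 4 KiB sector size the default cap is 512 sectors; a reported MDTS only ever lowers it, never raises it.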



* Re: [PATCH 08/13] lightnvm: pblk: Set proper read status in bio
  2019-03-04 13:45               ` Javier González
@ 2019-03-04 15:12                 ` Matias Bjørling
  2019-03-05  6:43                   ` Javier González
  0 siblings, 1 reply; 91+ messages in thread
From: Matias Bjørling @ 2019-03-04 15:12 UTC (permalink / raw)
  To: Javier González
  Cc: Konopko, Igor J, Hans Holmberg, Hans Holmberg, linux-block

On 3/4/19 2:45 PM, Javier González wrote:
> 
>> On 4 Mar 2019, at 14.08, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 3/4/19 1:51 PM, Igor Konopko wrote:
>>> On 04.03.2019 13:14, Hans Holmberg wrote:
>>>> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>>>>>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>>>>
>>>>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>>>
>>>>>>>> Currently in case of read errors, bi_status is not
>>>>>>>> set properly, which leads to returning improper data
>>>>>>>> to higher layers. This patch fixes that by setting the
>>>>>>>> proper status in case of read errors.
>>>>>>>>
>>>>>>>> The patch also removes an unnecessary WARN_ONCE(), which
>>>>>>>> does not make sense in that place, since the user bio is
>>>>>>>> not used for interaction with the drive and thus bi_status
>>>>>>>> will not be set here.
>>>>>>>>
>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>> ---
>>>>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>        WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>>>>>> }
>>>>>>>>
>>>>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>>>>> {
>>>>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>>>>> -#endif
>>>>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>>>>> +             bio_io_error(bio);
>>>>>>>>        bio_endio(bio);
>>>>>>>> }
>>>>>>>>
>>>>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>>>>        struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>>>        struct bio *bio = (struct bio *)r_ctx->private;
>>>>>>>>
>>>>>>>> -     pblk_end_user_read(bio);
>>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>>        __pblk_end_io_read(pblk, rqd, true);
>>>>>>>> }
>>>>>>>>
>>>>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>>        rqd->bio = NULL;
>>>>>>>>        rqd->nr_ppas = nr_secs;
>>>>>>>>
>>>>>>>> -     bio_endio(bio);
>>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>>        __pblk_end_io_read(pblk, rqd, false);
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>
>>>>>>> This is by design. We do not report the read errors as in any other
>>>>>>> block device - this is why we clone the read bio.
>>>>>>
>>>>>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>>>>>
>>>>> Normal block devices do not report read errors on the completion path
>>>>> unless it is a fatal error. This is actually not well understood by the
>>>>> upper layers, which tend to assume that the device is completely broken.
>>>>
>>>> So returning bogus data without even a warning is a preferred
>>>> solution? You want to force "the upper layers" to do checksumming?
>>>>
>>>> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
>>>> warning that OCSSD 2.0 adds. The data should still be good.
>>>> All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec), indicates that
>>>> the command did not complete (As far as I can tell)
>>> My approach was exactly like that. In all cases other than WARN_HIGHECC we don't have valid data. Without setting bio_io_error() we create the impression for other layers that we read the data correctly, which is not the case.
>>> I'm also seeing that this patch is not the only user of the bio_io_error() API; other drivers such as md use it commonly.
>>
>> Yes agree. This is an actual error in pblk that lets it return bogus data.
> 
> I am not against returning an error, I am just saying that this is not
> normal behavior on the read path.
> 
> The problem is that the upper layers might interpret that the device is
> broken completely, which is not true for a spa failing. Think for
> example of a host reading under mw_cunits - in reality this is not even
> a device problem but a host bug that might result in a fatal error.

Agree, and the host should manage it. The drive shall return a fatal 
error when it is asked to return invalid data. E.g., mw_cunits.

> 
> Matias: I am surprised to see you answer this way - when I tried to
> define a sane read error path with meaningful errors starting in the
> spec and all the way up the stack you were the first one to argue for
> reads to always succeed no matter what. In fact, using ZBC/ZAC as an
> example...

What I objected to was having error messages for each type of error
that could happen, instead of a single error type or a few (that trigger
the same set of recovery procedures on the host side).
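For reference, the completion policy the patch under discussion implements — fail the bio on any error except the OCSSD 2.0 high-ECC warning, which still returns valid data — can be sketched standalone as follows (the NVM_RSP_WARN_HIGHECC value mirrors the kernel header and is an assumption here):

```c
#include <assert.h>

/* Sketch of pblk_end_user_read()'s decision from the patch: only the
 * high-ECC warning is masked; every other non-zero status fails the bio. */
#define NVM_RSP_WARN_HIGHECC 0x4700	/* assumed value, per lightnvm.h */

static int read_should_fail(int error)
{
	return error && error != NVM_RSP_WARN_HIGHECC;
}
```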


* Re: [PATCH 08/13] lightnvm: pblk: Set proper read status in bio
  2019-03-04 15:12                 ` Matias Bjørling
@ 2019-03-05  6:43                   ` Javier González
  0 siblings, 0 replies; 91+ messages in thread
From: Javier González @ 2019-03-05  6:43 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Konopko, Igor J, Hans Holmberg, Hans Holmberg, linux-block

> On 4 Mar 2019, at 16.12, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 3/4/19 2:45 PM, Javier González wrote:
>>> On 4 Mar 2019, at 14.08, Matias Bjørling <mb@lightnvm.io> wrote:
>>> 
>>> On 3/4/19 1:51 PM, Igor Konopko wrote:
>>>> On 04.03.2019 13:14, Hans Holmberg wrote:
>>>>> On Mon, Mar 4, 2019 at 10:48 AM Javier González <javier@javigon.com> wrote:
>>>>>>> On 4 Mar 2019, at 10.35, Hans Holmberg <hans.ml.holmberg@owltronix.com> wrote:
>>>>>>> 
>>>>>>> On Mon, Mar 4, 2019 at 9:03 AM Javier González <javier@javigon.com> wrote:
>>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>>>>>> 
>>>>>>>>> Currently in case of read errors, bi_status is not
>>>>>>>>> set properly, which leads to returning improper data
>>>>>>>>> to higher layers. This patch fixes that by setting the
>>>>>>>>> proper status in case of read errors.
>>>>>>>>> 
>>>>>>>>> The patch also removes an unnecessary WARN_ONCE(), which
>>>>>>>>> does not make sense in that place, since the user bio is
>>>>>>>>> not used for interaction with the drive and thus bi_status
>>>>>>>>> will not be set here.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>>> ---
>>>>>>>>> drivers/lightnvm/pblk-read.c | 11 +++++------
>>>>>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>>>> index 3789185144da..39c1d6ccaedb 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>>>> @@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>>       WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -static void pblk_end_user_read(struct bio *bio)
>>>>>>>>> +static void pblk_end_user_read(struct bio *bio, int error)
>>>>>>>>> {
>>>>>>>>> -#ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>>> -     WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
>>>>>>>>> -#endif
>>>>>>>>> +     if (error && error != NVM_RSP_WARN_HIGHECC)
>>>>>>>>> +             bio_io_error(bio);
>>>>>>>>>       bio_endio(bio);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> @@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>>>>>>       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>>>>       struct bio *bio = (struct bio *)r_ctx->private;
>>>>>>>>> 
>>>>>>>>> -     pblk_end_user_read(bio);
>>>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>>>       __pblk_end_io_read(pblk, rqd, true);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> @@ -292,7 +291,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>>>       rqd->bio = NULL;
>>>>>>>>>       rqd->nr_ppas = nr_secs;
>>>>>>>>> 
>>>>>>>>> -     bio_endio(bio);
>>>>>>>>> +     pblk_end_user_read(bio, rqd->error);
>>>>>>>>>       __pblk_end_io_read(pblk, rqd, false);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>> 
>>>>>>>> This is by design. We do not report the read errors as in any other
>>>>>>>> block device - this is why we clone the read bio.
>>>>>>> 
>>>>>>> Could you elaborate on why not reporting read errors is a good thing in pblk?
>>>>>> 
>>>>>> Normal block devices do not report read errors on the completion path
>>>>>> unless it is a fatal error. This is actually not well understood by the
>>>>>> upper layers, which tend to assume that the device is completely broken.
>>>>> 
>>>>> So returning bogus data without even a warning is a preferred
>>>>> solution? You want to force "the upper layers" to do checksumming?
>>>>> 
>>>>> It's fine to mask out NVM_RSP_WARN_HIGHECC, since that is just a
>>>>> warning that OCSSD 2.0 adds. The data should still be good.
>>>>> All other errors (see 4.6.1.2.1 in the NVMe 1.3 spec), indicates that
>>>>> the command did not complete (As far as I can tell)
>>>> My approach was exactly like that. In all cases other than WARN_HIGHECC we don't have valid data. Without setting bio_io_error() we create the impression for other layers that we read the data correctly, which is not the case.
>>>> I'm also seeing that this patch is not the only user of the bio_io_error() API; other drivers such as md use it commonly.
>>> 
>>> Yes agree. This is an actual error in pblk that lets it return bogus data.
>> I am not against returning an error, I am just saying that this is not
>> normal behavior on the read path.
>> The problem is that the upper layers might interpret that the device is
>> broken completely, which is not true for a spa failing. Think for
>> example of a host reading under mw_cunits - in reality this is not even
>> a device problem but a host bug that might result in a fatal error.
> 
> Agree, and the host should manage it. The drive shall return a fatal error when it is asked to return invalid data. E.g., mw_cunits.
> 
>> Matias: I am surprised to see you answer this way - when I tried to
>> define a sane read error path with meaningful errors starting in the
>> spec and all the way up the stack you were the first one to argue for
>> reads to always succeed no matter what. In fact, using ZBC/ZAC as an
>> example...
> 
> What I objected to was having error messages for each type of
> error that could happen, instead of a single error type or a few (that
> trigger the same set of recovery procedures on the host side).

Ok. So just to agree on a good way to move forward: do you think it is a
good idea for the LightNVM subsystem to propagate errors for the read
path moving forward? This would include reading ahead the write pointer,
which _does not_ report errors in ZAC/ZBC either. Here, note that
currently in pblk, we do not prevent a read to go to the device in any
case, and it is the device’s responsibility to return an error. However,
since OCSSD 2.0 does not define a rich error model many errors are
masked under generic buckets.

If so, we need to define which errors are propagated and which are not.
Can you share some insights, using OCSSD 2.0 as a base?

Javier



* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-04 13:00             ` Matias Bjørling
@ 2019-03-05  8:20               ` Hans Holmberg
  2019-03-05  8:26                 ` Igor Konopko
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-05  8:20 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Igor Konopko, Javier González, Hans Holmberg, linux-block

On Mon, Mar 4, 2019 at 2:00 PM Matias Bjørling <mb@lightnvm.io> wrote:
>
> On 3/4/19 1:44 PM, Igor Konopko wrote:
> >
> >
> > On 04.03.2019 12:43, Hans Holmberg wrote:
> >> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com>
> >> wrote:
> >>>
> >>>> On 4 Mar 2019, at 10.05, Hans Holmberg
> >>>> <hans.ml.holmberg@owltronix.com> wrote:
> >>>>
> >>>>>> I strongly disagree with adding code that would mask implementation
> >>>> errors.
> >>>>
> >>>> If we want more internal checks, we could add an if statement that
> >>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> >>>>
> >>>
> >>> Not sure who this is for - better not to top post.
> >>>
> >>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
> >>> they mean nothing for the reset command. If you feel that strongly about
> >>> this, you can take it with Igor.
> >>
> >> Pardon the top-post. It was meant for both you and Igor.
> >>
> >
> > OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says
> > "The addresses in the LBA list shall be the first logical block address
> > of each chunk to be reset.". So in my understanding we are supposed to
> > clear the sector bits of the PPA address in order to be spec compliant.
> >
>
> Agree. And since ppa_addr is allocated on the stack, it should be either
> memset or the remaining fields should be set to 0. Maybe better to zero
> initialize in declaration?

Ah, I thought this was not needed, as ppa is initialized as:

ppa = pblk->luns[bit].bppa; /* set ch and lun */

and luns[bit].bppa is initialized to a value that originally comes
from drivers/lightnvm/core.c:196
(and that's explicitly zeroing all 64 bits before setting ch and lun)

Let me know if I don't make sense here.
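The initialization order described here can be sketched as follows; the bit-field layout and names are illustrative, not the real struct ppa_addr:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the order Hans describes: the 64-bit PPA is zeroed first and
 * only ch and lun are then set, so the sector (reserved) bits of bppa
 * are already 0 when it is later copied for an erase. */
struct ppa {
	union {
		struct {
			uint64_t reserved : 32;	/* sector/page bits */
			uint64_t blk      : 16;
			uint64_t lun      : 8;
			uint64_t ch       : 8;
		} a;
		uint64_t ppa;
	};
};

static struct ppa init_bppa(unsigned int ch, unsigned int lun)
{
	struct ppa p;

	p.ppa = 0;	/* explicit zeroing, as in drivers/lightnvm/core.c */
	p.a.ch = ch;
	p.a.lun = lun;
	return p;
}
```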

>
> >>>
> >>>>
> >>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com>
> >>>> wrote:
> >>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> In the current pblk implementation of the erase command
> >>>>>> there is a chance that sector bits are set to some
> >>>>>> random values for the erase PPA. This is an unexpected
> >>>>>> situation, since erase shall always be chunk
> >>>>>> aligned. This patch fixes that issue.
> >>>>>>
> >>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>> ---
> >>>>>> drivers/lightnvm/pblk-core.c | 1 +
> >>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
> >>>>>> 2 files changed, 3 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/lightnvm/pblk-core.c
> >>>>>> b/drivers/lightnvm/pblk-core.c
> >>>>>> index a98b2255f963..78b1eea4ab67 100644
> >>>>>> --- a/drivers/lightnvm/pblk-core.c
> >>>>>> +++ b/drivers/lightnvm/pblk-core.c
> >>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct
> >>>>>> pblk_line *line)
> >>>>>>
> >>>>>>               ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>               ppa.a.blk = line->id;
> >>>>>> +             ppa.a.reserved = 0;
> >>>>>>
> >>>>>>               atomic_dec(&line->left_eblks);
> >>>>>>               WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> >>>>>> diff --git a/drivers/lightnvm/pblk-map.c
> >>>>>> b/drivers/lightnvm/pblk-map.c
> >>>>>> index 79df583ea709..aea46b4ec40f 100644
> >>>>>> --- a/drivers/lightnvm/pblk-map.c
> >>>>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>> struct nvm_rq *rqd,
> >>>>>>
> >>>>>>                       *erase_ppa = ppa_list[i];
> >>>>>>                       erase_ppa->a.blk = e_line->id;
> >>>>>> +                     erase_ppa->a.reserved = 0;
> >>>>>>
> >>>>>>                       spin_unlock(&e_line->lock);
> >>>>>>
> >>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>> struct nvm_rq *rqd,
> >>>>>>               atomic_dec(&e_line->left_eblks);
> >>>>>>               *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>               erase_ppa->a.blk = e_line->id;
> >>>>>> +             erase_ppa->a.reserved = 0;
> >>>>>>       }
> >>>>>>
> >>>>>>       return 0;
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>
> >>>>> I’m fine with adding this, but note that there is actually no
> >>>>> requirement for the erase to be chunk aligned - the only bits that
> >>>>> should be looked at are group, PU and chunk.
> >>>>>
> >>>>> Reviewed-by: Javier González <javier@javigon.com>
>


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-05  8:20               ` Hans Holmberg
@ 2019-03-05  8:26                 ` Igor Konopko
  2019-03-05  8:40                   ` Hans Holmberg
  0 siblings, 1 reply; 91+ messages in thread
From: Igor Konopko @ 2019-03-05  8:26 UTC (permalink / raw)
  To: Hans Holmberg, Matias Bjørling
  Cc: Javier González, Hans Holmberg, linux-block



On 05.03.2019 09:20, Hans Holmberg wrote:
> On Mon, Mar 4, 2019 at 2:00 PM Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 3/4/19 1:44 PM, Igor Konopko wrote:
>>>
>>>
>>> On 04.03.2019 12:43, Hans Holmberg wrote:
>>>> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com>
>>>> wrote:
>>>>>
>>>>>> On 4 Mar 2019, at 10.05, Hans Holmberg
>>>>>> <hans.ml.holmberg@owltronix.com> wrote:
>>>>>>
>>>>>> I strongly disagree with adding code that would mask implementation
>>>>>> errors.
>>>>>>
>>>>>> If we want more internal checks, we could add an if statement that
>>>>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
>>>>>>
>>>>>
>>>>> Not sure who this is for - better not to top post.
>>>>>
>>>>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
>>>>> they mean nothing for the reset command. If you feel that strongly about
>>>>> this, you can take it with Igor.
>>>>
>>>> Pardon the top-post. It was meant for both you and Igor.
>>>>
>>>
>>> OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says
>>> "The addresses in the LBA list shall be the first logical block address
>>> of each chunk to be reset.". So in my understanding we are supposed to
>>> clear the sector bits of the PPA address in order to be spec compliant.
>>>
>>
>> Agree. And since ppa_addr is allocated on the stack, it should be either
>> memset or the remaining fields should be set to 0. Maybe better to zero
>> initialize in declaration?
> 
> Ah, I thought this was not needed, as ppa is initialized as:
> 
> ppa = pblk->luns[bit].bppa; /* set ch and lun */
> 
> and luns[bit].bppa is initialized to a value that originally comes
> from drivers/lightnvm/core.c:196
> (and that's explicitly zeroing all 64 bits before setting ch and lun)
> 
> Let me know if I don't make sense here.
> 

I just noticed the same.

In two places (pblk-core:1095 and pblk-map:205) we are using values 
initialized previously in core.c - so my changes are not needed here.

But still there is one place (pblk-map:163) where we initialize
erase_ppa based on ppa_list[i], which has the PPA sector bits set in most
cases, so this zeroing is still needed there.
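The remaining case can be sketched like this; the union layout and names are illustrative, not the real struct ppa_addr:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the pblk-map:163 case: the erase PPA is copied from a mapped
 * ppa_list entry whose sector bits are typically non-zero, so they must
 * be cleared explicitly - a reset address shall be the first LBA of the
 * chunk. */
typedef union {
	struct {
		uint64_t reserved : 32;	/* sector/page bits */
		uint64_t blk      : 16;
		uint64_t lun      : 8;
		uint64_t ch       : 8;
	} a;
	uint64_t ppa;
} ppa_addr_t;

static ppa_addr_t erase_ppa_from_mapped(ppa_addr_t mapped, unsigned int line_id)
{
	ppa_addr_t e = mapped;	/* inherits ch/lun, but also stale sector bits */

	e.a.blk = line_id;
	e.a.reserved = 0;	/* chunk-align the reset address */
	return e;
}
```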

>>
>>>>>
>>>>>>
>>>>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com>
>>>>>> wrote:
>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> In the current pblk implementation of the erase command
>>>>>>>> there is a chance that sector bits are set to some
>>>>>>>> random values for the erase PPA. This is an unexpected
>>>>>>>> situation, since erase shall always be chunk
>>>>>>>> aligned. This patch fixes that issue.
>>>>>>>>
>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>>> ---
>>>>>>>> drivers/lightnvm/pblk-core.c | 1 +
>>>>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
>>>>>>>> 2 files changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/lightnvm/pblk-core.c
>>>>>>>> b/drivers/lightnvm/pblk-core.c
>>>>>>>> index a98b2255f963..78b1eea4ab67 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct
>>>>>>>> pblk_line *line)
>>>>>>>>
>>>>>>>>                ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>>>>                ppa.a.blk = line->id;
>>>>>>>> +             ppa.a.reserved = 0;
>>>>>>>>
>>>>>>>>                atomic_dec(&line->left_eblks);
>>>>>>>>                WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
>>>>>>>> diff --git a/drivers/lightnvm/pblk-map.c
>>>>>>>> b/drivers/lightnvm/pblk-map.c
>>>>>>>> index 79df583ea709..aea46b4ec40f 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
>>>>>>>> struct nvm_rq *rqd,
>>>>>>>>
>>>>>>>>                        *erase_ppa = ppa_list[i];
>>>>>>>>                        erase_ppa->a.blk = e_line->id;
>>>>>>>> +                     erase_ppa->a.reserved = 0;
>>>>>>>>
>>>>>>>>                        spin_unlock(&e_line->lock);
>>>>>>>>
>>>>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
>>>>>>>> struct nvm_rq *rqd,
>>>>>>>>                atomic_dec(&e_line->left_eblks);
>>>>>>>>                *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
>>>>>>>>                erase_ppa->a.blk = e_line->id;
>>>>>>>> +             erase_ppa->a.reserved = 0;
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        return 0;
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>
>>>>>>> I’m fine with adding this, but note that there is actually no
>>>>>>> requirement for the erase to be chunk aligned - the only bits that
>>>>>>> should be looked at are group, PU and chunk.
>>>>>>>
>>>>>>> Reviewed-by: Javier González <javier@javigon.com>
>>


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
  2019-03-05  8:26                 ` Igor Konopko
@ 2019-03-05  8:40                   ` Hans Holmberg
       [not found]                     ` <61b7e62a-d229-95b1-2572-336ab1bd67cb@intel.com>
  0 siblings, 1 reply; 91+ messages in thread
From: Hans Holmberg @ 2019-03-05  8:40 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block

On Tue, Mar 5, 2019 at 9:26 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 05.03.2019 09:20, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 2:00 PM Matias Bjørling <mb@lightnvm.io> wrote:
> >>
> >> On 3/4/19 1:44 PM, Igor Konopko wrote:
> >>>
> >>>
> >>> On 04.03.2019 12:43, Hans Holmberg wrote:
> >>>> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com>
> >>>> wrote:
> >>>>>
> >>>>>> On 4 Mar 2019, at 10.05, Hans Holmberg
> >>>>>> <hans.ml.holmberg@owltronix.com> wrote:
> >>>>>>
> >>>>>> I strongly disagree with adding code that would mask implementation
> >>>>>> errors.
> >>>>>>
> >>>>>> If we want more internal checks, we could add an if statement that
> >>>>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> >>>>>>
> >>>>>
> >>>>> Not sure who this is for - better not to top post.
> >>>>>
> >>>>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
> >>>>> they mean nothing for the reset command. If you feel that strongly about
> >>>>> this, you can take it with Igor.
> >>>>
> >>>> Pardon the top-post. It was meant for both you and Igor.
> >>>>
> >>>
> >>> OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says
> >>> "The addresses in the LBA list shall be the first logical block address
> >>> of each chunk to be reset.". So in my understanding we are supposed to
> >>> clear the sector bits of the PPA address in order to be spec compliant.
> >>>
> >>
> >> Agree. And since ppa_addr is allocated on the stack, it should be either
> >> memset or the remaining fields should be set to 0. Maybe better to zero
> >> initialize in declaration?
> >
> > Ah, I thought this was not needed, as ppa is initialized as:
> >
> > ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >
> > and luns[bit].bppa is initialized to a value that originally comes
> > from drivers/lightnvm/core.c:196
> > (and that's explicitly zeroing all 64 bits before setting ch and lun)
> >
> > Let me know if I don't make sense here.
> >
>
> I just noticed the same.
>
> In two places (pblk-core:1095 and pblk-map:205) we are using values
> initialized previously in core.c - so my changes are not needed here.
>
> But still there is one place (pblk-map:163) where we initialize
> erase_ppa based on ppa_list[i], which has the PPA sector bits set in
> most cases, so this zeroing is still needed there.

Yes, you are right, thanks for pointing it out. Are you ok with just
changing this?

> >>
> >>>>>
> >>>>>>
> >>>>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com>
> >>>>>> wrote:
> >>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> In the current pblk implementation of the erase command
> >>>>>>>> there is a chance that sector bits are set to some
> >>>>>>>> random values for the erase PPA. This is an unexpected
> >>>>>>>> situation, since erase shall always be chunk
> >>>>>>>> aligned. This patch fixes that issue.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>>> ---
> >>>>>>>> drivers/lightnvm/pblk-core.c | 1 +
> >>>>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
> >>>>>>>> 2 files changed, 3 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/lightnvm/pblk-core.c
> >>>>>>>> b/drivers/lightnvm/pblk-core.c
> >>>>>>>> index a98b2255f963..78b1eea4ab67 100644
> >>>>>>>> --- a/drivers/lightnvm/pblk-core.c
> >>>>>>>> +++ b/drivers/lightnvm/pblk-core.c
> >>>>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct
> >>>>>>>> pblk_line *line)
> >>>>>>>>
> >>>>>>>>                ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>>>                ppa.a.blk = line->id;
> >>>>>>>> +             ppa.a.reserved = 0;
> >>>>>>>>
> >>>>>>>>                atomic_dec(&line->left_eblks);
> >>>>>>>>                WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> >>>>>>>> diff --git a/drivers/lightnvm/pblk-map.c
> >>>>>>>> b/drivers/lightnvm/pblk-map.c
> >>>>>>>> index 79df583ea709..aea46b4ec40f 100644
> >>>>>>>> --- a/drivers/lightnvm/pblk-map.c
> >>>>>>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>>>> struct nvm_rq *rqd,
> >>>>>>>>
> >>>>>>>>                        *erase_ppa = ppa_list[i];
> >>>>>>>>                        erase_ppa->a.blk = e_line->id;
> >>>>>>>> +                     erase_ppa->a.reserved = 0;
> >>>>>>>>
> >>>>>>>>                        spin_unlock(&e_line->lock);
> >>>>>>>>
> >>>>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>>>> struct nvm_rq *rqd,
> >>>>>>>>                atomic_dec(&e_line->left_eblks);
> >>>>>>>>                *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>>>                erase_ppa->a.blk = e_line->id;
> >>>>>>>> +             erase_ppa->a.reserved = 0;
> >>>>>>>>        }
> >>>>>>>>
> >>>>>>>>        return 0;
> >>>>>>>> --
> >>>>>>>> 2.17.1
> >>>>>>>
> >>>>>>> I’m fine with adding this, but note that there is actually no
> >>>>>>> requirement for the erase to be chunk aligned - the only bits that
> >>>>>>> should be looked at are group, PU and chunk.
> >>>>>>>
> >>>>>>> Reviewed-by: Javier González <javier@javigon.com>
> >>


* Re: [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned
       [not found]                     ` <61b7e62a-d229-95b1-2572-336ab1bd67cb@intel.com>
@ 2019-03-05  8:55                       ` Hans Holmberg
  0 siblings, 0 replies; 91+ messages in thread
From: Hans Holmberg @ 2019-03-05  8:55 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block

On Tue, Mar 5, 2019 at 9:51 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 05.03.2019 09:40, Hans Holmberg wrote:
> > On Tue, Mar 5, 2019 at 9:26 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>
> >>
> >>
> >> On 05.03.2019 09:20, Hans Holmberg wrote:
> >>> On Mon, Mar 4, 2019 at 2:00 PM Matias Bjørling <mb@lightnvm.io> wrote:
> >>>>
> >>>> On 3/4/19 1:44 PM, Igor Konopko wrote:
> >>>>>
> >>>>>
> >>>>> On 04.03.2019 12:43, Hans Holmberg wrote:
> >>>>>> On Mon, Mar 4, 2019 at 10:11 AM Javier González <javier@javigon.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> On 4 Mar 2019, at 10.05, Hans Holmberg
> >>>>>>>> <hans.ml.holmberg@owltronix.com> wrote:
> >>>>>>>>
> >>>>>>>> I strongly disagree with adding code that would mask implementation
> >>>>>>>> errors.
> >>>>>>>>
> >>>>>>>> If we want more internal checks, we could add an if statement that
> >>>>>>>> would only be compiled in if CONFIG_NVM_PBLK_DEBUG is enabled.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Not sure who this is for - better not to top post.
> >>>>>>>
> >>>>>>> In any case, this is a spec grey zone. I’m ok with cleaning the bits as
> >>>>>>> they mean nothing for the reset command. If you feel that strongly about
> >>>>>>> this, you can take it up with Igor.
> >>>>>>
> >>>>>> Pardon the top-post. It was meant for both you and Igor.
> >>>>>>
> >>>>>
> >>>>> OCSSD 2.0 spec for vector chunk reset (chapter 2.2.2) explicitly says
> >>>>> "The addresses in the LBA list shall be the first logical block address
> >>>>> of each chunk to be reset.". So in my understanding we are supposed to
> >>>>> clear the sector bits of the PPA address in order to be spec compliant.
> >>>>>
> >>>>
> >>>> Agree. And since ppa_addr is allocated on the stack, it should either
> >>>> be memset or have the remaining fields set to 0. Maybe better to
> >>>> zero-initialize it in the declaration?
> >>>
> >>> Ah, I thought this was not needed, as ppa is initialized as:
> >>>
> >>> ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>
> >>> and luns[bit].bppa is initialized to a value that originally comes
> >>> from drivers/lightnvm/core.c:196
> >>> (and that's explicitly zeroing all 64 bits before setting ch and lun)
> >>>
> >>> Let me know if I don't make sense here.
> >>>
> >>
> >> I just noticed the same.
> >>
> >> In two places (pblk-core:1095 and pblk-map:205) we are using values
> >> initialized previously in core.c - so my changes are not needed there.
> >>
> >> But there is still one place (pblk-map:163) where we initialize
> >> erase_ppa based on ppa_list[i], which has the PPA sector bits set in
> >> most cases, so the zeroing is still needed there.
> >
> > Yes, you are right, thanks for pointing it out. Are you ok with just
> > changing this?
>
> Yes, that's my plan for v2.

Great, thanks.

> >
> >>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Mar 4, 2019 at 8:48 AM Javier González <javier@javigon.com>
> >>>>>>>> wrote:
> >>>>>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> In the current pblk implementation of the erase command
> >>>>>>>>>> there is a chance that sector bits are set to some
> >>>>>>>>>> random values for the erase PPA. This is an unexpected
> >>>>>>>>>> situation, since erase shall always be chunk
> >>>>>>>>>> aligned. This patch fixes that issue.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>>>>> ---
> >>>>>>>>>> drivers/lightnvm/pblk-core.c | 1 +
> >>>>>>>>>> drivers/lightnvm/pblk-map.c  | 2 ++
> >>>>>>>>>> 2 files changed, 3 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/lightnvm/pblk-core.c
> >>>>>>>>>> b/drivers/lightnvm/pblk-core.c
> >>>>>>>>>> index a98b2255f963..78b1eea4ab67 100644
> >>>>>>>>>> --- a/drivers/lightnvm/pblk-core.c
> >>>>>>>>>> +++ b/drivers/lightnvm/pblk-core.c
> >>>>>>>>>> @@ -978,6 +978,7 @@ int pblk_line_erase(struct pblk *pblk, struct
> >>>>>>>>>> pblk_line *line)
> >>>>>>>>>>
> >>>>>>>>>>                 ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>>>>>                 ppa.a.blk = line->id;
> >>>>>>>>>> +             ppa.a.reserved = 0;
> >>>>>>>>>>
> >>>>>>>>>>                 atomic_dec(&line->left_eblks);
> >>>>>>>>>>                 WARN_ON(test_and_set_bit(bit, line->erase_bitmap));
> >>>>>>>>>> diff --git a/drivers/lightnvm/pblk-map.c
> >>>>>>>>>> b/drivers/lightnvm/pblk-map.c
> >>>>>>>>>> index 79df583ea709..aea46b4ec40f 100644
> >>>>>>>>>> --- a/drivers/lightnvm/pblk-map.c
> >>>>>>>>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>>>>>>>> @@ -161,6 +161,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>>>>>> struct nvm_rq *rqd,
> >>>>>>>>>>
> >>>>>>>>>>                         *erase_ppa = ppa_list[i];
> >>>>>>>>>>                         erase_ppa->a.blk = e_line->id;
> >>>>>>>>>> +                     erase_ppa->a.reserved = 0;
> >>>>>>>>>>
> >>>>>>>>>>                         spin_unlock(&e_line->lock);
> >>>>>>>>>>
> >>>>>>>>>> @@ -202,6 +203,7 @@ int pblk_map_erase_rq(struct pblk *pblk,
> >>>>>>>>>> struct nvm_rq *rqd,
> >>>>>>>>>>                 atomic_dec(&e_line->left_eblks);
> >>>>>>>>>>                 *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
> >>>>>>>>>>                 erase_ppa->a.blk = e_line->id;
> >>>>>>>>>> +             erase_ppa->a.reserved = 0;
> >>>>>>>>>>         }
> >>>>>>>>>>
> >>>>>>>>>>         return 0;
> >>>>>>>>>> --
> >>>>>>>>>> 2.17.1
> >>>>>>>>>
> >>>>>>>>> I’m fine with adding this, but note that there is actually no
> >>>>>>>>> requirement for the erase to be chunk aligned - the only bits that
> >>>>>>>>> should be looked at are group, PU and chunk.
> >>>>>>>>>
> >>>>>>>>> Reviewed-by: Javier González <javier@javigon.com>
> >>>>

^ permalink raw reply	[flat|nested] 91+ messages in thread

end of thread, other threads:[~2019-03-05  8:55 UTC | newest]

Thread overview: 91+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-27 17:14 [PATCH 00/13] lightnvm: bugfixes and improvements Igor Konopko
2019-02-27 17:14 ` [PATCH 01/13] lightnvm: pblk: Line reference fix in GC Igor Konopko
2019-03-01 12:20   ` Hans Holmberg
2019-03-04  7:18   ` Javier González
2019-03-04 12:40   ` Matias Bjørling
2019-02-27 17:14 ` [PATCH 02/13] lightnvm: pblk: Gracefully handle GC data malloc fail Igor Konopko
2019-02-28 17:08   ` Javier González
2019-03-01 12:50     ` Hans Holmberg
2019-03-04 12:38       ` Igor Konopko
2019-02-27 17:14 ` [PATCH 03/13] lightnvm: pblk: Fix put line back behaviour Igor Konopko
2019-03-01 13:27   ` Hans Holmberg
2019-03-04  7:22   ` Javier González
2019-02-27 17:14 ` [PATCH 04/13] lightnvm: pblk: Rollback in gc read Igor Konopko
2019-03-04  7:38   ` Javier González
2019-03-04  8:44     ` Hans Holmberg
2019-03-04 12:39       ` Igor Konopko
2019-03-04 12:42         ` Hans Holmberg
2019-03-04 12:49   ` Matias Bjørling
2019-02-27 17:14 ` [PATCH 05/13] lightnvm: pblk: Count all read errors in stats Igor Konopko
2019-03-04  7:42   ` Javier González
2019-03-04  9:02     ` Hans Holmberg
2019-03-04  9:23       ` Javier González
2019-03-04 11:41         ` Hans Holmberg
2019-03-04 11:45           ` Javier González
2019-03-04 12:42             ` Igor Konopko
2019-03-04 12:48               ` Hans Holmberg
2019-02-27 17:14 ` [PATCH 06/13] lightnvm: pblk: Ensure that erase is chunk aligned Igor Konopko
2019-03-04  7:48   ` Javier González
2019-03-04  9:05     ` Hans Holmberg
2019-03-04  9:11       ` Javier González
2019-03-04 11:43         ` Hans Holmberg
2019-03-04 12:44           ` Igor Konopko
2019-03-04 12:57             ` Hans Holmberg
2019-03-04 13:00             ` Matias Bjørling
2019-03-05  8:20               ` Hans Holmberg
2019-03-05  8:26                 ` Igor Konopko
2019-03-05  8:40                   ` Hans Holmberg
     [not found]                     ` <61b7e62a-d229-95b1-2572-336ab1bd67cb@intel.com>
2019-03-05  8:55                       ` Hans Holmberg
2019-02-27 17:14 ` [PATCH 07/13] lightnvm: pblk: Cleanly fail when there is not enough memory Igor Konopko
2019-03-04  7:53   ` Javier González
2019-03-04  9:24     ` Hans Holmberg
2019-03-04 12:46       ` Igor Konopko
2019-02-27 17:14 ` [PATCH 08/13] lightnvm: pblk: Set proper read stutus in bio Igor Konopko
2019-03-04  8:03   ` Javier González
2019-03-04  9:35     ` Hans Holmberg
2019-03-04  9:48       ` Javier González
2019-03-04 12:14         ` Hans Holmberg
2019-03-04 12:51           ` Igor Konopko
2019-03-04 13:08             ` Matias Bjørling
2019-03-04 13:45               ` Javier González
2019-03-04 15:12                 ` Matias Bjørling
2019-03-05  6:43                   ` Javier González
2019-03-04 13:04         ` Matias Bjørling
2019-03-04 13:21           ` Javier González
2019-02-27 17:14 ` [PATCH 09/13] lightnvm: pblk: Kick writer for flush requests Igor Konopko
2019-03-04  8:08   ` Javier González
2019-03-04  9:39     ` Hans Holmberg
2019-03-04 12:52       ` Igor Konopko
2019-02-27 17:14 ` [PATCH 10/13] lightnvm: pblk: Reduce L2P DRAM footprint Igor Konopko
2019-03-04  8:17   ` Javier González
2019-03-04  9:29     ` Hans Holmberg
2019-03-04 13:11   ` Matias Bjørling
2019-02-27 17:14 ` [PATCH 11/13] lightnvm: pblk: Remove unused smeta_ssec field Igor Konopko
2019-03-04  8:21   ` Javier González
2019-03-04  9:40     ` Hans Holmberg
2019-02-27 17:14 ` [PATCH 12/13] lightnvm: pblk: close opened chunks Igor Konopko
2019-03-04  8:27   ` Javier González
2019-03-04 10:05     ` Hans Holmberg
2019-03-04 12:56       ` Igor Konopko
2019-03-04 13:03         ` Hans Holmberg
2019-03-04 13:19       ` Matias Bjørling
2019-03-04 13:48         ` Javier González
2019-03-04 13:18     ` Matias Bjørling
2019-03-04 13:47       ` Javier González
2019-02-27 17:14 ` [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device Igor Konopko
2019-03-04  9:05   ` Javier González
2019-03-04 11:30     ` Hans Holmberg
2019-03-04 11:44       ` Javier González
2019-03-04 12:22         ` Hans Holmberg
2019-03-04 13:04           ` Igor Konopko
2019-03-04 13:16             ` Hans Holmberg
2019-03-04 14:06             ` Javier González
2019-03-04 13:19           ` Javier González
2019-03-04 13:25             ` Matias Bjørling
2019-03-04 13:44               ` Javier González
2019-03-04 14:24                 ` Hans Holmberg
2019-03-04 14:27                   ` Javier González
2019-03-04 14:58                 ` Matias Bjørling
2019-02-28 16:36 ` [PATCH 00/13] lightnvm: bugfixes and improvements Matias Bjørling
2019-02-28 17:15   ` Javier González
2019-03-01 10:23   ` Hans Holmberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).