linux-block.vger.kernel.org archive mirror
* [GIT PULL 00/26] lightnvm updates for 5.2
@ 2019-05-04 18:37 Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 01/26] lightnvm: pblk: line reference fix in GC Matias Bjørling
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Matias Bjørling

Hi Jens,

Can you please pick up the following patches for the 5.2 window if
it is not too late.

Igor and Marcin from Intel have been very active in this release,
fixing up a ton of race conditions, improving memory usage,
and making pblk more compatible with existing OCSSD devices.

Thank you!
Matias

Chansol Kim (1):
  lightnvm: pblk: fix bio leak when bio is split

Igor Konopko (23):
  lightnvm: pblk: line reference fix in GC
  lightnvm: pblk: rollback on error during gc read
  lightnvm: pblk: reduce L2P memory footprint
  lightnvm: pblk: remove unused smeta_ssec field
  lightnvm: pblk: gracefully handle GC vmalloc fail
  lightnvm: pblk: fix race during put line
  lightnvm: pblk: ensure that erase is chunk aligned
  lightnvm: pblk: cleanly fail when there is not enough memory
  lightnvm: pblk: set proper read status in bio
  lightnvm: Inherit mdts from the parent nvme device
  lightnvm: pblk: fix lock order in pblk_rb_tear_down_check
  lightnvm: pblk: kick writer on write recovery path
  lightnvm: pblk: fix update line wp in OOB recovery
  lightnvm: pblk: propagate errors when reading meta
  lightnvm: pblk: wait for inflight IOs in recovery
  lightnvm: pblk: remove internal IO timeout
  lightnvm: pblk: GC error handling
  lightnvm: pblk: IO path reorganization
  lightnvm: pblk: recover only written metadata
  lightnvm: track inflight target creations
  lightnvm: do not remove instance under global lock
  lightnvm: pblk: simplify partial read path
  lightnvm: pblk: use nvm_rq_to_ppa_list()

Marcin Dziegielewski (2):
  lightnvm: pblk: set propper line as data_line after gc
  lightnvm: prevent race condition on pblk remove

 drivers/lightnvm/core.c          |  82 ++++---
 drivers/lightnvm/pblk-cache.c    |   8 +-
 drivers/lightnvm/pblk-core.c     |  65 +++--
 drivers/lightnvm/pblk-gc.c       |  52 ++--
 drivers/lightnvm/pblk-init.c     |  65 ++---
 drivers/lightnvm/pblk-map.c      |   1 +
 drivers/lightnvm/pblk-rb.c       |  13 +-
 drivers/lightnvm/pblk-read.c     | 392 +++++++++----------------------
 drivers/lightnvm/pblk-recovery.c |  74 +++---
 drivers/lightnvm/pblk-write.c    |   1 +
 drivers/lightnvm/pblk.h          |  28 +--
 drivers/nvme/host/lightnvm.c     |   1 +
 include/linux/lightnvm.h         |   2 +
 13 files changed, 325 insertions(+), 459 deletions(-)

-- 
2.19.1



* [GIT PULL 01/26] lightnvm: pblk: line reference fix in GC
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 02/26] lightnvm: pblk: rollback on error during gc read Matias Bjørling
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Fixes the GC error case when moving a line back to closed state
while releasing additional references.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-gc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 26a52ea7ec45..901e49951ab5 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -290,8 +290,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 fail_free_ws:
 	kfree(line_ws);
 
+	/* Line goes back to closed state, so we cannot release additional
+	 * reference for line, since we do that only when we want to do
+	 * gc to free line state transition.
+	 */
 	pblk_put_line_back(pblk, line);
-	kref_put(&line->ref, pblk_line_put);
 	atomic_dec(&gc->read_inflight_gc);
 
 	pblk_err(pblk, "failed to GC line %d\n", line->id);
-- 
2.19.1



* [GIT PULL 02/26] lightnvm: pblk: rollback on error during gc read
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 01/26] lightnvm: pblk: line reference fix in GC Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 03/26] lightnvm: pblk: reduce L2P memory footprint Matias Bjørling
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

A line is left unassigned to the block lists in case pblk_gc_line
returns an error.

This moves the line back to the appropriate list, which can then be
picked up by the garbage collector.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-gc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 901e49951ab5..65692e6d76e6 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -358,8 +358,13 @@ static int pblk_gc_read(struct pblk *pblk)
 
 	pblk_gc_kick(pblk);
 
-	if (pblk_gc_line(pblk, line))
+	if (pblk_gc_line(pblk, line)) {
 		pblk_err(pblk, "failed to GC line %d\n", line->id);
+		/* rollback */
+		spin_lock(&gc->r_lock);
+		list_add_tail(&line->list, &gc->r_list);
+		spin_unlock(&gc->r_lock);
+	}
 
 	return 0;
 }
-- 
2.19.1



* [GIT PULL 03/26] lightnvm: pblk: reduce L2P memory footprint
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 01/26] lightnvm: pblk: line reference fix in GC Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 02/26] lightnvm: pblk: rollback on error during gc read Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 04/26] lightnvm: pblk: remove unused smeta_ssec field Matias Bjørling
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently the L2P map size is calculated based on the total number of
available sectors, which is redundant, since it contains mapping for
overprovisioning as well (11% by default).

Change this size to the real capacity and thus reduce the memory
footprint significantly: with the default OP value this saves approx.
110MB of DRAM for every 1TB of media.
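
As a rough check of the figure above (an illustration only, not part of
the patch), the arithmetic can be sketched as follows. It assumes 4 KiB
sectors and 4-byte L2P entries, and it ignores the metadata sectors that
pblk also subtracts from the capacity; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for an L2P table that holds one
 * entry of entry_size bytes per mapped sector.
 */
static inline uint64_t l2p_bytes(uint64_t nr_secs, unsigned int entry_size)
{
	return nr_secs * entry_size;
}

/* Hypothetical helper: sectors left for user data once op_percent of
 * the media is reserved for overprovisioning (integer division, so the
 * result is rounded down).
 */
static inline uint64_t user_secs(uint64_t total_secs, unsigned int op_percent)
{
	return total_secs * (100 - op_percent) / 100;
}
```

For 1 TiB of media with 4 KiB sectors there are 2^28 sectors, so the
full map is l2p_bytes(1ULL << 28, 4) = 1 GiB. Sizing the map with
user_secs(1ULL << 28, 11) instead drops roughly 11% of that, which is in
line with the approx. 110MB quoted above.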

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c     | 8 ++++----
 drivers/lightnvm/pblk-init.c     | 7 +++----
 drivers/lightnvm/pblk-read.c     | 2 +-
 drivers/lightnvm/pblk-recovery.c | 2 +-
 drivers/lightnvm/pblk.h          | 1 -
 5 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6ca868868fee..fac32138291f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -2023,7 +2023,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 	struct ppa_addr ppa_l2p;
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return;
 	}
@@ -2063,7 +2063,7 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
 #endif
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return 0;
 	}
@@ -2109,7 +2109,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
 	}
 
 	/* logic error: lba out-of-bounds. Ignore update */
-	if (!(lba < pblk->rl.nr_secs)) {
+	if (!(lba < pblk->capacity)) {
 		WARN(1, "pblk: corrupted L2P map request\n");
 		return;
 	}
@@ -2167,7 +2167,7 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 		lba = lba_list[i];
 		if (lba != ADDR_EMPTY) {
 			/* logic error: lba out-of-bounds. Ignore update */
-			if (!(lba < pblk->rl.nr_secs)) {
+			if (!(lba < pblk->capacity)) {
 				WARN(1, "pblk: corrupted L2P map request\n");
 				continue;
 			}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8b643d0bffae..81e8ed4d31ea 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -105,7 +105,7 @@ static size_t pblk_trans_map_size(struct pblk *pblk)
 	if (pblk->addrf_len < 32)
 		entry_size = 4;
 
-	return entry_size * pblk->rl.nr_secs;
+	return entry_size * pblk->capacity;
 }
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
@@ -170,7 +170,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 
 	pblk_ppa_set_empty(&ppa);
 
-	for (i = 0; i < pblk->rl.nr_secs; i++)
+	for (i = 0; i < pblk->capacity; i++)
 		pblk_trans_map_set(pblk, i, ppa);
 
 	ret = pblk_l2p_recover(pblk, factory_init);
@@ -701,7 +701,6 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
 	 * on user capacity consider only provisioned blocks
 	 */
 	pblk->rl.total_blocks = nr_free_chks;
-	pblk->rl.nr_secs = nr_free_chks * geo->clba;
 
 	/* Consider sectors used for metadata */
 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
@@ -1284,7 +1283,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 
 	pblk_info(pblk, "luns:%u, lines:%d, secs:%llu, buf entries:%u\n",
 			geo->all_luns, pblk->l_mg.nr_lines,
-			(unsigned long long)pblk->rl.nr_secs,
+			(unsigned long long)pblk->capacity,
 			pblk->rwb.nr_entries);
 
 	wake_up_process(pblk->writer_ts);
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 0b7d5fb4548d..b8eb6bdb983b 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -568,7 +568,7 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 		goto out;
 
 	/* logic error: lba out-of-bounds */
-	if (lba >= pblk->rl.nr_secs) {
+	if (lba >= pblk->capacity) {
 		WARN(1, "pblk: read lba out of bounds\n");
 		goto out;
 	}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index d86f580036d3..83b467b5edc7 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -474,7 +474,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
 		lba_list[paddr++] = cpu_to_le64(lba);
 
-		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
+		if (lba == ADDR_EMPTY || lba >= pblk->capacity)
 			continue;
 
 		line->nr_valid_lbas++;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index ac3ab778e976..58da72dbef45 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -305,7 +305,6 @@ struct pblk_rl {
 
 	struct timer_list u_timer;
 
-	unsigned long long nr_secs;
 	unsigned long total_blocks;
 
 	atomic_t free_blocks;		/* Total number of free blocks (+ OP) */
-- 
2.19.1



* [GIT PULL 04/26] lightnvm: pblk: remove unused smeta_ssec field
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (2 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 03/26] lightnvm: pblk: reduce L2P memory footprint Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 05/26] lightnvm: pblk: gracefully handle GC vmalloc fail Matias Bjørling
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

smeta_ssec field in pblk_line is never used after it was replaced by
the function pblk_line_smeta_start().

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 1 -
 drivers/lightnvm/pblk.h      | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index fac32138291f..39280c1e9b5d 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1162,7 +1162,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	off = bit * geo->ws_opt;
 	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
 	line->sec_in_line -= lm->smeta_sec;
-	line->smeta_ssec = off;
 	line->cur_sec = off + lm->smeta_sec;
 
 	if (init && pblk_line_smeta_write(pblk, line, off)) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 58da72dbef45..381f0746a9cf 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -464,7 +464,6 @@ struct pblk_line {
 	int meta_line;			/* Metadata line id */
 	int meta_distance;		/* Distance between data and metadata */
 
-	u64 smeta_ssec;			/* Sector where smeta starts */
 	u64 emeta_ssec;			/* Sector where emeta starts */
 
 	unsigned int sec_in_line;	/* Number of usable secs in line */
-- 
2.19.1



* [GIT PULL 05/26] lightnvm: pblk: gracefully handle GC vmalloc fail
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (3 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 04/26] lightnvm: pblk: remove unused smeta_ssec field Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 06/26] lightnvm: pblk: fix race during put line Matias Bjørling
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently, when we fail on rq data allocation in GC, it skips moving
active data and moves the line straight to its free state, losing user
data in the process.

Move the data allocation to an earlier phase of GC, where we can still
fail gracefully by moving the line back to the closed state.
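
The allocate-early-and-unwind pattern described above can be sketched in
plain userspace C as follows. This is an illustration only: struct
gc_req and gc_prepare() are hypothetical names, and malloc()/free()
stand in for the kernel's vmalloc()/kmalloc() and vfree()/kfree().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the GC request: a data buffer plus a work item. */
struct gc_req {
	void *data;
	void *ws;
};

/* Allocate everything the request needs up front. Returns 0 on success
 * and -1 on failure. On failure, anything already allocated is released
 * in reverse order, so the caller can simply put the line back.
 */
static inline int gc_prepare(struct gc_req *rq, size_t data_len, size_t ws_len)
{
	rq->data = malloc(data_len);
	if (!rq->data)
		goto fail;

	rq->ws = malloc(ws_len);
	if (!rq->ws)
		goto fail_free_data;

	return 0;

fail_free_data:
	free(rq->data);
	rq->data = NULL;
fail:
	return -1;
}
```

Because every allocation happens before any data is moved, a failure at
this point leaves the line contents untouched.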

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-gc.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 65692e6d76e6..ea9f392a395e 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -84,8 +84,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	struct pblk_line_ws *gc_rq_ws = container_of(work,
 						struct pblk_line_ws, ws);
 	struct pblk *pblk = gc_rq_ws->pblk;
-	struct nvm_tgt_dev *dev = pblk->dev;
-	struct nvm_geo *geo = &dev->geo;
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line *line = gc_rq_ws->line;
 	struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
@@ -93,13 +91,6 @@ static void pblk_gc_line_ws(struct work_struct *work)
 
 	up(&gc->gc_sem);
 
-	gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
-	if (!gc_rq->data) {
-		pblk_err(pblk, "could not GC line:%d (%d/%d)\n",
-					line->id, *line->vsc, gc_rq->nr_secs);
-		goto out;
-	}
-
 	/* Read from GC victim block */
 	ret = pblk_submit_read_gc(pblk, gc_rq);
 	if (ret) {
@@ -189,6 +180,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	struct pblk_line *line = line_ws->line;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line_ws *gc_rq_ws;
 	struct pblk_gc_rq *gc_rq;
@@ -247,9 +240,13 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	gc_rq->nr_secs = nr_secs;
 	gc_rq->line = line;
 
+	gc_rq->data = vmalloc(array_size(gc_rq->nr_secs, geo->csecs));
+	if (!gc_rq->data)
+		goto fail_free_gc_rq;
+
 	gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
 	if (!gc_rq_ws)
-		goto fail_free_gc_rq;
+		goto fail_free_gc_data;
 
 	gc_rq_ws->pblk = pblk;
 	gc_rq_ws->line = line;
@@ -281,6 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 	return;
 
+fail_free_gc_data:
+	vfree(gc_rq->data);
 fail_free_gc_rq:
 	kfree(gc_rq);
 fail_free_lba_list:
-- 
2.19.1



* [GIT PULL 06/26] lightnvm: pblk: fix race during put line
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (4 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 05/26] lightnvm: pblk: gracefully handle GC vmalloc fail Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 07/26] lightnvm: pblk: ensure that erase is chunk aligned Matias Bjørling
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

In the pblk_put_line_back function, a race condition with
__pblk_map_invalidate can leave a line that is not part of any list.

Resetting the line's gc_group to PBLK_LINEGC_NONE, with the list update
done under gc_lock, fixes the above issue.

Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-gc.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index ea9f392a395e..e23b1923b773 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -64,19 +64,23 @@ static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list;
 
+	spin_lock(&l_mg->gc_lock);
 	spin_lock(&line->lock);
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
 	line->state = PBLK_LINESTATE_CLOSED;
 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
 					line->state);
+
+	/* We need to reset gc_group in order to ensure that
+	 * pblk_line_gc_list will return proper move_list
+	 * since right now current line is not on any of the
+	 * gc lists.
+	 */
+	line->gc_group = PBLK_LINEGC_NONE;
 	move_list = pblk_line_gc_list(pblk, line);
 	spin_unlock(&line->lock);
-
-	if (move_list) {
-		spin_lock(&l_mg->gc_lock);
-		list_add_tail(&line->list, move_list);
-		spin_unlock(&l_mg->gc_lock);
-	}
+	list_add_tail(&line->list, move_list);
+	spin_unlock(&l_mg->gc_lock);
 }
 
 static void pblk_gc_line_ws(struct work_struct *work)
-- 
2.19.1



* [GIT PULL 07/26] lightnvm: pblk: ensure that erase is chunk aligned
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (5 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 06/26] lightnvm: pblk: fix race during put line Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 08/26] lightnvm: pblk: cleanly fail when there is not enough memory Matias Bjørling
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

The sector bits in the erase command may be uninitialized, causing
the erase LBA to be unaligned to the chunk size.

This is an unexpected situation, since erase shall always be chunk
aligned according to the OCSSD 2.0 specification.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-map.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 7fbc99b60cac..5408e32b2f13 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -162,6 +162,7 @@ int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 
 			*erase_ppa = ppa_list[i];
 			erase_ppa->a.blk = e_line->id;
+			erase_ppa->a.reserved = 0;
 
 			spin_unlock(&e_line->lock);
 
-- 
2.19.1



* [GIT PULL 08/26] lightnvm: pblk: cleanly fail when there is not enough memory
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (6 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 07/26] lightnvm: pblk: ensure that erase is chunk aligned Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 09/26] lightnvm: pblk: set proper read status in bio Matias Bjørling
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

The L2P table can be huge in many cases, since it typically requires 1GB
of DRAM for 1TB of drive. When there is not enough memory available,
the OOM killer turns on and kills random processes, which can be very
annoying for users.

This patch changes the flags for the L2P table allocation in order to
handle this situation in a more user-friendly way.

GFP_KERNEL and __GFP_HIGHMEM are the default flags used in parameterless
vmalloc() calls, so they are also kept in this patch. Additionally, the
__GFP_NOWARN flag is added in order to hide the very long dmesg warning
in case of allocation failure. The most important flag introduced
in this patch is __GFP_RETRY_MAYFAIL, which causes the allocator
to try to use free memory and, if none is available, to drop caches, but
not to run the OOM killer.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 81e8ed4d31ea..e0df3de1ce83 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -164,9 +164,14 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	int ret = 0;
 
 	map_size = pblk_trans_map_size(pblk);
-	pblk->trans_map = vmalloc(map_size);
-	if (!pblk->trans_map)
+	pblk->trans_map = __vmalloc(map_size, GFP_KERNEL | __GFP_NOWARN
+					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
+					PAGE_KERNEL);
+	if (!pblk->trans_map) {
+		pblk_err(pblk, "failed to allocate L2P (need %zu of memory)\n",
+				map_size);
 		return -ENOMEM;
+	}
 
 	pblk_ppa_set_empty(&ppa);
 
-- 
2.19.1



* [GIT PULL 09/26] lightnvm: pblk: set proper read status in bio
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (7 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 08/26] lightnvm: pblk: cleanly fail when there is not enough memory Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 10/26] lightnvm: Inherit mdts from the parent nvme device Matias Bjørling
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently, in case of read errors, bi_status is not set properly, which
leads to returning improper data to the layers above. This patch fixes
that by setting the proper status in case of read errors.

Also remove the unnecessary WARN_ONCE(), which does not make sense
in that place, since the user bio is not used for interaction with the
drive and thus bi_status will not be set here.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index b8eb6bdb983b..7b7a04a80d67 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -175,11 +175,10 @@ static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
 	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
 }
 
-static void pblk_end_user_read(struct bio *bio)
+static void pblk_end_user_read(struct bio *bio, int error)
 {
-#ifdef CONFIG_NVM_PBLK_DEBUG
-	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
-#endif
+	if (error && error != NVM_RSP_WARN_HIGHECC)
+		bio_io_error(bio);
 	bio_endio(bio);
 }
 
@@ -219,7 +218,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct bio *bio = (struct bio *)r_ctx->private;
 
-	pblk_end_user_read(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
@@ -298,7 +297,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	rqd->bio = NULL;
 	rqd->nr_ppas = nr_secs;
 
-	bio_endio(bio);
+	pblk_end_user_read(bio, rqd->error);
 	__pblk_end_io_read(pblk, rqd, false);
 }
 
-- 
2.19.1



* [GIT PULL 10/26] lightnvm: Inherit mdts from the parent nvme device
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (8 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 09/26] lightnvm: pblk: set proper read status in bio Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 11/26] lightnvm: pblk: fix bio leak when bio is split Matias Bjørling
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

The current lightnvm and pblk implementation does not care about the
NVMe max data transfer size, which can be smaller than 64 * 4K = 256K.
There are existing, NVMe spec compliant controllers whose max data
transfer size is lower than 256K (for example 128K). Such controllers
are not able to handle a command which contains 64 PPAs, since the size
of the DMAed buffer will be above the capabilities of such a controller.

Inherit the limit from the parent NVMe device and cap the target
queue's max_hw_sectors accordingly.
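
The resulting limit can be sketched in userspace C as below. This is an
illustration only: tgt_max_hw_sectors() is a hypothetical name, MAX_VLBA
mirrors the 64 PPAs per command mentioned above, and an mdts value of 0
is assumed to mean "no limit reported".

```c
#include <stdint.h>

/* Mirrors the 64 PPAs per command discussed above (assumed value). */
#define MAX_VLBA 64

/* Hypothetical helper: queue limit in 512-byte sectors.
 * csecs is the device sector size in bytes; mdts_sectors is the parent
 * controller's limit in 512-byte sectors, or 0 if none is reported.
 */
static inline uint32_t tgt_max_hw_sectors(uint32_t csecs, uint32_t mdts_sectors)
{
	uint32_t limit = (csecs >> 9) * MAX_VLBA;

	/* Only ever lower the limit; never raise it above 64 PPAs. */
	if (mdts_sectors && mdts_sectors < limit)
		limit = mdts_sectors;

	return limit;
}
```

With 4K sectors the uncapped limit is 512 sectors (256K); a controller
that reports 256 sectors (128K) brings the queue limit down to 256.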

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c      | 9 +++++++--
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 5f82036fe322..c01f83b8fbaf 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	struct nvm_target *t;
 	struct nvm_tgt_dev *tgt_dev;
 	void *targetdata;
+	unsigned int mdts;
 	int ret;
 
 	switch (create->conf.type) {
@@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	tdisk->private_data = targetdata;
 	tqueue->queuedata = targetdata;
 
-	blk_queue_max_hw_sectors(tqueue,
-			(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
+	if (dev->geo.mdts) {
+		mdts = min_t(u32, dev->geo.mdts,
+				(dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+	}
+	blk_queue_max_hw_sectors(tqueue, mdts);
 
 	set_capacity(tdisk, tt->capacity(targetdata));
 	add_disk(tdisk);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 949e29e1d782..4f20a10b39d3 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -977,6 +977,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
 	geo->ext = ns->ext;
+	geo->mdts = ns->ctrl->max_hw_sectors;
 
 	dev->q = q;
 	memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 5d865a5d5cdc..d3b02708e5f0 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -358,6 +358,7 @@ struct nvm_geo {
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
 	bool	ext;		/* metadata in extended data buffer */
+	u32	mdts;		/* Max data transfer size*/
 
 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
-- 
2.19.1



* [GIT PULL 11/26] lightnvm: pblk: fix bio leak when bio is split
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (9 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 10/26] lightnvm: Inherit mdts from the parent nvme device Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 12/26] lightnvm: pblk: set propper line as data_line after gc Matias Bjørling
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Chansol Kim, Matias Bjørling

From: Chansol Kim <chansol.kim@samsung.com>

For large I/O where blk_queue_split needs to be called inside
pblk_rw_io, a bio is leaked because bio_endio is not called on the
newly allocated bio. One way to observe this is to mount an ext4
filesystem on the target and issue 1MB I/O with dd, e.g., dd bs=1MB
if=/dev/null of=/mount/myvolume. kmemleak reports:

unreferenced object 0xffff88803d7d0100 (size 256):
  comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff  .........`.1....
    01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00  .@..............
  backtrace:
    [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0
    [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30
    [<00000000b4959ab4>] mempool_alloc+0x83/0x220
    [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320
    [<000000009264b251>] bio_clone_fast+0x26/0xc0
    [<0000000008250252>] bio_split+0x41/0x110
    [<00000000e365cad0>] blk_queue_split+0x349/0x930
    [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0
    [<00000000eea09cec>] generic_make_request+0x2f9/0x690
    [<00000000ae6acede>] submit_bio+0x12e/0x1f0
    [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80
    [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890
    [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0
    [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330
    [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650
    [<000000004476b096>] do_writepages+0x39/0xc0

In case there is a need for a split, blk_queue_split returns the newly
allocated bio to the caller by changing the value of the pointer passed
by reference, while the original is passed to generic_make_request.

pblk_rw_io's local bio* is updated to point to the new bio, which is
passed to pblk_submit_read or pblk_write_to_cache, so the work is done
on the new bio*. However, when pblk_rw_io returns NVM_IO_DONE,
pblk_make_rq calls bio_endio on the old bio*, because it passed the bio
pointer to pblk_rw_io by value.

Fold pblk_rw_io into pblk_make_rq so that the bio* is not copied and
bio_endio is called on the correct bio*.
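
The pointer-by-value problem can be reproduced outside the kernel. The
following is a minimal user-space sketch, not the pblk code: mock_bio,
mock_split, rw_io_by_value, old_make_rq and new_make_rq are hypothetical
names, and mock_split only imitates the way blk_queue_split replaces the
caller's pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct bio; it only records completion. */
struct mock_bio {
	int ended;
};

/* Imitates blk_queue_split(): on a split, *bio is replaced with a newly
 * allocated bio and the original is assumed to be resubmitted elsewhere.
 */
void mock_split(struct mock_bio **bio)
{
	struct mock_bio *split = calloc(1, sizeof(*split));

	if (split)
		*bio = split;
}

/* Old shape: the helper receives a copy of the pointer, so the caller
 * never sees the bio that the work was actually done on.
 */
struct mock_bio *rw_io_by_value(struct mock_bio *bio)
{
	mock_split(&bio);	/* only the local copy changes */
	return bio;		/* returned so the demo can inspect it */
}

/* Old caller: completes the stale (original) bio. */
struct mock_bio *old_make_rq(struct mock_bio *bio)
{
	struct mock_bio *worked_on = rw_io_by_value(bio);

	bio->ended = 1;		/* stands in for bio_endio(old bio) */
	return worked_on;	/* never completed: this is the leak */
}

/* New shape: the split happens in the caller, so its own pointer is
 * updated and the completion hits the bio that was worked on.
 */
struct mock_bio *new_make_rq(struct mock_bio *bio)
{
	mock_split(&bio);
	bio->ended = 1;		/* stands in for bio_endio(new bio) */
	return bio;
}
```

With the old shape the split bio comes back with ended == 0, which is
the object kmemleak reports. With the new shape the split bio is the one
that gets completed.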

Signed-off-by: Chansol Kim <chansol.kim@samsung.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 47 +++++++++++++++---------------------
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index e0df3de1ce83..1e227a08e54a 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -47,36 +47,10 @@ static struct pblk_global_caches pblk_caches = {
 
 struct bio_set pblk_bio_set;
 
-static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
-			  struct bio *bio)
-{
-	int ret;
-
-	/* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap
-	 * constraint. Writes can be of arbitrary size.
-	 */
-	if (bio_data_dir(bio) == READ) {
-		blk_queue_split(q, &bio);
-		ret = pblk_submit_read(pblk, bio);
-		if (ret == NVM_IO_DONE && bio_flagged(bio, BIO_CLONED))
-			bio_put(bio);
-
-		return ret;
-	}
-
-	/* Prevent deadlock in the case of a modest LUN configuration and large
-	 * user I/Os. Unless stalled, the rate limiter leaves at least 256KB
-	 * available for user I/O.
-	 */
-	if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
-		blk_queue_split(q, &bio);
-
-	return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
-}
-
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
 	struct pblk *pblk = q->queuedata;
+	int ret;
 
 	if (bio_op(bio) == REQ_OP_DISCARD) {
 		pblk_discard(pblk, bio);
@@ -86,7 +60,24 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 		}
 	}
 
-	switch (pblk_rw_io(q, pblk, bio)) {
+	/* Read requests must be <= 256kb due to NVMe's 64 bit completion bitmap
+	 * constraint. Writes can be of arbitrary size.
+	 */
+	if (bio_data_dir(bio) == READ) {
+		blk_queue_split(q, &bio);
+		ret = pblk_submit_read(pblk, bio);
+	} else {
+		/* Prevent deadlock in the case of a modest LUN configuration
+		 * and large user I/Os. Unless stalled, the rate limiter
+		 * leaves at least 256KB available for user I/O.
+		 */
+		if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
+			blk_queue_split(q, &bio);
+
+		ret = pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
+	}
+
+	switch (ret) {
 	case NVM_IO_ERR:
 		bio_io_error(bio);
 		break;
-- 
2.19.1



* [GIT PULL 12/26] lightnvm: pblk: set propper line as data_line after gc
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (10 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 11/26] lightnvm: pblk: fix bio leak when bio is split Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 13/26] lightnvm: prevent race condition on pblk remove Matias Bjørling
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

In the current implementation of l2p recovery, when recovery runs after
gc and there is an open line, the current data line is not set properly
(the last line from the device is used instead of the last line ordered
by seq_nr). This results in a kernel panic and data corruption.
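
To illustrate the difference between "last line on the device" and
"last line ordered by seq_nr", here is a small user-space sketch. The
mock_line struct and pick_data_line() are hypothetical and much simpler
than the recovery code, which keeps the lines on a list sorted by
seq_nr.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down view of a recovered line. */
struct mock_line {
	int id;			/* position on the device */
	unsigned long seq_nr;	/* write order; higher means newer */
	int is_open;		/* line was not fully written */
};

/* Return the open line with the highest seq_nr, or NULL if no line is
 * open. Device order (the array index) is deliberately ignored.
 */
struct mock_line *pick_data_line(struct mock_line *lines, size_t n)
{
	struct mock_line *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!lines[i].is_open)
			continue;
		if (!best || lines[i].seq_nr > best->seq_nr)
			best = &lines[i];
	}

	return best;
}
```

After gc the newest open line is not necessarily the one with the
highest id, so picking by position can resume writing on the wrong line.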

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-recovery.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 83b467b5edc7..017874e03253 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -844,6 +844,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
 		spin_unlock(&l_mg->free_lock);
 	} else {
 		spin_lock(&l_mg->free_lock);
+		l_mg->data_line = data_line;
 		/* Allocate next line for preparation */
 		l_mg->data_next = pblk_line_get(pblk);
 		if (l_mg->data_next) {
-- 
2.19.1



* [GIT PULL 13/26] lightnvm: prevent race condition on pblk remove
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (11 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 12/26] lightnvm: pblk: set propper line as data_line after gc Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:37 ` [GIT PULL 14/26] lightnvm: pblk: fix lock order in pblk_rb_tear_down_check Matias Bjørling
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

When nvm target removal is triggered during device hot unplug, it is
possible to hit a general protection fault. This is caused by the use
of an nvm_dev that may be freed from another (hot unplug) thread
(in the nvm_unregister function).

Take nvm_lock in the nvm_ioctl_dev_remove function to prevent this
situation.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index c01f83b8fbaf..e2abe88a139c 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1340,11 +1340,13 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 		return -EINVAL;
 	}
 
+	down_read(&nvm_lock);
 	list_for_each_entry(dev, &nvm_devices, devices) {
 		ret = nvm_remove_tgt(dev, &remove);
 		if (!ret)
 			break;
 	}
+	up_read(&nvm_lock);
 
 	return ret;
 }
-- 
2.19.1



* [GIT PULL 14/26] lightnvm: pblk: fix lock order in pblk_rb_tear_down_check
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (12 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 13/26] lightnvm: prevent race condition on pblk remove Matias Bjørling
@ 2019-05-04 18:37 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 15/26] lightnvm: pblk: kick writer on write recovery path Matias Bjørling
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:37 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

In pblk_rb_tear_down_check() the spinlock functions are not
called in proper order.

Fixes: a4bd217 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-rb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 03c241b340ea..35550148b5e8 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -799,8 +799,8 @@ int pblk_rb_tear_down_check(struct pblk_rb *rb)
 	}
 
 out:
-	spin_unlock(&rb->w_lock);
 	spin_unlock_irq(&rb->s_lock);
+	spin_unlock(&rb->w_lock);
 
 	return ret;
 }
-- 
2.19.1



* [GIT PULL 15/26] lightnvm: pblk: kick writer on write recovery path
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (13 preceding siblings ...)
  2019-05-04 18:37 ` [GIT PULL 14/26] lightnvm: pblk: fix lock order in pblk_rb_tear_down_check Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 16/26] lightnvm: pblk: fix update line wp in OOB recovery Matias Bjørling
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

On the write recovery path, there is a chance that the writer thread
is not active. Kick it immediately instead of waiting for the timer.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 6593deab52da..4e63f9b5954c 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -228,6 +228,7 @@ static void pblk_submit_rec(struct work_struct *work)
 	mempool_free(recovery, &pblk->rec_pool);
 
 	atomic_dec(&pblk->inflight_io);
+	pblk_write_kick(pblk);
 }
 
 
-- 
2.19.1



* [GIT PULL 16/26] lightnvm: pblk: fix update line wp in OOB recovery
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (14 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 15/26] lightnvm: pblk: kick writer on write recovery path Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 17/26] lightnvm: pblk: propagate errors when reading meta Matias Bjørling
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

In case of OOB recovery, we can hit a scenario where all the data in
the line was written and some part of emeta was written too. In such
a case the pblk_update_line_wp() function calls pblk_alloc_page(),
which causes left_msecs to be set to a value below zero (since this
field does not track the emeta region) and thus leads to multiple
kernel warnings. This patch fixes that issue.
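
The accounting rule can be written as a small pure function. This is a
sketch only: update_left_msecs() is a hypothetical helper, it assumes
left_msecs is not negative on entry, and it leaves out the locking.

```c
/* Hypothetical helper: returns the new number of unwritten data sectors
 * after written_secs sectors were found on the line during recovery.
 */
int update_left_msecs(int left_msecs, unsigned long long written_secs)
{
	if (written_secs > (unsigned long long)left_msecs) {
		/* All data sectors and some emeta sectors were written. */
		return 0;
	}

	/* Only some data sectors were written. */
	return left_msecs - (int)written_secs;
}
```

For example, with 100 data sectors left, 40 written sectors leave 60,
while 120 written sectors (data plus part of emeta) leave 0 instead of
-20.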

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-recovery.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 017874e03253..357e52980f2f 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -93,10 +93,24 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
 static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
 				u64 written_secs)
 {
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	int i;
 
 	for (i = 0; i < written_secs; i += pblk->min_write_pgs)
-		pblk_alloc_page(pblk, line, pblk->min_write_pgs);
+		__pblk_alloc_page(pblk, line, pblk->min_write_pgs);
+
+	spin_lock(&l_mg->free_lock);
+	if (written_secs > line->left_msecs) {
+		/*
+		 * We have all data sectors written
+		 * and some emeta sectors written too.
+		 */
+		line->left_msecs = 0;
+	} else {
+		/* We have only some data sectors written. */
+		line->left_msecs -= written_secs;
+	}
+	spin_unlock(&l_mg->free_lock);
 }
 
 static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
-- 
2.19.1



* [GIT PULL 17/26] lightnvm: pblk: propagate errors when reading meta
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (15 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 16/26] lightnvm: pblk: fix update line wp in OOB recovery Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 18/26] lightnvm: pblk: wait for inflight IOs in recovery Matias Bjørling
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Read errors are not correctly propagated. Errors are cleared before
returning control to the io submitter. Change the behaviour such that
all read errors except the high ecc read warning status are returned
appropriately.
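
The filtering rule is small enough to show on its own. In this sketch
meta_read_status() is a hypothetical helper and MOCK_RSP_WARN_HIGHECC
only stands in for NVM_RSP_WARN_HIGHECC; its numeric value here is
illustrative and should be taken from include/linux/lightnvm.h.

```c
#include <errno.h>

/* Stand-in for NVM_RSP_WARN_HIGHECC; the value is illustrative only. */
#define MOCK_RSP_WARN_HIGHECC 0x4700

/* Map a device completion status to the value returned to the caller:
 * success and the high ecc warning are not errors, anything else is
 * reported as -EIO.
 */
int meta_read_status(int rqd_error)
{
	if (rqd_error && rqd_error != MOCK_RSP_WARN_HIGHECC)
		return -EIO;

	return 0;
}
```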

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c     | 9 +++++++--
 drivers/lightnvm/pblk-recovery.c | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 39280c1e9b5d..38e26fe23138 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -761,8 +761,10 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 
 	atomic_dec(&pblk->inflight_io);
 
-	if (rqd.error)
+	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
 		pblk_log_read_err(pblk, &rqd);
+		ret = -EIO;
+	}
 
 clear_rqd:
 	pblk_free_rqd_meta(pblk, &rqd);
@@ -916,8 +918,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 
 	atomic_dec(&pblk->inflight_io);
 
-	if (rqd.error)
+	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
 		pblk_log_read_err(pblk, &rqd);
+		ret = -EIO;
+		goto free_rqd_dma;
+	}
 
 	emeta_buf += rq_len;
 	left_ppas -= rq_ppas;
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 357e52980f2f..124d8179b2ad 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -458,7 +458,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	atomic_dec(&pblk->inflight_io);
 
 	/* If a read fails, do a best effort by padding the line and retrying */
-	if (rqd->error) {
+	if (rqd->error && rqd->error != NVM_RSP_WARN_HIGHECC) {
 		int pad_distance, ret;
 
 		if (padded) {
-- 
2.19.1



* [GIT PULL 18/26] lightnvm: pblk: wait for inflight IOs in recovery
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (16 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 17/26] lightnvm: pblk: propagate errors when reading meta Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 19/26] lightnvm: pblk: remove internal IO timeout Matias Bjørling
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

This patch changes the behaviour of recovery padding in order to
support the case where some IOs were already submitted to the drive
and the following ones are not submitted because an error was returned.

Currently in case of errors we simply exit the pad function without
waiting for inflight IOs, which leads to a panic when the inflight IOs
complete.

After the changes we always wait for all the inflight IOs before
exiting the function.
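
The reference counting behind this can be modelled without threads. The
sketch below is a single-threaded simulation with hypothetical names
(mock_pad_rq, mock_put, mock_pad_line); in pblk the counter is a kref
and "done" is a completion that the function waits on.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared pad request context. */
struct mock_pad_rq {
	int refs;	/* one initial reference plus one per inflight IO */
	bool done;	/* set when the last reference is dropped */
};

/* Drop one reference; the last put signals completion. */
void mock_put(struct mock_pad_rq *p)
{
	if (--p->refs == 0)
		p->done = true;
}

/* Submit up to nr IOs; submission number fail_at (0-based) fails.
 * Returns 0 on success and -1 on failure. On both paths the initial
 * reference is dropped at the end, so the caller can wait for p->done
 * before freeing p.
 */
int mock_pad_line(struct mock_pad_rq *p, int nr, int fail_at)
{
	int i, ret = 0;

	p->refs = 1;
	p->done = false;

	for (i = 0; i < nr; i++) {
		p->refs++;		/* reference held by this IO */
		if (i == fail_at) {
			mock_put(p);	/* this IO never went out */
			ret = -1;
			break;		/* like "goto fail_complete" */
		}
	}

	mock_put(p);			/* drop the initial reference */
	return ret;
}
```

If the third of four submissions fails, two IOs are still inflight and
done stays false until both completions have called mock_put(). Freeing
the context before that point is the bug that the old error path had.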

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-recovery.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 124d8179b2ad..137e963cd51d 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -208,7 +208,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (rq_ppas < pblk->min_write_pgs) {
 		pblk_err(pblk, "corrupted pad line %d\n", line->id);
-		goto fail_free_pad;
+		goto fail_complete;
 	}
 
 	rq_len = rq_ppas * geo->csecs;
@@ -217,7 +217,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 						PBLK_VMALLOC_META, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
-		goto fail_free_pad;
+		goto fail_complete;
 	}
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -226,8 +226,11 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
 
 	ret = pblk_alloc_rqd_meta(pblk, rqd);
-	if (ret)
-		goto fail_free_rqd;
+	if (ret) {
+		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
+		bio_put(bio);
+		goto fail_complete;
+	}
 
 	rqd->bio = bio;
 	rqd->opcode = NVM_OP_PWRITE;
@@ -274,7 +277,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
 		pblk_up_chunk(pblk, rqd->ppa_list[0]);
-		goto fail_free_rqd;
+		kref_put(&pad_rq->ref, pblk_recov_complete);
+		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
+		bio_put(bio);
+		goto fail_complete;
 	}
 
 	left_line_ppas -= rq_ppas;
@@ -282,6 +288,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	if (left_ppas && left_line_ppas)
 		goto next_pad_rq;
 
+fail_complete:
 	kref_put(&pad_rq->ref, pblk_recov_complete);
 
 	if (!wait_for_completion_io_timeout(&pad_rq->wait,
@@ -297,14 +304,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 free_rq:
 	kfree(pad_rq);
 	return ret;
-
-fail_free_rqd:
-	pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
-	bio_put(bio);
-fail_free_pad:
-	kfree(pad_rq);
-	vfree(data);
-	return ret;
 }
 
 static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line)
-- 
2.19.1



* [GIT PULL 19/26] lightnvm: pblk: remove internal IO timeout
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (17 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 18/26] lightnvm: pblk: wait for inflight IOs in recovery Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 20/26] lightnvm: pblk: GC error handling Matias Bjørling
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently during pblk padding, an internal IO timeout is used that is
smaller than the default NVMe timeout. This can lead to various
use-after-free issues. Since in case of any IO timeout the NVMe and
block layers will handle the timeout by themselves and report it back
to us, there is no need to keep this internal timeout in pblk.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-recovery.c | 7 +------
 drivers/lightnvm/pblk.h          | 2 --
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 137e963cd51d..865fe310cab4 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -290,12 +290,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 
 fail_complete:
 	kref_put(&pad_rq->ref, pblk_recov_complete);
-
-	if (!wait_for_completion_io_timeout(&pad_rq->wait,
-				msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
-		pblk_err(pblk, "pad write timed out\n");
-		ret = -ETIME;
-	}
+	wait_for_completion(&pad_rq->wait);
 
 	if (!pblk_line_is_full(line))
 		pblk_err(pblk, "corrupted padded line: %d\n", line->id);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 381f0746a9cf..90c703d3f84c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -43,8 +43,6 @@
 
 #define PBLK_CACHE_NAME_LEN (DISK_NAME_LEN + 16)
 
-#define PBLK_COMMAND_TIMEOUT_MS 30000
-
 /* Max 512 LUNs per device */
 #define PBLK_MAX_LUNS_BITMAP (4)
 
-- 
2.19.1



* [GIT PULL 20/26] lightnvm: pblk: GC error handling
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (18 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 19/26] lightnvm: pblk: remove internal IO timeout Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 21/26] lightnvm: pblk: IO path reorganization Matias Bjørling
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently when there is an IO error (or similar) on the GC read path,
pblk still moves the line that was under GC to the free state. Such
behaviour can lead to silent data mismatch issues.

With this patch, a line under GC on which some IO errors occurred is
put back to the closed state (instead of the free state as before) and
the L2P mapping for the failed sectors is not updated.

Then, in case of any user IOs to such failed sectors, pblk is able to
return at least a real IO error instead of stale data, as it does
right now.
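
The state decision can be sketched as follows. The enum, the
mock_gc_line struct and mock_line_put() are hypothetical simplifications
of the line handling and ignore locking and list management.

```c
/* Hypothetical subset of the line states relevant to GC. */
enum mock_line_state {
	MOCK_LINE_CLOSED,
	MOCK_LINE_GC,
	MOCK_LINE_FREE,
};

struct mock_gc_line {
	enum mock_line_state state;
	int has_gc_err;		/* set when a GC read on this line failed */
};

/* Called when the last reference to a line under GC is dropped. */
void mock_line_put(struct mock_gc_line *line)
{
	if (line->has_gc_err) {
		/* Keep the data: the line goes back to closed, not free. */
		line->state = MOCK_LINE_CLOSED;
		line->has_gc_err = 0;
		return;
	}

	line->state = MOCK_LINE_FREE;
}
```

A line that saw a GC read error stays closed and can be reclaimed by a
later GC pass, while a line that was moved cleanly is freed as before.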

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 8 ++++++++
 drivers/lightnvm/pblk-gc.c   | 5 ++---
 drivers/lightnvm/pblk-read.c | 1 -
 drivers/lightnvm/pblk.h      | 2 ++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 38e26fe23138..73be3a0311ff 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1703,6 +1703,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 
 	spin_lock(&line->lock);
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
+	if (line->w_err_gc->has_gc_err) {
+		spin_unlock(&line->lock);
+		pblk_err(pblk, "line %d had errors during GC\n", line->id);
+		pblk_put_line_back(pblk, line);
+		line->w_err_gc->has_gc_err = 0;
+		return;
+	}
+
 	line->state = PBLK_LINESTATE_FREE;
 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
 					line->state);
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index e23b1923b773..63ee205b41c4 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
 	wake_up_process(gc->gc_writer_ts);
 }
 
-static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
+void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list;
@@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	/* Read from GC victim block */
 	ret = pblk_submit_read_gc(pblk, gc_rq);
 	if (ret) {
-		pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
-								line->id, ret);
+		line->w_err_gc->has_gc_err = 1;
 		goto out;
 	}
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 7b7a04a80d67..27f8a76d8bd8 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -641,7 +641,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 	if (pblk_submit_io_sync(pblk, &rqd)) {
 		ret = -EIO;
-		pblk_err(pblk, "GC read request failed\n");
 		goto err_free_bio;
 	}
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 90c703d3f84c..e304754aaa3c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -437,6 +437,7 @@ struct pblk_smeta {
 
 struct pblk_w_err_gc {
 	int has_write_err;
+	int has_gc_err;
 	__le64 *lba_list;
 };
 
@@ -917,6 +918,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
 void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
 			      int *gc_active);
 int pblk_gc_sysfs_force(struct pblk *pblk, int force);
+void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
 
 /*
  * pblk rate limiter
-- 
2.19.1



* [GIT PULL 21/26] lightnvm: pblk: IO path reorganization
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (19 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 20/26] lightnvm: pblk: GC error handling Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 22/26] lightnvm: pblk: recover only written metadata Matias Bjørling
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

This patch prepares the read path for a new approach to partial read
handling, which is simpler than the previous one.

The most important change is to move the handling of completed and
failed bios from pblk_make_rq() to the particular read and write
functions. This is needed since, after the partial read path changes,
the completed/failed bio will sometimes be different from the original
one, so we cannot do this in pblk_make_rq() any longer.

The other changes are a small read path refactor to reduce the size
of the following patch with the partial read changes.
Generally the goal of this patch is not to change the functionality,
but just to prepare the code for the following changes.
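
The shape of the change, with completion owned by the submit function
instead of its caller, can be sketched in user space. The names
mock_bio, mock_submit_read and mock_make_rq are hypothetical, and the
two branches only stand in for bio_io_error() and bio_endio().

```c
#include <errno.h>

/* Hypothetical stand-in for struct bio. */
struct mock_bio {
	int status;		/* 0 on success, negative errno on error */
	int completions;	/* must end up at exactly 1 */
};

/* The submit function completes the bio itself, exactly once, and
 * returns nothing for the caller to act on.
 */
void mock_submit_read(struct mock_bio *bio, int fail)
{
	if (fail) {
		bio->status = -EIO;	/* stands in for bio_io_error() */
		bio->completions++;
		return;
	}

	bio->status = 0;		/* stands in for bio_endio() */
	bio->completions++;
}

/* The caller only dispatches; it no longer needs to know which bio
 * ends up being completed.
 */
void mock_make_rq(struct mock_bio *bio, int fail)
{
	mock_submit_read(bio, fail);
}
```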

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-cache.c |  8 +++-
 drivers/lightnvm/pblk-init.c  | 14 +-----
 drivers/lightnvm/pblk-read.c  | 83 ++++++++++++++++-------------------
 drivers/lightnvm/pblk.h       |  4 +-
 4 files changed, 48 insertions(+), 61 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index c9fa26f95659..5c1034c22197 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -18,7 +18,8 @@
 
 #include "pblk.h"
 
-int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
+void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
+				unsigned long flags)
 {
 	struct request_queue *q = pblk->dev->q;
 	struct pblk_w_ctx w_ctx;
@@ -43,6 +44,7 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
 		goto retry;
 	case NVM_IO_ERR:
 		pblk_pipeline_stop(pblk);
+		bio_io_error(bio);
 		goto out;
 	}
 
@@ -79,7 +81,9 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
 out:
 	generic_end_io_acct(q, REQ_OP_WRITE, &pblk->disk->part0, start_time);
 	pblk_write_should_kick(pblk);
-	return ret;
+
+	if (ret == NVM_IO_DONE)
+		bio_endio(bio);
 }
 
 /*
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1e227a08e54a..b351c7f002de 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -50,7 +50,6 @@ struct bio_set pblk_bio_set;
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
 	struct pblk *pblk = q->queuedata;
-	int ret;
 
 	if (bio_op(bio) == REQ_OP_DISCARD) {
 		pblk_discard(pblk, bio);
@@ -65,7 +64,7 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 	 */
 	if (bio_data_dir(bio) == READ) {
 		blk_queue_split(q, &bio);
-		ret = pblk_submit_read(pblk, bio);
+		pblk_submit_read(pblk, bio);
 	} else {
 		/* Prevent deadlock in the case of a modest LUN configuration
 		 * and large user I/Os. Unless stalled, the rate limiter
@@ -74,16 +73,7 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 		if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
 			blk_queue_split(q, &bio);
 
-		ret = pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
-	}
-
-	switch (ret) {
-	case NVM_IO_ERR:
-		bio_io_error(bio);
-		break;
-	case NVM_IO_DONE:
-		bio_endio(bio);
-		break;
+		pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
 	}
 
 	return BLK_QC_T_NONE;
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 27f8a76d8bd8..f5f155d540e2 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -179,7 +179,8 @@ static void pblk_end_user_read(struct bio *bio, int error)
 {
 	if (error && error != NVM_RSP_WARN_HIGHECC)
 		bio_io_error(bio);
-	bio_endio(bio);
+	else
+		bio_endio(bio);
 }
 
 static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
@@ -389,7 +390,6 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 
 	/* Free allocated pages in new bio */
 	pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
-	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
 }
 
@@ -434,7 +434,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	}
 }
 
-int pblk_submit_read(struct pblk *pblk, struct bio *bio)
+void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct request_queue *q = dev->q;
@@ -442,9 +442,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	unsigned int nr_secs = pblk_get_secs(bio);
 	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
+	struct bio *int_bio;
 	unsigned int bio_init_idx;
 	DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
-	int ret = NVM_IO_ERR;
 
 	generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
 			      &pblk->disk->part0);
@@ -455,74 +455,67 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 
 	rqd->opcode = NVM_OP_PREAD;
 	rqd->nr_ppas = nr_secs;
-	rqd->bio = NULL; /* cloned bio if needed */
 	rqd->private = pblk;
 	rqd->end_io = pblk_end_io_read;
 
 	r_ctx = nvm_rq_to_pdu(rqd);
 	r_ctx->start_time = jiffies;
 	r_ctx->lba = blba;
-	r_ctx->private = bio; /* original bio */
 
 	/* Save the index for this bio's start. This is needed in case
 	 * we need to fill a partial read.
 	 */
 	bio_init_idx = pblk_get_bi_idx(bio);
 
-	if (pblk_alloc_rqd_meta(pblk, rqd))
-		goto fail_rqd_free;
+	if (pblk_alloc_rqd_meta(pblk, rqd)) {
+		bio_io_error(bio);
+		pblk_free_rqd(pblk, rqd, PBLK_READ);
+		return;
+	}
+
+	/* Clone read bio to deal internally with:
+	 * -read errors when reading from drive
+	 * -bio_advance() calls during l2p lookup and cache reads
+	 */
+	int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
 
 	if (nr_secs > 1)
 		pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
 	else
 		pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
 
+	r_ctx->private = bio; /* original bio */
+	rqd->bio = int_bio; /* internal bio */
+
 	if (bitmap_full(read_bitmap, nr_secs)) {
+		pblk_end_user_read(bio, 0);
 		atomic_inc(&pblk->inflight_io);
 		__pblk_end_io_read(pblk, rqd, false);
-		return NVM_IO_DONE;
+		return;
 	}
 
-	/* All sectors are to be read from the device */
-	if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
-		struct bio *int_bio = NULL;
-
-		/* Clone read bio to deal with read errors internally */
-		int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
-		if (!int_bio) {
-			pblk_err(pblk, "could not clone read bio\n");
-			goto fail_end_io;
-		}
-
-		rqd->bio = int_bio;
-
-		if (pblk_submit_io(pblk, rqd)) {
+	if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
+		/* The read bio request could be partially filled by the write
+		 * buffer, but there are some holes that need to be read from
+		 * the drive.
+		 */
+		bio_put(int_bio);
+		rqd->bio = NULL;
+		if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
+					    nr_secs)) {
 			pblk_err(pblk, "read IO submission failed\n");
-			ret = NVM_IO_ERR;
-			goto fail_end_io;
+			bio_io_error(bio);
+			__pblk_end_io_read(pblk, rqd, false);
 		}
-
-		return NVM_IO_OK;
+		return;
 	}
 
-	/* The read bio request could be partially filled by the write buffer,
-	 * but there are some holes that need to be read from the drive.
-	 */
-	ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_secs);
-	if (ret)
-		goto fail_meta_free;
-
-	return NVM_IO_OK;
-
-fail_meta_free:
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
-fail_rqd_free:
-	pblk_free_rqd(pblk, rqd, PBLK_READ);
-	return ret;
-fail_end_io:
-	__pblk_end_io_read(pblk, rqd, false);
-	return ret;
+	/* All sectors are to be read from the device */
+	if (pblk_submit_io(pblk, rqd)) {
+		pblk_err(pblk, "read IO submission failed\n");
+		bio_io_error(bio);
+		__pblk_end_io_read(pblk, rqd, false);
+	}
 }
 
 static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e304754aaa3c..17ced12db7dd 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -867,7 +867,7 @@ void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
 /*
  * pblk user I/O write path
  */
-int pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
+void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
 			unsigned long flags);
 int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 
@@ -893,7 +893,7 @@ void pblk_write_kick(struct pblk *pblk);
  * pblk read path
  */
 extern struct bio_set pblk_bio_set;
-int pblk_submit_read(struct pblk *pblk, struct bio *bio);
+void pblk_submit_read(struct pblk *pblk, struct bio *bio);
 int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 /*
  * pblk recovery
-- 
2.19.1



* [GIT PULL 22/26] lightnvm: pblk: recover only written metadata
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (20 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 21/26] lightnvm: pblk: IO path reorganization Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 23/26] lightnvm: track inflight target creations Matias Bjørling
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

This patch ensures that smeta was fully written before trying to read
it, based on the chunk table state and the write pointer.
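
For illustration only (not part of the patch), the check can be
sketched in userspace as below. The enum values, struct chunk and
smeta_was_written() are hypothetical stand-ins for the NVM_CHK_ST_*
bits, the chunk table entry and pblk_line_was_written(); only the
condition itself is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NVM_CHK_ST_* chunk state bits. */
enum chk_state {
	CHK_ST_FREE   = 1 << 0,
	CHK_ST_CLOSED = 1 << 1,
	CHK_ST_OPEN   = 1 << 2,
};

/* Hypothetical, reduced view of a chunk table entry. */
struct chunk {
	int state;	/* bitmask of CHK_ST_* */
	uint64_t wp;	/* write pointer, in sectors */
};

/*
 * smeta is considered written only if the chunk is closed, or if it
 * is open and the write pointer has reached the end of smeta.
 * Every other state (free, offline, ...) is treated as "not written".
 */
bool smeta_was_written(const struct chunk *c, uint64_t smeta_sec)
{
	assert(c != NULL);

	if (c->state & CHK_ST_CLOSED)
		return true;
	if ((c->state & CHK_ST_OPEN) && c->wp >= smeta_sec)
		return true;
	return false;
}
```

Compared to the old check, which only rejected free chunks, an open
chunk whose write pointer stopped inside smeta is now rejected too.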

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-recovery.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 865fe310cab4..a9085b0e6611 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -655,10 +655,12 @@ static int pblk_line_was_written(struct pblk_line *line,
 	bppa = pblk->luns[smeta_blk].bppa;
 	chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
 
-	if (chunk->state & NVM_CHK_ST_FREE)
-		return 0;
+	if (chunk->state & NVM_CHK_ST_CLOSED ||
+	    (chunk->state & NVM_CHK_ST_OPEN
+	     && chunk->wp >= lm->smeta_sec))
+		return 1;
 
-	return 1;
+	return 0;
 }
 
 static bool pblk_line_is_open(struct pblk *pblk, struct pblk_line *line)
-- 
2.19.1



* [GIT PULL 23/26] lightnvm: track inflight target creations
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (21 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 22/26] lightnvm: pblk: recover only written metadata Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 24/26] lightnvm: do not remove instance under global lock Matias Bjørling
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

While target creation is still in progress, the target is not yet on
the targets list. This leaves a window in which the whole lightnvm
subsystem can be removed by a call to nvm_unregister(), which then
leads to a kernel panic inside the target init function.

This patch changes the behaviour by adding a kref that tracks all
users of the nvm_dev structure. When nvm_dev is allocated, the kref
is set to 1. It is then incremented before every target creation and
decremented after target removal. The initial reference is dropped
when the nvm subsystem is unregistered.
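
For illustration only (not part of the patch), the reference
lifecycle described above can be sketched in userspace as below.
struct ref is a simplified, non-atomic stand-in for struct kref, and
struct dev, dev_alloc() and dev_release() are hypothetical names; the
kernel code uses kref_init(), kref_get() and kref_put() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, NON-atomic stand-in for struct kref (demo only). */
struct ref {
	int count;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

void ref_init(struct ref *r)
{
	r->count = 1;
}

void ref_get(struct ref *r)
{
	assert(r->count > 0);
	r->count++;
}

/* Returns 1 if this put dropped the last reference and ran release. */
int ref_put(struct ref *r, void (*release)(struct ref *))
{
	assert(r->count > 0);
	if (--r->count == 0) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical device; freed_flag lets a caller observe the release. */
struct dev {
	struct ref ref;
	int *freed_flag;
};

void dev_release(struct ref *r)
{
	struct dev *d = container_of(r, struct dev, ref);

	*d->freed_flag = 1;
	free(d);
}

/* Allocation takes the initial reference, as nvm_alloc_dev() does. */
struct dev *dev_alloc(int *freed_flag)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (d) {
		ref_init(&d->ref);
		d->freed_flag = freed_flag;
	}
	return d;
}
```

With this scheme, an unregister that races with a target creation
only drops the initial reference; the device memory stays valid until
the creation path drops its own reference.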

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c  | 41 ++++++++++++++++++++++++++++++----------
 include/linux/lightnvm.h |  1 +
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index e2abe88a139c..0e9f7996ff1d 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -45,6 +45,8 @@ struct nvm_dev_map {
 	int num_ch;
 };
 
+static void nvm_free(struct kref *ref);
+
 static struct nvm_target *nvm_find_target(struct nvm_dev *dev, const char *name)
 {
 	struct nvm_target *tgt;
@@ -501,6 +503,7 @@ static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
 	}
 	__nvm_remove_target(t, true);
 	mutex_unlock(&dev->mlock);
+	kref_put(&dev->ref, nvm_free);
 
 	return 0;
 }
@@ -1094,15 +1097,16 @@ static int nvm_core_init(struct nvm_dev *dev)
 	return ret;
 }
 
-static void nvm_free(struct nvm_dev *dev)
+static void nvm_free(struct kref *ref)
 {
-	if (!dev)
-		return;
+	struct nvm_dev *dev = container_of(ref, struct nvm_dev, ref);
 
 	if (dev->dma_pool)
 		dev->ops->destroy_dma_pool(dev->dma_pool);
 
-	nvm_unregister_map(dev);
+	if (dev->rmap)
+		nvm_unregister_map(dev);
+
 	kfree(dev->lun_map);
 	kfree(dev);
 }
@@ -1139,7 +1143,13 @@ static int nvm_init(struct nvm_dev *dev)
 
 struct nvm_dev *nvm_alloc_dev(int node)
 {
-	return kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
+	struct nvm_dev *dev;
+
+	dev = kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
+	if (dev)
+		kref_init(&dev->ref);
+
+	return dev;
 }
 EXPORT_SYMBOL(nvm_alloc_dev);
 
@@ -1147,12 +1157,16 @@ int nvm_register(struct nvm_dev *dev)
 {
 	int ret, exp_pool_size;
 
-	if (!dev->q || !dev->ops)
+	if (!dev->q || !dev->ops) {
+		kref_put(&dev->ref, nvm_free);
 		return -EINVAL;
+	}
 
 	ret = nvm_init(dev);
-	if (ret)
+	if (ret) {
+		kref_put(&dev->ref, nvm_free);
 		return ret;
+	}
 
 	exp_pool_size = max_t(int, PAGE_SIZE,
 			      (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
@@ -1162,7 +1176,7 @@ int nvm_register(struct nvm_dev *dev)
 						  exp_pool_size);
 	if (!dev->dma_pool) {
 		pr_err("nvm: could not create dma pool\n");
-		nvm_free(dev);
+		kref_put(&dev->ref, nvm_free);
 		return -ENOMEM;
 	}
 
@@ -1184,6 +1198,7 @@ void nvm_unregister(struct nvm_dev *dev)
 		if (t->dev->parent != dev)
 			continue;
 		__nvm_remove_target(t, false);
+		kref_put(&dev->ref, nvm_free);
 	}
 	mutex_unlock(&dev->mlock);
 
@@ -1191,13 +1206,14 @@ void nvm_unregister(struct nvm_dev *dev)
 	list_del(&dev->devices);
 	up_write(&nvm_lock);
 
-	nvm_free(dev);
+	kref_put(&dev->ref, nvm_free);
 }
 EXPORT_SYMBOL(nvm_unregister);
 
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
 	struct nvm_dev *dev;
+	int ret;
 
 	down_write(&nvm_lock);
 	dev = nvm_find_nvm_dev(create->dev);
@@ -1208,7 +1224,12 @@ static int __nvm_configure_create(struct nvm_ioctl_create *create)
 		return -EINVAL;
 	}
 
-	return nvm_create_tgt(dev, create);
+	kref_get(&dev->ref);
+	ret = nvm_create_tgt(dev, create);
+	if (ret)
+		kref_put(&dev->ref, nvm_free);
+
+	return ret;
 }
 
 static long nvm_ioctl_info(struct file *file, void __user *arg)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index d3b02708e5f0..4d0d5655c7b2 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -428,6 +428,7 @@ struct nvm_dev {
 	char name[DISK_NAME_LEN];
 	void *private_data;
 
+	struct kref ref;
 	void *rmap;
 
 	struct mutex mlock;
-- 
2.19.1



* [GIT PULL 24/26] lightnvm: do not remove instance under global lock
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (22 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 23/26] lightnvm: track inflight target creations Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 25/26] lightnvm: pblk: simplify partial read path Matias Bjørling
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently all target instances are removed under the global nvm_lock.
This was needed to ensure that the nvm_dev struct is not freed by a
hot unplug event during target removal. However, the current
implementation has a drawback: the same lock is taken when a new nvme
subsystem is registered, so a long target removal on drive A can
delay registration (and listing in the OS) of drive B, which has to
wait for that lock.

Now that a kref ensures that nvm_dev will not be freed in the
meantime, we no longer need to hold this lock while removing nvm
targets.
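
For illustration only (not part of the patch), the narrowed lock
scope can be sketched in userspace as below. struct demo_lock is a
hypothetical stand-in for nvm_lock that only tracks whether it is
held, and struct target and remove_target() are made-up names. The
sketch shows the ordering only; in the kernel it is the kref that
keeps nvm_dev alive once the lock has been dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the global lock: it only records whether it is held. */
struct demo_lock {
	int held;
};

void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical target; removed_under_lock records the lock state
 * that was observed while the (slow) removal ran.
 */
struct target {
	const char *name;
	int removed;
	int removed_under_lock;
};

struct demo_lock global_lock;

struct target *find_target(struct target *tgts, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!tgts[i].removed && strcmp(tgts[i].name, name) == 0)
			return &tgts[i];
	return NULL;
}

/* Returns 0 on success and 1 if the target was not found. */
int remove_target(struct target *tgts, size_t n, const char *name)
{
	struct target *t;

	/* Only the lookup runs under the global lock. */
	demo_lock_acquire(&global_lock);
	t = find_target(tgts, n, name);
	demo_lock_release(&global_lock);

	if (!t)
		return 1;

	/* The slow teardown runs with the global lock dropped. */
	t->removed_under_lock = global_lock.held;
	t->removed = 1;
	return 0;
}
```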

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 0e9f7996ff1d..0df7454832ef 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -483,7 +483,6 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
 
 /**
  * nvm_remove_tgt - Removes a target from the media manager
- * @dev:	device
  * @remove:	ioctl structure with target name to remove.
  *
  * Returns:
@@ -491,18 +490,27 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
  * 1: on not found
  * <0: on error
  */
-static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
+static int nvm_remove_tgt(struct nvm_ioctl_remove *remove)
 {
 	struct nvm_target *t;
+	struct nvm_dev *dev;
 
-	mutex_lock(&dev->mlock);
-	t = nvm_find_target(dev, remove->tgtname);
-	if (!t) {
+	down_read(&nvm_lock);
+	list_for_each_entry(dev, &nvm_devices, devices) {
+		mutex_lock(&dev->mlock);
+		t = nvm_find_target(dev, remove->tgtname);
+		if (t) {
+			mutex_unlock(&dev->mlock);
+			break;
+		}
 		mutex_unlock(&dev->mlock);
+	}
+	up_read(&nvm_lock);
+
+	if (!t)
 		return 1;
-	}
+
 	__nvm_remove_target(t, true);
-	mutex_unlock(&dev->mlock);
 	kref_put(&dev->ref, nvm_free);
 
 	return 0;
@@ -1348,8 +1356,6 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
 static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 {
 	struct nvm_ioctl_remove remove;
-	struct nvm_dev *dev;
-	int ret = 0;
 
 	if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
 		return -EFAULT;
@@ -1361,15 +1367,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 		return -EINVAL;
 	}
 
-	down_read(&nvm_lock);
-	list_for_each_entry(dev, &nvm_devices, devices) {
-		ret = nvm_remove_tgt(dev, &remove);
-		if (!ret)
-			break;
-	}
-	up_read(&nvm_lock);
-
-	return ret;
+	return nvm_remove_tgt(&remove);
 }
 
 /* kept for compatibility reasons */
-- 
2.19.1



* [GIT PULL 25/26] lightnvm: pblk: simplify partial read path
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (23 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 24/26] lightnvm: do not remove instance under global lock Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-04 18:38 ` [GIT PULL 26/26] lightnvm: pblk: use nvm_rq_to_ppa_list() Matias Bjørling
  2019-05-06 16:20 ` [GIT PULL 00/26] lightnvm updates for 5.2 Jens Axboe
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

This patch changes the approach to handling the partial read path.

In the old approach, data from the ring buffer and from the drive was
merged entirely by the driver. This had some disadvantages: the code
was complex and relied on bio internals, so it was hard to maintain
and strongly dependent on bio changes.

In the new approach, most of the handling is done by block layer
functions such as bio_split(), bio_chain() and generic_make_request(),
and it is generally less complex and easier to maintain. Some more
details of the new approach follow.

When a read bio arrives, it is cloned for pblk internal purposes. All
the L2P mapping, which includes copying data from the ring buffer to
the bio and thus the bio_advance() calls, is done on the cloned bio, so
the original bio is untouched. If we find that we have a partial read,
the original bio is still untouched, so we can split it and continue
to process only its first part in the current context, while the rest
is issued as a separate bio request that is passed to
generic_make_request() for further processing.
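
For illustration only (not part of the patch), the splitting rule can
be sketched in userspace as below. leading_run() mimics what the
return value and from_cache output of pblk_lookup_l2p_seq() convey:
how many leading sectors share one origin (cache or drive). struct
segment and split_request() are hypothetical; the sketch loops over
the remainder, whereas the patch hands the remainder back to the
block layer, which submits it as a new read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Length of the leading run of sectors that share one origin.
 * *from_cache is set to that origin (true: cache, false: drive).
 */
size_t leading_run(const bool *in_cache, size_t n, bool *from_cache)
{
	size_t i;

	assert(n > 0);
	*from_cache = in_cache[0];
	for (i = 1; i < n; i++)
		if (in_cache[i] != *from_cache)
			break;
	return i;
}

/* One homogeneous piece of the original request (hypothetical). */
struct segment {
	size_t start;
	size_t len;
	bool from_cache;
};

/*
 * Split a request of n sectors into segments that are served either
 * entirely from cache or entirely from the drive. Returns the number
 * of segments written to out (at most max_out).
 */
size_t split_request(const bool *in_cache, size_t n,
		     struct segment *out, size_t max_out)
{
	size_t start = 0, count = 0;

	while (start < n && count < max_out) {
		bool from_cache;
		size_t len = leading_run(in_cache + start, n - start,
					 &from_cache);

		out[count++] = (struct segment){ start, len, from_cache };
		start += len;
	}
	return count;
}
```

Each resulting segment never mixes cache and drive sectors, which is
what allows the per-sector read bitmap and the hole-filling code to be
dropped.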

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c |  13 +-
 drivers/lightnvm/pblk-rb.c   |  11 +-
 drivers/lightnvm/pblk-read.c | 339 +++++++++--------------------------
 drivers/lightnvm/pblk.h      |  18 +-
 4 files changed, 100 insertions(+), 281 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 73be3a0311ff..07270ba1e95f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -2147,8 +2147,8 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
 	spin_unlock(&pblk->trans_lock);
 }
 
-void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
-			 sector_t blba, int nr_secs)
+int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
+			 sector_t blba, int nr_secs, bool *from_cache)
 {
 	int i;
 
@@ -2162,10 +2162,19 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
 		if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
 			struct pblk_line *line = pblk_ppa_to_line(pblk, ppa);
 
+			if (i > 0 && *from_cache)
+				break;
+			*from_cache = false;
+
 			kref_get(&line->ref);
+		} else {
+			if (i > 0 && !*from_cache)
+				break;
+			*from_cache = true;
 		}
 	}
 	spin_unlock(&pblk->trans_lock);
+	return i;
 }
 
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 35550148b5e8..5abb1705b039 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -642,7 +642,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
  * be directed to disk.
  */
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
-			struct ppa_addr ppa, int bio_iter, bool advanced_bio)
+			struct ppa_addr ppa)
 {
 	struct pblk *pblk = container_of(rb, struct pblk, rwb);
 	struct pblk_rb_entry *entry;
@@ -673,15 +673,6 @@ int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
 		ret = 0;
 		goto out;
 	}
-
-	/* Only advance the bio if it hasn't been advanced already. If advanced,
-	 * this bio is at least a partial bio (i.e., it has partially been
-	 * filled with data from the cache). If part of the data resides on the
-	 * media, we will read later on
-	 */
-	if (unlikely(!advanced_bio))
-		bio_advance(bio, bio_iter * PBLK_EXPOSED_PAGE_SIZE);
-
 	data = bio_data(bio);
 	memcpy(data, entry->data, rb->seg_size);
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index f5f155d540e2..d98ea392fe33 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -26,8 +26,7 @@
  * issued.
  */
 static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
-				sector_t lba, struct ppa_addr ppa,
-				int bio_iter, bool advanced_bio)
+				sector_t lba, struct ppa_addr ppa)
 {
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	/* Callers must ensure that the ppa points to a cache address */
@@ -35,73 +34,75 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
 	BUG_ON(!pblk_addr_in_cache(ppa));
 #endif
 
-	return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa,
-						bio_iter, advanced_bio);
+	return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa);
 }
 
-static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
+static int pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				 struct bio *bio, sector_t blba,
-				 unsigned long *read_bitmap)
+				 bool *from_cache)
 {
 	void *meta_list = rqd->meta_list;
-	struct ppa_addr ppas[NVM_MAX_VLBA];
-	int nr_secs = rqd->nr_ppas;
-	bool advanced_bio = false;
-	int i, j = 0;
+	int nr_secs, i;
 
-	pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
+retry:
+	nr_secs = pblk_lookup_l2p_seq(pblk, rqd->ppa_list, blba, rqd->nr_ppas,
+					from_cache);
+
+	if (!*from_cache)
+		goto end;
 
 	for (i = 0; i < nr_secs; i++) {
-		struct ppa_addr p = ppas[i];
 		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
 		sector_t lba = blba + i;
 
-retry:
-		if (pblk_ppa_empty(p)) {
+		if (pblk_ppa_empty(rqd->ppa_list[i])) {
 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-			WARN_ON(test_and_set_bit(i, read_bitmap));
 			meta->lba = addr_empty;
-
-			if (unlikely(!advanced_bio)) {
-				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
-				advanced_bio = true;
+		} else if (pblk_addr_in_cache(rqd->ppa_list[i])) {
+			/*
+			 * Try to read from write buffer. The address is later
+			 * checked on the write buffer to prevent retrieving
+			 * overwritten data.
+			 */
+			if (!pblk_read_from_cache(pblk, bio, lba,
+							rqd->ppa_list[i])) {
+				if (i == 0) {
+					/*
+					 * We didn't call with bio_advance()
+					 * yet, so we can just retry.
+					 */
+					goto retry;
+				} else {
+					/*
+					 * We already call bio_advance()
+					 * so we cannot retry and we need
+					 * to quit that function in order
+					 * to allow caller to handle the bio
+					 * splitting in the current sector
+					 * position.
+					 */
+					nr_secs = i;
+					goto end;
+				}
 			}
-
-			goto next;
-		}
-
-		/* Try to read from write buffer. The address is later checked
-		 * on the write buffer to prevent retrieving overwritten data.
-		 */
-		if (pblk_addr_in_cache(p)) {
-			if (!pblk_read_from_cache(pblk, bio, lba, p, i,
-								advanced_bio)) {
-				pblk_lookup_l2p_seq(pblk, &p, lba, 1);
-				goto retry;
-			}
-			WARN_ON(test_and_set_bit(i, read_bitmap));
 			meta->lba = cpu_to_le64(lba);
-			advanced_bio = true;
 #ifdef CONFIG_NVM_PBLK_DEBUG
 			atomic_long_inc(&pblk->cache_reads);
 #endif
-		} else {
-			/* Read from media non-cached sectors */
-			rqd->ppa_list[j++] = p;
 		}
-
-next:
-		if (advanced_bio)
-			bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
+		bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
 	}
 
+end:
 	if (pblk_io_aligned(pblk, nr_secs))
 		rqd->is_seq = 1;
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_add(nr_secs, &pblk->inflight_reads);
 #endif
+
+	return nr_secs;
 }
 
 
@@ -197,9 +198,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		pblk_log_read_err(pblk, rqd);
 
 	pblk_read_check_seq(pblk, rqd, r_ctx->lba);
-
-	if (int_bio)
-		bio_put(int_bio);
+	bio_put(int_bio);
 
 	if (put_line)
 		pblk_rq_to_line_put(pblk, rqd);
@@ -223,183 +222,13 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
-static void pblk_end_partial_read(struct nvm_rq *rqd)
-{
-	struct pblk *pblk = rqd->private;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx = r_ctx->private;
-	struct pblk_sec_meta *meta;
-	struct bio *new_bio = rqd->bio;
-	struct bio *bio = pr_ctx->orig_bio;
-	void *meta_list = rqd->meta_list;
-	unsigned long *read_bitmap = pr_ctx->bitmap;
-	struct bvec_iter orig_iter = BVEC_ITER_ALL_INIT;
-	struct bvec_iter new_iter = BVEC_ITER_ALL_INIT;
-	int nr_secs = pr_ctx->orig_nr_secs;
-	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	void *src_p, *dst_p;
-	int bit, i;
-
-	if (unlikely(nr_holes == 1)) {
-		struct ppa_addr ppa;
-
-		ppa = rqd->ppa_addr;
-		rqd->ppa_list = pr_ctx->ppa_ptr;
-		rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
-		rqd->ppa_list[0] = ppa;
-	}
-
-	for (i = 0; i < nr_secs; i++) {
-		meta = pblk_get_meta(pblk, meta_list, i);
-		pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
-		meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
-	}
-
-	/* Fill the holes in the original bio */
-	i = 0;
-	for (bit = 0; bit < nr_secs; bit++) {
-		if (!test_bit(bit, read_bitmap)) {
-			struct bio_vec dst_bv, src_bv;
-			struct pblk_line *line;
-
-			line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
-			kref_put(&line->ref, pblk_line_put);
-
-			meta = pblk_get_meta(pblk, meta_list, bit);
-			meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
-
-			dst_bv = bio_iter_iovec(bio, orig_iter);
-			src_bv = bio_iter_iovec(new_bio, new_iter);
-
-			src_p = kmap_atomic(src_bv.bv_page);
-			dst_p = kmap_atomic(dst_bv.bv_page);
-
-			memcpy(dst_p + dst_bv.bv_offset,
-				src_p + src_bv.bv_offset,
-				PBLK_EXPOSED_PAGE_SIZE);
-
-			kunmap_atomic(src_p);
-			kunmap_atomic(dst_p);
-
-			flush_dcache_page(dst_bv.bv_page);
-			mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
-
-			bio_advance_iter(new_bio, &new_iter,
-					PBLK_EXPOSED_PAGE_SIZE);
-			i++;
-		}
-		bio_advance_iter(bio, &orig_iter, PBLK_EXPOSED_PAGE_SIZE);
-	}
-
-	bio_put(new_bio);
-	kfree(pr_ctx);
-
-	/* restore original request */
-	rqd->bio = NULL;
-	rqd->nr_ppas = nr_secs;
-
-	pblk_end_user_read(bio, rqd->error);
-	__pblk_end_io_read(pblk, rqd, false);
-}
-
-static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
-			    unsigned int bio_init_idx,
-			    unsigned long *read_bitmap,
-			    int nr_holes)
-{
-	void *meta_list = rqd->meta_list;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx;
-	struct bio *new_bio, *bio = r_ctx->private;
-	int nr_secs = rqd->nr_ppas;
-	int i;
-
-	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
-
-	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-		goto fail_bio_put;
-
-	if (nr_holes != new_bio->bi_vcnt) {
-		WARN_ONCE(1, "pblk: malformed bio\n");
-		goto fail_free_pages;
-	}
-
-	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
-	if (!pr_ctx)
-		goto fail_free_pages;
-
-	for (i = 0; i < nr_secs; i++) {
-		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
-
-		pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
-	}
-
-	new_bio->bi_iter.bi_sector = 0; /* internal bio */
-	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
-
-	rqd->bio = new_bio;
-	rqd->nr_ppas = nr_holes;
-
-	pr_ctx->orig_bio = bio;
-	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
-	pr_ctx->bio_init_idx = bio_init_idx;
-	pr_ctx->orig_nr_secs = nr_secs;
-	r_ctx->private = pr_ctx;
-
-	if (unlikely(nr_holes == 1)) {
-		pr_ctx->ppa_ptr = rqd->ppa_list;
-		pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
-		rqd->ppa_addr = rqd->ppa_list[0];
-	}
-	return 0;
-
-fail_free_pages:
-	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
-fail_bio_put:
-	bio_put(new_bio);
-
-	return -ENOMEM;
-}
-
-static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
-				 unsigned int bio_init_idx,
-				 unsigned long *read_bitmap, int nr_secs)
-{
-	int nr_holes;
-	int ret;
-
-	nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-
-	if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_holes))
-		return NVM_IO_ERR;
-
-	rqd->end_io = pblk_end_partial_read;
-
-	ret = pblk_submit_io(pblk, rqd);
-	if (ret) {
-		bio_put(rqd->bio);
-		pblk_err(pblk, "partial read IO submission failed\n");
-		goto err;
-	}
-
-	return NVM_IO_OK;
-
-err:
-	pblk_err(pblk, "failed to perform partial read\n");
-
-	/* Free allocated pages in new bio */
-	pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
-	return NVM_IO_ERR;
-}
-
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
-			 sector_t lba, unsigned long *read_bitmap)
+			 sector_t lba, bool *from_cache)
 {
 	struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
 	struct ppa_addr ppa;
 
-	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
+	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_inc(&pblk->inflight_reads);
@@ -409,7 +238,6 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	if (pblk_ppa_empty(ppa)) {
 		__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-		WARN_ON(test_and_set_bit(0, read_bitmap));
 		meta->lba = addr_empty;
 		return;
 	}
@@ -418,12 +246,11 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	 * write buffer to prevent retrieving overwritten data.
 	 */
 	if (pblk_addr_in_cache(ppa)) {
-		if (!pblk_read_from_cache(pblk, bio, lba, ppa, 0, 1)) {
-			pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
+		if (!pblk_read_from_cache(pblk, bio, lba, ppa)) {
+			pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
 			goto retry;
 		}
 
-		WARN_ON(test_and_set_bit(0, read_bitmap));
 		meta->lba = cpu_to_le64(lba);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
@@ -440,17 +267,14 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	struct request_queue *q = dev->q;
 	sector_t blba = pblk_get_lba(bio);
 	unsigned int nr_secs = pblk_get_secs(bio);
+	bool from_cache;
 	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
-	struct bio *int_bio;
-	unsigned int bio_init_idx;
-	DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
+	struct bio *int_bio, *split_bio;
 
 	generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
 			      &pblk->disk->part0);
 
-	bitmap_zero(read_bitmap, nr_secs);
-
 	rqd = pblk_alloc_rqd(pblk, PBLK_READ);
 
 	rqd->opcode = NVM_OP_PREAD;
@@ -462,11 +286,6 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	r_ctx->start_time = jiffies;
 	r_ctx->lba = blba;
 
-	/* Save the index for this bio's start. This is needed in case
-	 * we need to fill a partial read.
-	 */
-	bio_init_idx = pblk_get_bi_idx(bio);
-
 	if (pblk_alloc_rqd_meta(pblk, rqd)) {
 		bio_io_error(bio);
 		pblk_free_rqd(pblk, rqd, PBLK_READ);
@@ -475,46 +294,58 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 
 	/* Clone read bio to deal internally with:
 	 * -read errors when reading from drive
-	 * -bio_advance() calls during l2p lookup and cache reads
+	 * -bio_advance() calls during cache reads
 	 */
 	int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
 
 	if (nr_secs > 1)
-		pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
+		nr_secs = pblk_read_ppalist_rq(pblk, rqd, int_bio, blba,
+						&from_cache);
 	else
-		pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
+		pblk_read_rq(pblk, rqd, int_bio, blba, &from_cache);
 
+split_retry:
 	r_ctx->private = bio; /* original bio */
 	rqd->bio = int_bio; /* internal bio */
 
-	if (bitmap_full(read_bitmap, nr_secs)) {
+	if (from_cache && nr_secs == rqd->nr_ppas) {
+		/* All data was read from cache, we can complete the IO. */
 		pblk_end_user_read(bio, 0);
 		atomic_inc(&pblk->inflight_io);
 		__pblk_end_io_read(pblk, rqd, false);
-		return;
-	}
-
-	if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
+	} else if (nr_secs != rqd->nr_ppas) {
 		/* The read bio request could be partially filled by the write
 		 * buffer, but there are some holes that need to be read from
-		 * the drive.
+		 * the drive. In order to handle this, we will use block layer
+		 * mechanism to split this request in to smaller ones and make
+		 * a chain of it.
+		 */
+		split_bio = bio_split(bio, nr_secs * NR_PHY_IN_LOG, GFP_KERNEL,
+					&pblk_bio_set);
+		bio_chain(split_bio, bio);
+		generic_make_request(bio);
+
+		/* New bio contains first N sectors of the previous one, so
+		 * we can continue to use existing rqd, but we need to shrink
+		 * the number of PPAs in it. New bio is also guaranteed that
+		 * it contains only either data from cache or from drive, newer
+		 * mix of them.
+		 */
+		bio = split_bio;
+		rqd->nr_ppas = nr_secs;
+		if (rqd->nr_ppas == 1)
+			rqd->ppa_addr = rqd->ppa_list[0];
+
+		/* Recreate int_bio - existing might have some needed internal
+		 * fields modified already.
 		 */
 		bio_put(int_bio);
-		rqd->bio = NULL;
-		if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
-					    nr_secs)) {
-			pblk_err(pblk, "read IO submission failed\n");
-			bio_io_error(bio);
-			__pblk_end_io_read(pblk, rqd, false);
-		}
-		return;
-	}
-
-	/* All sectors are to be read from the device */
-	if (pblk_submit_io(pblk, rqd)) {
-		pblk_err(pblk, "read IO submission failed\n");
-		bio_io_error(bio);
-		__pblk_end_io_read(pblk, rqd, false);
+		int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
+		goto split_retry;
+	} else if (pblk_submit_io(pblk, rqd)) {
+		/* Submitting IO to drive failed, let's report an error */
+		rqd->error = -ENODEV;
+		pblk_end_io_read(rqd);
 	}
 }
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 17ced12db7dd..a67855387f53 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -121,18 +121,6 @@ struct pblk_g_ctx {
 	u64 lba;
 };
 
-/* partial read context */
-struct pblk_pr_ctx {
-	struct bio *orig_bio;
-	DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
-	unsigned int orig_nr_secs;
-	unsigned int bio_init_idx;
-	void *ppa_ptr;
-	dma_addr_t dma_ppa_list;
-	u64 lba_list_mem[NVM_MAX_VLBA];
-	u64 lba_list_media[NVM_MAX_VLBA];
-};
-
 /* Pad context */
 struct pblk_pad_rq {
 	struct pblk *pblk;
@@ -759,7 +747,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 				 unsigned int pos, unsigned int nr_entries,
 				 unsigned int count);
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
-			struct ppa_addr ppa, int bio_iter, bool advanced_bio);
+			struct ppa_addr ppa);
 unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
 
 unsigned int pblk_rb_sync_init(struct pblk_rb *rb, unsigned long *flags);
@@ -859,8 +847,8 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
 		       struct pblk_line *gc_line, u64 paddr);
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 			  u64 *lba_list, int nr_secs);
-void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
-			 sector_t blba, int nr_secs);
+int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
+			 sector_t blba, int nr_secs, bool *from_cache);
 void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd);
 void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
 
-- 
2.19.1
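
The comments in the hunk above say that each retried bio holds data only from the cache or only from the drive, and the new pblk_lookup_l2p_seq() prototype returns a count plus a from_cache flag. A minimal userspace sketch of that idea follows. It assumes the lookup stops at the first sector whose source differs from the first one. The function name lookup_same_source() and the in_cache[] array are hypothetical stand-ins for the L2P lookup, not pblk code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a sequential lookup that stops at a source change.
 * in_cache[i] says whether sector i currently lives in the write cache.
 * Returns the length of the leading run that shares one source and reports
 * that source through *from_cache.
 */
static int lookup_same_source(const bool *in_cache, int nr_secs,
			      bool *from_cache)
{
	int i;

	assert(nr_secs > 0);

	/* The first sector decides which source this request serves. */
	*from_cache = in_cache[0];

	/* Stop as soon as a sector comes from the other source. */
	for (i = 1; i < nr_secs; i++)
		if (in_cache[i] != *from_cache)
			break;

	return i;
}
```

Under these assumptions, a caller that gets back fewer sectors than it asked for would split the request at that point and retry with the remainder, which is what the split_retry path in the hunk appears to do.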



* [GIT PULL 26/26] lightnvm: pblk: use nvm_rq_to_ppa_list()
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (24 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 25/26] lightnvm: pblk: simplify partial read path Matias Bjørling
@ 2019-05-04 18:38 ` Matias Bjørling
  2019-05-06 16:20 ` [GIT PULL 00/26] lightnvm updates for 5.2 Jens Axboe
  26 siblings, 0 replies; 28+ messages in thread
From: Matias Bjørling @ 2019-05-04 18:38 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Igor Konopko, Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

This patch replaces the few remaining usages of rqd->ppa_list[] with
the existing nvm_rq_to_ppa_list() helper. This is needed for theoretical
devices with ws_min/ws_opt equal to 1.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c     | 26 ++++++++++++++------------
 drivers/lightnvm/pblk-recovery.c | 13 ++++++++-----
 2 files changed, 22 insertions(+), 17 deletions(-)
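
For readers unfamiliar with the helper named in the commit message, here is a minimal userspace sketch of the pattern. It assumes a single-address request keeps its address inline in the request while a multi-address request uses a separate list. That matches the open-coded expression this patch removes, (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr. The struct layouts are simplified stand-ins, and rq_to_ppa_list() and fill_ppas() are hypothetical names, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct ppa_addr {
	uint64_t ppa;
};

struct nvm_rq {
	struct ppa_addr ppa_addr;	/* inline slot for a single address */
	struct ppa_addr *ppa_list;	/* external list when nr_ppas > 1 */
	int nr_ppas;
};

/* Pick the storage that actually holds the addresses of this request. */
static struct ppa_addr *rq_to_ppa_list(struct nvm_rq *rqd)
{
	assert(rqd->nr_ppas >= 1);
	return (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
}

/* Fill the request through the helper, never through ppa_list directly. */
static void fill_ppas(struct nvm_rq *rqd, uint64_t start)
{
	struct ppa_addr *ppa_list = rq_to_ppa_list(rqd);
	int i;

	for (i = 0; i < rqd->nr_ppas; i++)
		ppa_list[i].ppa = start + i;
}
```

In this sketch, writing to rqd->ppa_list[0] directly would dereference an unset pointer when nr_ppas is 1. That is the case the commit message has in mind for devices with ws_min/ws_opt equal to 1.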

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 07270ba1e95f..773537804319 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -562,11 +562,9 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
 
 int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int ret;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	pblk_down_chunk(pblk, ppa_list[0]);
 	ret = pblk_submit_io_sync(pblk, rqd);
 	pblk_up_chunk(pblk, ppa_list[0]);
@@ -725,6 +723,7 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	u64 paddr = pblk_line_smeta_start(pblk, line);
 	int i, ret;
@@ -748,9 +747,10 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = lm->smeta_sec;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < lm->smeta_sec; i++, paddr++)
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+		ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
 	if (ret) {
@@ -777,6 +777,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
@@ -801,12 +802,13 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	rqd.opcode = NVM_OP_PWRITE;
 	rqd.nr_ppas = lm->smeta_sec;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
 		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
 							   rqd.meta_list, i);
 
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+		ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
 		meta->lba = lba_list[paddr] = addr_empty;
 	}
 
@@ -836,8 +838,9 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
-	void *ppa_list, *meta_list;
+	void *ppa_list_buf, *meta_list;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	u64 paddr = line->emeta_ssec;
 	dma_addr_t dma_ppa_list, dma_meta_list;
@@ -853,7 +856,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	ppa_list_buf = meta_list + pblk_dma_meta_size(pblk);
 	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
@@ -874,11 +877,12 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 
 	rqd.bio = bio;
 	rqd.meta_list = meta_list;
-	rqd.ppa_list = ppa_list;
+	rqd.ppa_list = ppa_list_buf;
 	rqd.dma_meta_list = dma_meta_list;
 	rqd.dma_ppa_list = dma_ppa_list;
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = rq_ppas;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; ) {
 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
@@ -906,7 +910,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < min; j++, i++, paddr++)
-			rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
+			ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
 	}
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
@@ -1525,11 +1529,9 @@ void pblk_ppa_to_line_put(struct pblk *pblk, struct ppa_addr ppa)
 
 void pblk_rq_to_line_put(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int i;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	for (i = 0; i < rqd->nr_ppas; i++)
 		pblk_ppa_to_line_put(pblk, ppa_list[i]);
 }
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index a9085b0e6611..e6dda04de144 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -179,6 +179,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	struct pblk_pad_rq *pad_rq;
 	struct nvm_rq *rqd;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	void *data;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	u64 w_ptr = line->cur_sec;
@@ -239,6 +240,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rqd->end_io = pblk_end_io_recov;
 	rqd->private = pad_rq;
 
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 	meta_list = rqd->meta_list;
 
 	for (i = 0; i < rqd->nr_ppas; ) {
@@ -266,17 +268,17 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 			lba_list[w_ptr] = addr_empty;
 			meta = pblk_get_meta(pblk, meta_list, i);
 			meta->lba = addr_empty;
-			rqd->ppa_list[i] = dev_ppa;
+			ppa_list[i] = dev_ppa;
 		}
 	}
 
 	kref_get(&pad_rq->ref);
-	pblk_down_chunk(pblk, rqd->ppa_list[0]);
+	pblk_down_chunk(pblk, ppa_list[0]);
 
 	ret = pblk_submit_io(pblk, rqd);
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
-		pblk_up_chunk(pblk, rqd->ppa_list[0]);
+		pblk_up_chunk(pblk, ppa_list[0]);
 		kref_put(&pad_rq->ref, pblk_recov_complete);
 		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
 		bio_put(bio);
@@ -420,6 +422,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	rqd->ppa_list = ppa_list;
 	rqd->dma_ppa_list = dma_ppa_list;
 	rqd->dma_meta_list = dma_meta_list;
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 
 	if (pblk_io_aligned(pblk, rq_ppas))
 		rqd->is_seq = 1;
@@ -438,7 +441,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < pblk->min_write_pgs; j++, i++)
-			rqd->ppa_list[i] =
+			ppa_list[i] =
 				addr_to_gen_ppa(pblk, paddr + j, line->id);
 	}
 
@@ -486,7 +489,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 			continue;
 
 		line->nr_valid_lbas++;
-		pblk_update_map(pblk, lba, rqd->ppa_list[i]);
+		pblk_update_map(pblk, lba, ppa_list[i]);
 	}
 
 	left_ppas -= rq_ppas;
-- 
2.19.1



* Re: [GIT PULL 00/26] lightnvm updates for 5.2
  2019-05-04 18:37 [GIT PULL 00/26] lightnvm updates for 5.2 Matias Bjørling
                   ` (25 preceding siblings ...)
  2019-05-04 18:38 ` [GIT PULL 26/26] lightnvm: pblk: use nvm_rq_to_ppa_list() Matias Bjørling
@ 2019-05-06 16:20 ` Jens Axboe
  26 siblings, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2019-05-06 16:20 UTC (permalink / raw)
  To: Matias Bjørling; +Cc: linux-block, linux-kernel

On 5/4/19 12:37 PM, Matias Bjørling wrote:
> Hi Jens,
> 
> Can you please pick up the following patches for the 5.2 window if
> it is too late.

It's very late. Even if I had applied this the second it came in, it
would not even have made linux-next before the merge window opens...
I'll queue this up for merging later in the merge window, but generally
I need to have bigger series a week before final. This generally means
around -rc6 time.

-- 
Jens Axboe



