linux-block.vger.kernel.org archive mirror
* [PATCH 00/18] lightnvm: next set of improvements for 5.2
@ 2019-03-14 16:04 Igor Konopko
  2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
                   ` (17 more replies)
  0 siblings, 18 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This is another set of fixes and improvements to both pblk and lightnvm
core. 

The first patch fixes an issue with a format specifier introduced in the
previous series (reported by the 0-day bot). The second patch is another
leftover from the previous patchset, since we decided to reorganize it a
little. The third patch is the most crucial, since it changes the approach
to the partial read path, so detailed review is especially needed there.

The remaining patches are other findings related to bugs or potential
improvements, mostly in corner cases, so they have a lower review
priority for now.

I did some testing with QEMU, but as always, feedback and testing are
appreciated.

Igor Konopko (18):
  lightnvm: pblk: fix warning in pblk_l2p_init()
  lightnvm: pblk: warn when there are opened chunks
  lightnvm: pblk: simplify partial read path
  lightnvm: pblk: OOB recovery for closed chunks fix
  lightnvm: pblk: propagate errors when reading meta
  lightnvm: pblk: recover only written metadata
  lightnvm: pblk: wait for inflight IOs in recovery
  lightnvm: pblk: fix spin_unlock order
  lightnvm: pblk: kick writer on write recovery path
  lightnvm: pblk: ensure that emeta is written
  lightnvm: pblk: fix update line wp in OOB recovery
  lightnvm: pblk: do not read OOB from emeta region
  lightnvm: pblk: store multiple copies of smeta
  lightnvm: pblk: GC error handling
  lightnvm: pblk: fix in case of lack of lines
  lightnvm: pblk: use nvm_rq_to_ppa_list()
  lightnvm: allow to use full device path
  lightnvm: track inflight target creations

 drivers/lightnvm/core.c          |  42 ++++++-
 drivers/lightnvm/pblk-core.c     | 166 ++++++++++++++++++++------
 drivers/lightnvm/pblk-gc.c       |   5 +-
 drivers/lightnvm/pblk-init.c     |  48 +++++---
 drivers/lightnvm/pblk-map.c      |   2 +-
 drivers/lightnvm/pblk-rb.c       |   2 +-
 drivers/lightnvm/pblk-read.c     | 243 +++++++--------------------------------
 drivers/lightnvm/pblk-recovery.c | 127 +++++++++++++++-----
 drivers/lightnvm/pblk-write.c    |  25 ++++
 drivers/lightnvm/pblk.h          |  16 +--
 include/linux/lightnvm.h         |   2 +
 11 files changed, 375 insertions(+), 303 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init()
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 22:29   ` Javier González
  2019-03-18 16:25   ` Matias Bjørling
  2019-03-14 16:04 ` [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks Igor Konopko
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch fixes a compilation warning caused by an improper format
specifier in pblk_l2p_init(): map_size is a size_t, so it must be
printed with %zu rather than %ld.
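The type mismatch behind the warning can be reproduced in plain userspace C. The sketch below is illustrative only (it is not the kernel code); `format_l2p_error()` is a hypothetical stand-in for the `pblk_err()` call in pblk_l2p_init():

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch, not the kernel code: map_size is a size_t, so
 * the matching printf-style specifier is %zu. %ld expects a long,
 * whose width can differ from size_t depending on the ABI, which is
 * what triggers the -Wformat warning fixed by this patch. */
static int format_l2p_error(char *buf, size_t buflen, size_t map_size)
{
	return snprintf(buf, buflen,
			"failed to allocate L2P (need %zu of memory)",
			map_size);
}
```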

Fixes: fe0c220 ("lightnvm: pblk: cleanly fail when there is not enough memory")
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1940f89..1e227a0 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -159,7 +159,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
 					PAGE_KERNEL);
 	if (!pblk->trans_map) {
-		pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
+		pblk_err(pblk, "failed to allocate L2P (need %zu of memory)\n",
 				map_size);
 		return -ENOMEM;
 	}
-- 
2.9.5



* [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
  2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 22:36   ` Javier González
  2019-03-17 19:39   ` Matias Bjørling
  2019-03-14 16:04 ` [PATCH 03/18] lightnvm: pblk: simplify partial read path Igor Konopko
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

During factory pblk init, we might encounter chunks that are still in
the open state. The OCSSD spec does not allow a chunk to be transitioned
directly from the open state to the free state, so resetting such a
chunk can lead to an IO error (even though most controllers will allow
the operation anyway). This patch at least warns the user about this
situation.
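The counting that this patch threads through pblk_setup_line_meta_chk() can be sketched in isolation. The state flag values below are taken from include/linux/lightnvm.h as of this series; treat the rest as a hypothetical userspace model, not the kernel code:

```c
/* Chunk state flags as defined in include/linux/lightnvm.h. */
#define NVM_CHK_ST_FREE    (1 << 0)
#define NVM_CHK_ST_CLOSED  (1 << 1)
#define NVM_CHK_ST_OPEN    (1 << 2)
#define NVM_CHK_ST_OFFLINE (1 << 3)

/* Walk a line's chunk states and count the open ones: these are the
 * chunks that, per the OCSSD spec, cannot be reset straight to the
 * free state during factory init, hence the warning. */
static int count_opened_chunks(const int *chunk_states, int nr_chunks)
{
	int i, opened = 0;

	for (i = 0; i < nr_chunks; i++)
		if (chunk_states[i] & NVM_CHK_ST_OPEN)
			opened++;

	return opened;
}
```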

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-init.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1e227a0..b7845f6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -712,7 +712,7 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
 }
 
 static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
-				   struct nvm_chk_meta *meta)
+				   struct nvm_chk_meta *meta, int *opened)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
@@ -748,6 +748,9 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
 			continue;
 		}
 
+		if (chunk->state & NVM_CHK_ST_OPEN)
+			(*opened)++;
+
 		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
 			continue;
 
@@ -759,7 +762,7 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
 }
 
 static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
-				 void *chunk_meta, int line_id)
+				 void *chunk_meta, int line_id, int *opened)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
@@ -773,7 +776,7 @@ static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
 	line->vsc = &l_mg->vsc_list[line_id];
 	spin_lock_init(&line->lock);
 
-	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta);
+	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta, opened);
 
 	chk_in_line = lm->blk_per_line - nr_bad_chks;
 	if (nr_bad_chks < 0 || nr_bad_chks > lm->blk_per_line ||
@@ -1019,12 +1022,12 @@ static int pblk_line_meta_init(struct pblk *pblk)
 	return 0;
 }
 
-static int pblk_lines_init(struct pblk *pblk)
+static int pblk_lines_init(struct pblk *pblk, bool factory)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line *line;
 	void *chunk_meta;
-	int nr_free_chks = 0;
+	int nr_free_chks = 0, nr_opened_chks = 0;
 	int i, ret;
 
 	ret = pblk_line_meta_init(pblk);
@@ -1059,7 +1062,8 @@ static int pblk_lines_init(struct pblk *pblk)
 		if (ret)
 			goto fail_free_lines;
 
-		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i);
+		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i,
+							&nr_opened_chks);
 
 		trace_pblk_line_state(pblk_disk_name(pblk), line->id,
 								line->state);
@@ -1071,6 +1075,11 @@ static int pblk_lines_init(struct pblk *pblk)
 		goto fail_free_lines;
 	}
 
+	if (factory && nr_opened_chks) {
+		pblk_warn(pblk, "%d opened chunks during factory creation\n",
+				nr_opened_chks);
+	}
+
 	ret = pblk_set_provision(pblk, nr_free_chks);
 	if (ret)
 		goto fail_free_lines;
@@ -1235,7 +1244,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		goto fail;
 	}
 
-	ret = pblk_lines_init(pblk);
+	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
 	if (ret) {
 		pblk_err(pblk, "could not initialize lines\n");
 		goto fail_free_core;
-- 
2.9.5



* [PATCH 03/18] lightnvm: pblk: simplify partial read path
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
  2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
  2019-03-14 16:04 ` [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-14 21:35   ` Heiner Litz
  2019-03-14 16:04 ` [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix Igor Konopko
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch changes the approach to handling the partial read path.

In the old approach, merging of data from the ring buffer and the drive
was done fully by the pblk driver. This had some disadvantages: the code
was complex, relied on bio internals, and was therefore hard to maintain
and strongly dependent on bio changes.

In the new approach most of the handling is done by block layer
functions such as bio_split(), bio_chain() and generic_make_request(),
which makes the path less complex and easier to maintain. Some more
details of the new approach follow.

When a read bio arrives, it is cloned for pblk's internal purposes. All
the L2P mapping, which includes copying data from the ring buffer to the
bio and thus the bio_advance() calls, is done on the cloned bio, so the
original bio is untouched. If we later find that we have a partial read
case, the original bio is still untouched, so we can split it and
continue to process only the first 4K of it in the current context,
while the rest is issued as a separate bio request passed to
generic_make_request() for further processing.
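The control flow described above can be modeled in userspace. This hypothetical sketch only counts how many sub-requests the split-retry loop ends up producing, with in_buffer[i] marking sectors that hit the ring buffer; the real code instead splits a bio and re-queues the tail via generic_make_request():

```c
/* Userspace model of the split-retry loop (not the kernel code).
 * A request whose remaining sectors are all buffered or all unbuffered
 * is finished as one sub-request; a mixed (partial) request gives up
 * its first 4K sector to the current context and re-queues the tail,
 * like bio_split() + bio_chain() + generic_make_request() here. */
static int count_subrequests(const int *in_buffer, int nr_secs)
{
	int start = 0, subreqs = 0;

	while (start < nr_secs) {
		int i, full = 1, empty = 1;

		for (i = start; i < nr_secs; i++) {
			if (in_buffer[i])
				empty = 0;
			else
				full = 0;
		}

		subreqs++;
		if (full || empty)
			break;	/* whole remainder handled in one go */
		start++;	/* split off one sector, retry on the tail */
	}

	return subreqs;
}
```

With this model, a fully buffered or fully unbuffered request stays a single request, while each buffered/unbuffered boundary costs one extra split, which matches the worst case of the patch's retry loop.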

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-read.c | 242 ++++++++-----------------------------------
 drivers/lightnvm/pblk.h      |  12 ---
 2 files changed, 41 insertions(+), 213 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 6569746..54422a2 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -222,171 +222,6 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
-static void pblk_end_partial_read(struct nvm_rq *rqd)
-{
-	struct pblk *pblk = rqd->private;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx = r_ctx->private;
-	struct pblk_sec_meta *meta;
-	struct bio *new_bio = rqd->bio;
-	struct bio *bio = pr_ctx->orig_bio;
-	struct bio_vec src_bv, dst_bv;
-	void *meta_list = rqd->meta_list;
-	int bio_init_idx = pr_ctx->bio_init_idx;
-	unsigned long *read_bitmap = pr_ctx->bitmap;
-	int nr_secs = pr_ctx->orig_nr_secs;
-	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	void *src_p, *dst_p;
-	int hole, i;
-
-	if (unlikely(nr_holes == 1)) {
-		struct ppa_addr ppa;
-
-		ppa = rqd->ppa_addr;
-		rqd->ppa_list = pr_ctx->ppa_ptr;
-		rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
-		rqd->ppa_list[0] = ppa;
-	}
-
-	for (i = 0; i < nr_secs; i++) {
-		meta = pblk_get_meta(pblk, meta_list, i);
-		pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
-		meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
-	}
-
-	/* Fill the holes in the original bio */
-	i = 0;
-	hole = find_first_zero_bit(read_bitmap, nr_secs);
-	do {
-		struct pblk_line *line;
-
-		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
-		kref_put(&line->ref, pblk_line_put);
-
-		meta = pblk_get_meta(pblk, meta_list, hole);
-		meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
-
-		src_bv = new_bio->bi_io_vec[i++];
-		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
-
-		src_p = kmap_atomic(src_bv.bv_page);
-		dst_p = kmap_atomic(dst_bv.bv_page);
-
-		memcpy(dst_p + dst_bv.bv_offset,
-			src_p + src_bv.bv_offset,
-			PBLK_EXPOSED_PAGE_SIZE);
-
-		kunmap_atomic(src_p);
-		kunmap_atomic(dst_p);
-
-		mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
-
-		hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
-	} while (hole < nr_secs);
-
-	bio_put(new_bio);
-	kfree(pr_ctx);
-
-	/* restore original request */
-	rqd->bio = NULL;
-	rqd->nr_ppas = nr_secs;
-
-	pblk_end_user_read(bio, rqd->error);
-	__pblk_end_io_read(pblk, rqd, false);
-}
-
-static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
-			    unsigned int bio_init_idx,
-			    unsigned long *read_bitmap,
-			    int nr_holes)
-{
-	void *meta_list = rqd->meta_list;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx;
-	struct bio *new_bio, *bio = r_ctx->private;
-	int nr_secs = rqd->nr_ppas;
-	int i;
-
-	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
-
-	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-		goto fail_bio_put;
-
-	if (nr_holes != new_bio->bi_vcnt) {
-		WARN_ONCE(1, "pblk: malformed bio\n");
-		goto fail_free_pages;
-	}
-
-	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
-	if (!pr_ctx)
-		goto fail_free_pages;
-
-	for (i = 0; i < nr_secs; i++) {
-		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
-
-		pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
-	}
-
-	new_bio->bi_iter.bi_sector = 0; /* internal bio */
-	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
-
-	rqd->bio = new_bio;
-	rqd->nr_ppas = nr_holes;
-
-	pr_ctx->orig_bio = bio;
-	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
-	pr_ctx->bio_init_idx = bio_init_idx;
-	pr_ctx->orig_nr_secs = nr_secs;
-	r_ctx->private = pr_ctx;
-
-	if (unlikely(nr_holes == 1)) {
-		pr_ctx->ppa_ptr = rqd->ppa_list;
-		pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
-		rqd->ppa_addr = rqd->ppa_list[0];
-	}
-	return 0;
-
-fail_free_pages:
-	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
-fail_bio_put:
-	bio_put(new_bio);
-
-	return -ENOMEM;
-}
-
-static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
-				 unsigned int bio_init_idx,
-				 unsigned long *read_bitmap, int nr_secs)
-{
-	int nr_holes;
-	int ret;
-
-	nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-
-	if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_holes))
-		return NVM_IO_ERR;
-
-	rqd->end_io = pblk_end_partial_read;
-
-	ret = pblk_submit_io(pblk, rqd);
-	if (ret) {
-		bio_put(rqd->bio);
-		pblk_err(pblk, "partial read IO submission failed\n");
-		goto err;
-	}
-
-	return NVM_IO_OK;
-
-err:
-	pblk_err(pblk, "failed to perform partial read\n");
-
-	/* Free allocated pages in new bio */
-	pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
-	__pblk_end_io_read(pblk, rqd, false);
-	return NVM_IO_ERR;
-}
-
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 			 sector_t lba, unsigned long *read_bitmap)
 {
@@ -432,11 +267,11 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct request_queue *q = dev->q;
+	struct bio *split_bio, *int_bio;
 	sector_t blba = pblk_get_lba(bio);
 	unsigned int nr_secs = pblk_get_secs(bio);
 	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
-	unsigned int bio_init_idx;
 	DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
 	int ret = NVM_IO_ERR;
 
@@ -456,61 +291,66 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	r_ctx = nvm_rq_to_pdu(rqd);
 	r_ctx->start_time = jiffies;
 	r_ctx->lba = blba;
-	r_ctx->private = bio; /* original bio */
 
-	/* Save the index for this bio's start. This is needed in case
-	 * we need to fill a partial read.
+	/* Clone read bio to deal with:
+	 * -usage of bio_advance() when memcpy data from round buffer
+	 * -read errors in case of reading from device
 	 */
-	bio_init_idx = pblk_get_bi_idx(bio);
+	int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
+	if (!int_bio)
+		return NVM_IO_ERR;
 
 	if (pblk_alloc_rqd_meta(pblk, rqd))
 		goto fail_rqd_free;
 
 	if (nr_secs > 1)
-		pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
+		pblk_read_ppalist_rq(pblk, rqd, int_bio, blba, read_bitmap);
 	else
-		pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
+		pblk_read_rq(pblk, rqd, int_bio, blba, read_bitmap);
+
+split_retry:
+	r_ctx->private = bio; /* original bio */
 
-	if (bitmap_full(read_bitmap, nr_secs)) {
+	if (bitmap_full(read_bitmap, rqd->nr_ppas)) {
+		bio_put(int_bio);
 		atomic_inc(&pblk->inflight_io);
 		__pblk_end_io_read(pblk, rqd, false);
 		return NVM_IO_DONE;
 	}
 
-	/* All sectors are to be read from the device */
-	if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
-		struct bio *int_bio = NULL;
+	if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
+		/* The read bio request could be partially filled by the write
+		 * buffer, but there are some holes that need to be read from
+		 * the drive. In order to handle this, we will use block layer
+		 * mechanism to split this request in to smaller ones.
+		 */
+		split_bio = bio_split(bio, NR_PHY_IN_LOG, GFP_KERNEL,
+					&pblk_bio_set);
+		bio_chain(split_bio, bio);
+		generic_make_request(bio);
+
+		/* Retry after split with new bio and existing rqd */
+		bio = split_bio;
+		rqd->nr_ppas = 1;
+		rqd->ppa_addr = rqd->ppa_list[0];
 
-		/* Clone read bio to deal with read errors internally */
+		/* Recreate int_bio for the retry needs */
+		bio_put(int_bio);
 		int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
-		if (!int_bio) {
-			pblk_err(pblk, "could not clone read bio\n");
-			goto fail_end_io;
-		}
-
-		rqd->bio = int_bio;
-
-		if (pblk_submit_io(pblk, rqd)) {
-			pblk_err(pblk, "read IO submission failed\n");
-			ret = NVM_IO_ERR;
-			goto fail_end_io;
-		}
-
-		return NVM_IO_OK;
+		if (!int_bio)
+			return NVM_IO_ERR;
+		goto split_retry;
 	}
 
-	/* The read bio request could be partially filled by the write buffer,
-	 * but there are some holes that need to be read from the drive.
-	 */
-	ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_secs);
-	if (ret)
-		goto fail_meta_free;
-
+	/* All sectors are to be read from the device */
+	rqd->bio = int_bio;
+	if (pblk_submit_io(pblk, rqd)) {
+		pblk_err(pblk, "read IO submission failed\n");
+		ret = NVM_IO_ERR;
+		goto fail_end_io;
+	}
 	return NVM_IO_OK;
 
-fail_meta_free:
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 fail_rqd_free:
 	pblk_free_rqd(pblk, rqd, PBLK_READ);
 	return ret;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 381f074..0a85990 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -123,18 +123,6 @@ struct pblk_g_ctx {
 	u64 lba;
 };
 
-/* partial read context */
-struct pblk_pr_ctx {
-	struct bio *orig_bio;
-	DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
-	unsigned int orig_nr_secs;
-	unsigned int bio_init_idx;
-	void *ppa_ptr;
-	dma_addr_t dma_ppa_list;
-	u64 lba_list_mem[NVM_MAX_VLBA];
-	u64 lba_list_media[NVM_MAX_VLBA];
-};
-
 /* Pad context */
 struct pblk_pad_rq {
 	struct pblk *pblk;
-- 
2.9.5



* [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (2 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 03/18] lightnvm: pblk: simplify partial read path Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 22:43   ` Javier González
  2019-03-14 16:04 ` [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta Igor Konopko
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In the case of OOB recovery, when some of the chunks are in the closed
state, we calculate the number of written sectors in the line
incorrectly, because we always count the chunk WP, which for closed
chunks no longer reflects the number of written sectors. For such
chunks this patch uses the clba field instead.
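The corrected accounting can be expressed as a small userspace function. The flag values follow include/linux/lightnvm.h; everything else is an illustrative model rather than the kernel code:

```c
#define NVM_CHK_ST_CLOSED  (1 << 1)
#define NVM_CHK_ST_OPEN    (1 << 2)
#define NVM_CHK_ST_OFFLINE (1 << 3)

struct chunk_meta {
	int state;
	unsigned long long wp;	/* write pointer */
};

/* An open chunk contributes its write pointer; a closed chunk
 * contributes the full chunk capacity (clba), since its WP no longer
 * reflects the written sectors; offline chunks contribute nothing. */
static unsigned long long written_secs_in_line(const struct chunk_meta *chks,
					       int nr_chks,
					       unsigned long long clba)
{
	unsigned long long written = 0;
	int i;

	for (i = 0; i < nr_chks; i++) {
		if (chks[i].state & NVM_CHK_ST_OFFLINE)
			continue;
		if (chks[i].state & NVM_CHK_ST_OPEN)
			written += chks[i].wp;
		else if (chks[i].state & NVM_CHK_ST_CLOSED)
			written += clba;
	}

	return written;
}
```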

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 83b467b..bcd3633 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -101,6 +101,8 @@ static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
 
 static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
 {
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	int nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
 	u64 written_secs = 0;
@@ -113,7 +115,11 @@ static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
 		if (chunk->state & NVM_CHK_ST_OFFLINE)
 			continue;
 
-		written_secs += chunk->wp;
+		if (chunk->state & NVM_CHK_ST_OPEN)
+			written_secs += chunk->wp;
+		else if (chunk->state & NVM_CHK_ST_CLOSED)
+			written_secs += geo->clba;
+
 		valid_chunks++;
 	}
 
-- 
2.9.5



* [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (3 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 22:48   ` Javier González
  2019-03-18 11:54   ` Hans Holmberg
  2019-03-14 16:04 ` [PATCH 06/18] lightnvm: pblk: recover only written metadata Igor Konopko
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently, errors hit while reading smeta/emeta/OOB are not always
propagated correctly. This patch changes that behaviour and propagates
all error codes except for the high ECC read warning status.
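The resulting policy is small enough to state as a predicate. The status value is taken from include/linux/lightnvm.h and -EIO matches what the diff returns; treat this as an illustrative userspace sketch:

```c
#include <errno.h>

/* High-ECC warning status as defined in include/linux/lightnvm.h:
 * the read succeeded but required heavy error correction. */
#define NVM_RSP_WARN_HIGHECC 0x4700

/* New policy when reading smeta/emeta/OOB: every completion error is
 * propagated as -EIO, except the high-ECC warning, which still
 * carries valid data and is deliberately not treated as a failure. */
static int meta_read_status(int rqd_error)
{
	if (rqd_error && rqd_error != NVM_RSP_WARN_HIGHECC)
		return -EIO;

	return 0;
}
```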

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 9 +++++++--
 drivers/lightnvm/pblk-recovery.c | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 39280c1..38e26fe 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -761,8 +761,10 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 
 	atomic_dec(&pblk->inflight_io);
 
-	if (rqd.error)
+	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
 		pblk_log_read_err(pblk, &rqd);
+		ret = -EIO;
+	}
 
 clear_rqd:
 	pblk_free_rqd_meta(pblk, &rqd);
@@ -916,8 +918,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 
 	atomic_dec(&pblk->inflight_io);
 
-	if (rqd.error)
+	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
 		pblk_log_read_err(pblk, &rqd);
+		ret = -EIO;
+		goto free_rqd_dma;
+	}
 
 	emeta_buf += rq_len;
 	left_ppas -= rq_ppas;
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index bcd3633..688fdeb 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -450,7 +450,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	atomic_dec(&pblk->inflight_io);
 
 	/* If a read fails, do a best effort by padding the line and retrying */
-	if (rqd->error) {
+	if (rqd->error && rqd->error != NVM_RSP_WARN_HIGHECC) {
 		int pad_distance, ret;
 
 		if (padded) {
-- 
2.9.5



* [PATCH 06/18] lightnvm: pblk: recover only written metadata
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (4 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 23:46   ` Javier González
  2019-03-14 16:04 ` [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery Igor Konopko
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch ensures that smeta/emeta were actually written before trying
to read them, based on the chunk table state and the write pointer.
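The smeta half of the check reduces to a predicate on the first chunk's state and write pointer. Below is a hypothetical userspace restatement (flag values per include/linux/lightnvm.h), not the kernel code:

```c
#define NVM_CHK_ST_CLOSED (1 << 1)
#define NVM_CHK_ST_OPEN   (1 << 2)

/* smeta is only worth reading if its chunk is closed, or is open with
 * the write pointer already at or past the number of smeta sectors,
 * i.e. the smeta region was fully written before the crash. */
static int smeta_was_written(int chunk_state, unsigned long long wp,
			     unsigned long long smeta_sec)
{
	if (chunk_state & NVM_CHK_ST_CLOSED)
		return 1;

	if ((chunk_state & NVM_CHK_ST_OPEN) && wp >= smeta_sec)
		return 1;

	return 0;
}
```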

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 43 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 688fdeb..ba1691d 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -653,8 +653,42 @@ static int pblk_line_was_written(struct pblk_line *line,
 	bppa = pblk->luns[smeta_blk].bppa;
 	chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
 
-	if (chunk->state & NVM_CHK_ST_FREE)
-		return 0;
+	if (chunk->state & NVM_CHK_ST_CLOSED ||
+	    (chunk->state & NVM_CHK_ST_OPEN
+	     && chunk->wp >= lm->smeta_sec))
+		return 1;
+
+	return 0;
+}
+
+static int pblk_line_was_emeta_written(struct pblk_line *line,
+				       struct pblk *pblk)
+{
+
+	struct pblk_line_meta *lm = &pblk->lm;
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	struct nvm_chk_meta *chunk;
+	struct ppa_addr ppa;
+	int i, pos;
+	int min = pblk->min_write_pgs;
+	u64 paddr = line->emeta_ssec;
+
+	for (i = 0; i < lm->emeta_sec[0]; i++, paddr++) {
+		ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+		pos = pblk_ppa_to_pos(geo, ppa);
+		while (test_bit(pos, line->blk_bitmap)) {
+			paddr += min;
+			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+			pos = pblk_ppa_to_pos(geo, ppa);
+		}
+		chunk = &line->chks[pos];
+
+		if (!(chunk->state & NVM_CHK_ST_CLOSED ||
+		    (chunk->state & NVM_CHK_ST_OPEN
+		     && chunk->wp > ppa.m.sec)))
+			return 0;
+	}
 
 	return 1;
 }
@@ -788,6 +822,11 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
 			goto next;
 		}
 
+		if (!pblk_line_was_emeta_written(line, pblk)) {
+			pblk_recov_l2p_from_oob(pblk, line);
+			goto next;
+		}
+
 		if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
 			pblk_recov_l2p_from_oob(pblk, line);
 			goto next;
-- 
2.9.5



* [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (5 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 06/18] lightnvm: pblk: recover only written metadata Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-17 19:33   ` Matias Bjørling
  2019-03-14 16:04 ` [PATCH 08/18] lightnvm: pblk: fix spin_unlock order Igor Konopko
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch changes the behaviour of recovery padding in order to
support the case when some IOs were already submitted to the drive and
subsequent ones were not submitted due to a returned error.

Currently in case of an error we simply exit the pad function without
waiting for inflight IOs, which leads to a panic when those inflight
IOs complete.

After the changes we always wait for all the inflight IOs before
exiting the function.

Also, since NVMe has an internal per-IO timeout, there is no need to
introduce an additional one here.
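The reference-counted wait that the patch relies on can be modeled in userspace. All names below are hypothetical; this is a sketch of the kref/completion pattern, not the pblk code:

```c
/* Userspace model of the reworked error path: the submitter holds one
 * reference plus one per submitted pad IO. On a submission error it
 * stops, drops its own reference, and waits for every inflight IO to
 * drop its reference before returning, instead of freeing the pad
 * state while IOs are still in flight. */
static int refs;

static void ref_get(void) { refs++; }
static int  ref_put(void) { return --refs == 0; }

/* 'submitted' is how many IOs made it out before the error hit.
 * Returns 1 only once all of them have completed. */
static int pad_line_error_path(int submitted)
{
	int i, done;

	refs = 1;			/* submitter's reference */
	for (i = 0; i < submitted; i++)
		ref_get();		/* one reference per inflight IO */

	done = ref_put();		/* error: drop submitter's ref */
	for (i = 0; i < submitted; i++)
		done = ref_put();	/* each IO completion */

	return done;			/* refcount reached zero */
}
```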

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index ba1691d..73d5ead 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -200,7 +200,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (rq_ppas < pblk->min_write_pgs) {
 		pblk_err(pblk, "corrupted pad line %d\n", line->id);
-		goto fail_free_pad;
+		goto fail_complete;
 	}
 
 	rq_len = rq_ppas * geo->csecs;
@@ -209,7 +209,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 						PBLK_VMALLOC_META, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
-		goto fail_free_pad;
+		goto fail_complete;
 	}
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -218,8 +218,11 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
 
 	ret = pblk_alloc_rqd_meta(pblk, rqd);
-	if (ret)
-		goto fail_free_rqd;
+	if (ret) {
+		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
+		bio_put(bio);
+		goto fail_complete;
+	}
 
 	rqd->bio = bio;
 	rqd->opcode = NVM_OP_PWRITE;
@@ -266,7 +269,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
 		pblk_up_chunk(pblk, rqd->ppa_list[0]);
-		goto fail_free_rqd;
+		kref_put(&pad_rq->ref, pblk_recov_complete);
+		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
+		bio_put(bio);
+		goto fail_complete;
 	}
 
 	left_line_ppas -= rq_ppas;
@@ -274,13 +280,9 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	if (left_ppas && left_line_ppas)
 		goto next_pad_rq;
 
+fail_complete:
 	kref_put(&pad_rq->ref, pblk_recov_complete);
-
-	if (!wait_for_completion_io_timeout(&pad_rq->wait,
-				msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
-		pblk_err(pblk, "pad write timed out\n");
-		ret = -ETIME;
-	}
+	wait_for_completion(&pad_rq->wait);
 
 	if (!pblk_line_is_full(line))
 		pblk_err(pblk, "corrupted padded line: %d\n", line->id);
@@ -289,14 +291,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 free_rq:
 	kfree(pad_rq);
 	return ret;
-
-fail_free_rqd:
-	pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
-	bio_put(bio);
-fail_free_pad:
-	kfree(pad_rq);
-	vfree(data);
-	return ret;
 }
 
 static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line)
-- 
2.9.5



* [PATCH 08/18] lightnvm: pblk: fix spin_unlock order
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (6 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 23:49   ` Javier González
  2019-03-18 11:55   ` Hans Holmberg
  2019-03-14 16:04 ` [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path Igor Konopko
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In pblk_rb_tear_down_check() the spin_unlock() calls are not made in
the proper (reverse) order of the lock acquisitions. This patch fixes
that.
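The discipline being restored is LIFO release: the lock taken last is dropped first. A dependency-free toy model of the fixed versus the old sequence (hypothetical names, not the pblk code):

```c
/* Toy model of lock-release ordering: acquiring pushes a lock id,
 * releasing must pop in reverse acquisition order. */
enum { W_LOCK = 1, S_LOCK = 2 };

static int lock_stack[8];
static int lock_depth;
static int order_bug;

static void take_lock(int id) { lock_stack[lock_depth++] = id; }
static void drop_lock(int id)
{
	if (lock_depth == 0 || lock_stack[--lock_depth] != id)
		order_bug = 1;	/* released out of acquisition order */
}

/* The fixed sequence: s_lock (taken last) is dropped first. */
static int tear_down_check_fixed(void)
{
	order_bug = 0;
	lock_depth = 0;
	take_lock(W_LOCK);
	take_lock(S_LOCK);
	drop_lock(S_LOCK);	/* inner lock first */
	drop_lock(W_LOCK);	/* outer lock last */
	return order_bug;
}

/* The old sequence releases w_lock while s_lock is still held. */
static int tear_down_check_buggy(void)
{
	order_bug = 0;
	lock_depth = 0;
	take_lock(W_LOCK);
	take_lock(S_LOCK);
	drop_lock(W_LOCK);	/* out of order */
	drop_lock(S_LOCK);
	return order_bug;
}
```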

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-rb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 03c241b..3555014 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -799,8 +799,8 @@ int pblk_rb_tear_down_check(struct pblk_rb *rb)
 	}
 
 out:
-	spin_unlock(&rb->w_lock);
 	spin_unlock_irq(&rb->s_lock);
+	spin_unlock(&rb->w_lock);
 
 	return ret;
 }
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
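
The ordering rule the patch above restores - release locks in the reverse
order they were taken - can be sketched with a toy userspace lock tracker
(all names here are illustrative, not the kernel spinlock API):

```c
#include <assert.h>

/* Toy lock tracker: locks must be released in LIFO (reverse
 * acquisition) order, the invariant restored in
 * pblk_rb_tear_down_check(). */
#define MAX_HELD 8

struct lock_tracker {
	int held[MAX_HELD];
	int depth;
};

static void lock(struct lock_tracker *t, int id)
{
	t->held[t->depth++] = id;
}

/* Returns 1 if the unlock respects LIFO order, 0 otherwise. */
static int unlock(struct lock_tracker *t, int id)
{
	if (t->depth == 0 || t->held[t->depth - 1] != id)
		return 0;	/* out-of-order release, as in the bug */
	t->depth--;
	return 1;
}
```

With this tracker, the pre-patch sequence (releasing the outer w_lock
before the inner s_lock) is rejected, while the patched reverse order
passes.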

* [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (7 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 08/18] lightnvm: pblk: fix spin_unlock order Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-16 23:54   ` Javier González
  2019-03-18 11:58   ` Hans Holmberg
  2019-03-14 16:04 ` [PATCH 10/18] lightnvm: pblk: ensure that emeta is written Igor Konopko
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

On the write recovery path there is a chance that the writer thread is
not active, so for safety it is good to always kick it.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 6593dea..4e63f9b 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -228,6 +228,7 @@ static void pblk_submit_rec(struct work_struct *work)
 	mempool_free(recovery, &pblk->rec_pool);
 
 	atomic_dec(&pblk->inflight_io);
+	pblk_write_kick(pblk);
 }
 
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (8 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-17 19:44   ` Matias Bjørling
  2019-03-18  7:46   ` Javier González
  2019-03-14 16:04 ` [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery Igor Konopko
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

When we are trying to switch to a new line, we need to ensure that the
emeta for the n-2 line has already been written. Otherwise we can end
up in a deadlock, where the writer has no more requests to write and
thus there is no way to trigger emeta writes from the writer thread.
This is a corner case which occurs after multiple write errors force a
kind of early line close due to lack of line space.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c  |  2 ++
 drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
 drivers/lightnvm/pblk.h       |  1 +
 3 files changed, 27 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 38e26fe..a683d1f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
 				     struct pblk_line_mgmt *l_mg,
 				     struct pblk_line_meta *lm)
 {
+	struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
 	int meta_line;
 
 	lockdep_assert_held(&l_mg->free_lock);
@@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
 	meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
 	if (meta_line == PBLK_DATA_LINES) {
 		spin_unlock(&l_mg->free_lock);
+		pblk_write_emeta_force(pblk);
 		io_schedule();
 		spin_lock(&l_mg->free_lock);
 		goto retry_meta;
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 4e63f9b..4fbb9b2 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
 	return meta_line;
 }
 
+void pblk_write_emeta_force(struct pblk *pblk)
+{
+	struct pblk_line_meta *lm = &pblk->lm;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	struct pblk_line *meta_line;
+
+	while (true) {
+		spin_lock(&l_mg->close_lock);
+		if (list_empty(&l_mg->emeta_list)) {
+			spin_unlock(&l_mg->close_lock);
+			break;
+		}
+		meta_line = list_first_entry(&l_mg->emeta_list,
+						struct pblk_line, list);
+		if (meta_line->emeta->mem >= lm->emeta_len[0]) {
+			spin_unlock(&l_mg->close_lock);
+			io_schedule();
+			continue;
+		}
+		spin_unlock(&l_mg->close_lock);
+		pblk_submit_meta_io(pblk, meta_line);
+	}
+}
+
 static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct ppa_addr erase_ppa;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 0a85990..a42bbfb 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
 void pblk_write_timer_fn(struct timer_list *t);
 void pblk_write_should_kick(struct pblk *pblk);
 void pblk_write_kick(struct pblk *pblk);
+void pblk_write_emeta_force(struct pblk *pblk);
 
 /*
  * pblk read path
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
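
The drain loop pblk_write_emeta_force() adds - peek at the head of a
pending list, wait if the head is not yet complete, otherwise submit and
pop - can be sketched in plain C as follows (a single-threaded toy with
hypothetical names; the real function does this under close_lock and
calls io_schedule() instead of breaking out when the head is busy):

```c
#include <stddef.h>

struct item {
	struct item *next;
	int ready;	/* toy stand-in for the emeta fill check */
	int submitted;
};

/* Drain the list head-first, "submitting" each ready item; stop at the
 * first item still being filled (the driver would sleep and retry). */
static int drain_pending(struct item **head)
{
	int submitted = 0;

	while (*head) {
		struct item *it = *head;

		if (!it->ready)
			break;

		*head = it->next;
		it->submitted = 1;
		submitted++;
	}
	return submitted;
}
```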

* [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (9 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 10/18] lightnvm: pblk: ensure that emeta is written Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  6:56   ` Javier González
  2019-03-14 16:04 ` [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region Igor Konopko
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

In the case of OOB recovery, we can hit a scenario where all the data
in the line has been written and some part of the emeta has been
written too. In such a case the pblk_update_line_wp() function calls
pblk_alloc_page(), which causes left_msecs to be set to a value below
zero (since this field does not track the emeta region) and thus leads
to multiple kernel warnings. This patch fixes that issue.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 73d5ead..4764596 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -93,10 +93,24 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
 static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
 				u64 written_secs)
 {
-	int i;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	int i = 0;
 
-	for (i = 0; i < written_secs; i += pblk->min_write_pgs)
-		pblk_alloc_page(pblk, line, pblk->min_write_pgs);
+	for (; i < written_secs; i += pblk->min_write_pgs)
+		__pblk_alloc_page(pblk, line, pblk->min_write_pgs);
+
+	spin_lock(&l_mg->free_lock);
+	if (written_secs > line->left_msecs) {
+		/*
+		 * We have all data sectors written
+		 * and some emeta sectors written too.
+		 */
+		line->left_msecs = 0;
+	} else {
+		/* We have only some data sectors written. */
+		line->left_msecs -= written_secs;
+	}
+	spin_unlock(&l_mg->free_lock);
 }
 
 static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
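
The core of the fix is a saturating subtraction on left_msecs: the
counter covers only data sectors, so once the recovered write pointer
reaches into the emeta region it must clamp at zero rather than
underflow. A minimal sketch of that update (types simplified):

```c
/* Saturating update of the remaining-data-sector counter; mirrors the
 * clamping added to pblk_update_line_wp(). */
static unsigned int update_left_msecs(unsigned int left_msecs,
				      unsigned int written_secs)
{
	if (written_secs > left_msecs)
		return 0;	/* data fully written, wp is in emeta */
	return left_msecs - written_secs;
}
```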

* [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (10 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-17 19:56   ` Matias Bjørling
  2019-03-14 16:04 ` [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta Igor Konopko
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

The emeta region does not have a valid corresponding OOB metadata
mapping, so there is no need to try to recover the L2P mapping from it.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 4764596..2132260 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -479,6 +479,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 		goto retry_rq;
 	}
 
+	if (paddr >= line->emeta_ssec) {
+		/*
+		 * We reach emeta region and we don't want
+		 * to recover oob from emeta region.
+		 */
+		goto completed;
+	}
+
 	pblk_get_packed_meta(pblk, rqd);
 	bio_put(bio);
 
@@ -499,6 +507,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	if (left_ppas > 0)
 		goto next_rq;
 
+completed:
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	WARN_ON(padded && !pblk_line_is_full(line));
 #endif
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (11 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  7:33   ` Javier González
  2019-03-14 16:04 ` [PATCH 14/18] lightnvm: pblk: GC error handling Igor Konopko
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently pblk stores only one copy of smeta per line. This is risky:
in case of a read error on such a chunk we lose all the data from the
whole line, which leads to silent data corruption.

This patch changes this behaviour and stores 2 copies of smeta (easily
increased to a different value with a kernel module parameter) in order
to provide higher reliability. Mirrored copies of the smeta struct make
it possible to fail over to another copy of that struct in case of a
read error. This approach ensures that copies of this critical
structure are stored on different dies, so the predicted UBER is
improved several times over.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 125 ++++++++++++++++++++++++++++++++-------
 drivers/lightnvm/pblk-init.c     |  23 +++++--
 drivers/lightnvm/pblk-recovery.c |   2 +-
 drivers/lightnvm/pblk.h          |   1 +
 4 files changed, 123 insertions(+), 28 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index a683d1f..4d5cd99 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -720,13 +720,14 @@ u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
 	return bit * geo->ws_opt;
 }
 
-int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
+static int pblk_line_smeta_read_copy(struct pblk *pblk,
+				     struct pblk_line *line, u64 paddr)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
 	struct nvm_rq rqd;
-	u64 paddr = pblk_line_smeta_start(pblk, line);
 	int i, ret;
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
@@ -735,7 +736,8 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	if (ret)
 		return ret;
 
-	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
+	bio = bio_map_kern(dev->q, line->smeta,
+			   lm->smeta_len / lm->smeta_copies, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto clear_rqd;
@@ -746,11 +748,23 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 
 	rqd.bio = bio;
 	rqd.opcode = NVM_OP_PREAD;
-	rqd.nr_ppas = lm->smeta_sec;
+	rqd.nr_ppas = lm->smeta_sec / lm->smeta_copies;
 	rqd.is_seq = 1;
 
-	for (i = 0; i < lm->smeta_sec; i++, paddr++)
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
+		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+		int pos = pblk_ppa_to_pos(geo, ppa);
+
+		while (test_bit(pos, line->blk_bitmap)) {
+			paddr += pblk->min_write_pgs;
+			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+			pos = pblk_ppa_to_pos(geo, ppa);
+		}
+
+		rqd.ppa_list[i] = ppa;
+		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
+				  cpu_to_le64(ADDR_EMPTY);
+	}
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
 	if (ret) {
@@ -771,16 +785,63 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	return ret;
 }
 
-static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
-				 u64 paddr)
+int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
+{
+	struct pblk_line_meta *lm = &pblk->lm;
+	int i, ret = 0, smeta_sec = lm->smeta_sec / lm->smeta_copies;
+	u64 paddr = pblk_line_smeta_start(pblk, line);
+
+	for (i = 0; i < lm->smeta_copies; i++) {
+		ret = pblk_line_smeta_read_copy(pblk, line,
+						paddr + (i * smeta_sec));
+		if (!ret) {
+			/*
+			 * Just one successfully read copy of smeta is
+			 * enough for us for recovery, don't need to
+			 * read another one.
+			 */
+			return ret;
+		}
+	}
+	return ret;
+}
+
+static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
 	struct nvm_rq rqd;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
-	int i, ret;
+	u64 paddr = 0;
+	int smeta_cpy_len = lm->smeta_len / lm->smeta_copies;
+	int smeta_cpy_sec = lm->smeta_sec / lm->smeta_copies;
+	int i, ret, rq_writes;
+
+	/*
+	 * Check if we can write all the smeta copies with
+	 * a single write command.
+	 * If yes -> copy smeta sector into multiple copies
+	 * in buffer to write.
+	 * If no -> issue writes one by one using the same
+	 * buffer space.
+	 * Only if all the copies are written correctly
+	 * we are treating this line as valid for proper
+	 * UBER reliability.
+	 */
+	if (lm->smeta_sec > pblk->max_write_pgs) {
+		rq_writes = lm->smeta_copies;
+	} else {
+		rq_writes = 1;
+		for (i = 1; i < lm->smeta_copies; i++) {
+			memcpy(line->smeta + i * smeta_cpy_len,
+			       line->smeta, smeta_cpy_len);
+		}
+		smeta_cpy_len = lm->smeta_len;
+		smeta_cpy_sec = lm->smeta_sec;
+	}
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 
@@ -788,7 +849,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	if (ret)
 		return ret;
 
-	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
+next_rq:
+	bio = bio_map_kern(dev->q, line->smeta, smeta_cpy_len, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto clear_rqd;
@@ -799,15 +861,23 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 
 	rqd.bio = bio;
 	rqd.opcode = NVM_OP_PWRITE;
-	rqd.nr_ppas = lm->smeta_sec;
+	rqd.nr_ppas = smeta_cpy_sec;
 	rqd.is_seq = 1;
 
-	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
-							   rqd.meta_list, i);
+	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
+		void *meta_list = rqd.meta_list;
+		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+		int pos = pblk_ppa_to_pos(geo, ppa);
+
+		while (test_bit(pos, line->blk_bitmap)) {
+			paddr += pblk->min_write_pgs;
+			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+			pos = pblk_ppa_to_pos(geo, ppa);
+		}
 
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-		meta->lba = lba_list[paddr] = addr_empty;
+		rqd.ppa_list[i] = ppa;
+		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
+		lba_list[paddr] = addr_empty;
 	}
 
 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
@@ -822,8 +892,13 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	if (rqd.error) {
 		pblk_log_write_err(pblk, &rqd);
 		ret = -EIO;
+		goto clear_rqd;
 	}
 
+	rq_writes--;
+	if (rq_writes > 0)
+		goto next_rq;
+
 clear_rqd:
 	pblk_free_rqd_meta(pblk, &rqd);
 	return ret;
@@ -1149,7 +1224,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	u64 off;
 	int bit = -1;
-	int emeta_secs;
+	int emeta_secs, smeta_secs;
 
 	line->sec_in_line = lm->sec_per_line;
 
@@ -1165,13 +1240,19 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	}
 
 	/* Mark smeta metadata sectors as bad sectors */
-	bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
-	off = bit * geo->ws_opt;
-	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
+	smeta_secs = lm->smeta_sec;
+	bit = -1;
+	while (smeta_secs) {
+		bit = find_next_zero_bit(line->blk_bitmap, lm->blk_per_line,
+					bit + 1);
+		off = bit * geo->ws_opt;
+		bitmap_set(line->map_bitmap, off, geo->ws_opt);
+		line->cur_sec = off + geo->ws_opt;
+		smeta_secs -= geo->ws_opt;
+	}
 	line->sec_in_line -= lm->smeta_sec;
-	line->cur_sec = off + lm->smeta_sec;
 
-	if (init && pblk_line_smeta_write(pblk, line, off)) {
+	if (init && pblk_line_smeta_write(pblk, line)) {
 		pblk_debug(pblk, "line smeta I/O failed. Retry\n");
 		return 0;
 	}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b7845f6..e771df6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -27,6 +27,11 @@ static unsigned int write_buffer_size;
 module_param(write_buffer_size, uint, 0644);
 MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
 
+static unsigned int smeta_copies = 2;
+
+module_param(smeta_copies, int, 0644);
+MODULE_PARM_DESC(smeta_copies, "number of smeta copies");
+
 struct pblk_global_caches {
 	struct kmem_cache	*ws;
 	struct kmem_cache	*rec;
@@ -977,18 +982,25 @@ static int pblk_line_meta_init(struct pblk *pblk)
 	lm->mid_thrs = lm->sec_per_line / 2;
 	lm->high_thrs = lm->sec_per_line / 4;
 	lm->meta_distance = (geo->all_luns / 2) * pblk->min_write_pgs;
+	lm->smeta_copies = smeta_copies;
+
+	if (lm->smeta_copies < 1 || lm->smeta_copies > geo->all_luns) {
+		pblk_err(pblk, "unsupported smeta copies parameter\n");
+		return -EINVAL;
+	}
 
 	/* Calculate necessary pages for smeta. See comment over struct
 	 * line_smeta definition
 	 */
-	i = 1;
+	i = lm->smeta_copies;
 add_smeta_page:
-	lm->smeta_sec = i * geo->ws_opt;
+	lm->smeta_sec = lm->smeta_copies * geo->ws_opt;
 	lm->smeta_len = lm->smeta_sec * geo->csecs;
 
 	smeta_len = sizeof(struct line_smeta) + lm->lun_bitmap_len;
+	smeta_len *= lm->smeta_copies;
 	if (smeta_len > lm->smeta_len) {
-		i++;
+		i += lm->smeta_copies;
 		goto add_smeta_page;
 	}
 
@@ -1008,10 +1020,11 @@ static int pblk_line_meta_init(struct pblk *pblk)
 
 	lm->emeta_bb = geo->all_luns > i ? geo->all_luns - i : 0;
 
-	lm->min_blk_line = 1;
-	if (geo->all_luns > 1)
+	lm->min_blk_line = lm->smeta_copies;
+	if (geo->all_luns > lm->smeta_copies) {
 		lm->min_blk_line += DIV_ROUND_UP(lm->smeta_sec +
 					lm->emeta_sec[0], geo->clba);
+	}
 
 	if (lm->min_blk_line > lm->blk_per_line) {
 		pblk_err(pblk, "config. not supported. Min. LUN in line:%d\n",
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 2132260..4e4db38 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -672,7 +672,7 @@ static int pblk_line_was_written(struct pblk_line *line,
 
 	if (chunk->state & NVM_CHK_ST_CLOSED ||
 	    (chunk->state & NVM_CHK_ST_OPEN
-	     && chunk->wp >= lm->smeta_sec))
+	     && chunk->wp >= (lm->smeta_sec / lm->smeta_copies)))
 		return 1;
 
 	return 0;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a42bbfb..5d1040a 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -549,6 +549,7 @@ struct pblk_line_mgmt {
 struct pblk_line_meta {
 	unsigned int smeta_len;		/* Total length for smeta */
 	unsigned int smeta_sec;		/* Sectors needed for smeta */
+	unsigned int smeta_copies;	/* Number of smeta copies */
 
 	unsigned int emeta_len[4];	/* Lengths for emeta:
 					 *  [0]: Total
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
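
The read-side failover that the extra smeta copies enable amounts to
"try each mirrored copy in order and stop at the first clean read", as
in the new pblk_line_smeta_read(). A standalone sketch with a
hypothetical per-copy read callback (not the pblk API):

```c
#include <errno.h>
#include <stddef.h>

typedef int (*read_copy_fn)(int copy_idx, void *buf);

/* Return 0 on the first copy that reads cleanly; only if every copy
 * fails does the last error propagate. */
static int read_any_copy(read_copy_fn read_copy, int ncopies, void *buf)
{
	int i, ret = -EIO;

	for (i = 0; i < ncopies; i++) {
		ret = read_copy(i, buf);
		if (!ret)
			return 0;
	}
	return ret;
}

/* Example callback: copy 0 is unreadable, any later copy succeeds. */
static int demo_read(int copy_idx, void *buf)
{
	(void)buf;
	return copy_idx == 0 ? -EIO : 0;
}
```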

* [PATCH 14/18] lightnvm: pblk: GC error handling
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (12 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  7:39   ` Javier González
  2019-03-18 12:14   ` Hans Holmberg
  2019-03-14 16:04 ` [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines Igor Konopko
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently when there is an IO error (or similar) on the GC read path,
pblk still moves the line to the free state, which leads to a data
mismatch issue.

This patch adds handling for such an error: with this change the line
is returned to the closed state, so it can still be used for reading -
at the very least we will be able to return an error status to the
user when the user tries to read such data.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c | 8 ++++++++
 drivers/lightnvm/pblk-gc.c   | 5 ++---
 drivers/lightnvm/pblk-read.c | 1 -
 drivers/lightnvm/pblk.h      | 2 ++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 4d5cd99..6817f8f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1786,6 +1786,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 
 	spin_lock(&line->lock);
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
+	if (line->w_err_gc->has_gc_err) {
+		spin_unlock(&line->lock);
+		pblk_err(pblk, "line %d had errors during GC\n", line->id);
+		pblk_put_line_back(pblk, line);
+		line->w_err_gc->has_gc_err = 0;
+		return;
+	}
+
 	line->state = PBLK_LINESTATE_FREE;
 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
 					line->state);
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index e23b192..63ee205 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
 	wake_up_process(gc->gc_writer_ts);
 }
 
-static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
+void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list;
@@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	/* Read from GC victim block */
 	ret = pblk_submit_read_gc(pblk, gc_rq);
 	if (ret) {
-		pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
-								line->id, ret);
+		line->w_err_gc->has_gc_err = 1;
 		goto out;
 	}
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 54422a2..6a77c24 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -475,7 +475,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 	if (pblk_submit_io_sync(pblk, &rqd)) {
 		ret = -EIO;
-		pblk_err(pblk, "GC read request failed\n");
 		goto err_free_bio;
 	}
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 5d1040a..52002f5 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -427,6 +427,7 @@ struct pblk_smeta {
 
 struct pblk_w_err_gc {
 	int has_write_err;
+	int has_gc_err;
 	__le64 *lba_list;
 };
 
@@ -909,6 +910,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
 void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
 			      int *gc_active);
 int pblk_gc_sysfs_force(struct pblk *pblk, int force);
+void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
 
 /*
  * pblk rate limiter
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread
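
The error handling above hinges on a per-line flag (has_gc_err) that the
GC reader sets and __pblk_line_put() checks before freeing the line. A
simplified sketch of that decision (the state values and struct here are
illustrative, not the pblk definitions):

```c
enum line_state { LINESTATE_CLOSED, LINESTATE_GC, LINESTATE_FREE };

struct line {
	enum line_state state;
	int has_gc_err;
};

/* On the final put: a cleanly GC'd line becomes FREE; a line whose GC
 * read failed goes back to CLOSED so its data stays readable. */
static void line_put(struct line *line)
{
	if (line->has_gc_err) {
		line->state = LINESTATE_CLOSED;
		line->has_gc_err = 0;
		return;
	}
	line->state = LINESTATE_FREE;
}
```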

* [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (13 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 14/18] lightnvm: pblk: GC error handling Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  7:42   ` Javier González
  2019-03-14 16:04 ` [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

When mapping fails (called from the writer thread) due to a lack of
lines, we currently call pblk_pipeline_stop(), which waits for pending
write IOs and thus leads to a deadlock. Switching to
__pblk_pipeline_stop() in that case fixes this.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5408e32..afc10306 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 		pblk_line_close_meta(pblk, prev_line);
 
 		if (!line) {
-			pblk_pipeline_stop(pblk);
+			__pblk_pipeline_stop(pblk);
 			return -ENOSPC;
 		}
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list()
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (14 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  7:48   ` Javier González
  2019-03-14 16:04 ` [PATCH 17/18] lightnvm: allow to use full device path Igor Konopko
  2019-03-14 16:04 ` [PATCH 18/18] lightnvm: track inflight target creations Igor Konopko
  17 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch replaces a few remaining usages of rqd->ppa_list[] with the
existing nvm_rq_to_ppa_list() helper. This is needed for theoretical
devices with ws_min/ws_opt equal to 1.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 26 ++++++++++++++------------
 drivers/lightnvm/pblk-recovery.c | 13 ++++++++-----
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6817f8f..7338a44 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -562,11 +562,9 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
 
 int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int ret;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	pblk_down_chunk(pblk, ppa_list[0]);
 	ret = pblk_submit_io_sync(pblk, rqd);
 	pblk_up_chunk(pblk, ppa_list[0]);
@@ -727,6 +725,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	int i, ret;
 
@@ -750,6 +749,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = lm->smeta_sec / lm->smeta_copies;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
@@ -761,7 +761,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 			pos = pblk_ppa_to_pos(geo, ppa);
 		}
 
-		rqd.ppa_list[i] = ppa;
+		ppa_list[i] = ppa;
 		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
 				  cpu_to_le64(ADDR_EMPTY);
 	}
@@ -812,6 +812,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
@@ -863,6 +864,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 	rqd.opcode = NVM_OP_PWRITE;
 	rqd.nr_ppas = smeta_cpy_sec;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
 		void *meta_list = rqd.meta_list;
@@ -875,7 +877,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 			pos = pblk_ppa_to_pos(geo, ppa);
 		}
 
-		rqd.ppa_list[i] = ppa;
+		ppa_list[i] = ppa;
 		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
 		lba_list[paddr] = addr_empty;
 	}
@@ -911,8 +913,9 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
-	void *ppa_list, *meta_list;
+	void *ppa_list_buf, *meta_list;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	u64 paddr = line->emeta_ssec;
 	dma_addr_t dma_ppa_list, dma_meta_list;
@@ -928,7 +931,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	ppa_list_buf = meta_list + pblk_dma_meta_size(pblk);
 	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
@@ -949,11 +952,12 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 
 	rqd.bio = bio;
 	rqd.meta_list = meta_list;
-	rqd.ppa_list = ppa_list;
+	rqd.ppa_list = ppa_list_buf;
 	rqd.dma_meta_list = dma_meta_list;
 	rqd.dma_ppa_list = dma_ppa_list;
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = rq_ppas;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; ) {
 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
@@ -981,7 +985,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < min; j++, i++, paddr++)
-			rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
+			ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
 	}
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
@@ -1608,11 +1612,9 @@ void pblk_ppa_to_line_put(struct pblk *pblk, struct ppa_addr ppa)
 
 void pblk_rq_to_line_put(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int i;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	for (i = 0; i < rqd->nr_ppas; i++)
 		pblk_ppa_to_line_put(pblk, ppa_list[i]);
 }
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 4e4db38..4051b93 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -185,6 +185,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	struct pblk_pad_rq *pad_rq;
 	struct nvm_rq *rqd;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	void *data;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	u64 w_ptr = line->cur_sec;
@@ -245,6 +246,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rqd->end_io = pblk_end_io_recov;
 	rqd->private = pad_rq;
 
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 	meta_list = rqd->meta_list;
 
 	for (i = 0; i < rqd->nr_ppas; ) {
@@ -272,17 +274,17 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 			lba_list[w_ptr] = addr_empty;
 			meta = pblk_get_meta(pblk, meta_list, i);
 			meta->lba = addr_empty;
-			rqd->ppa_list[i] = dev_ppa;
+			ppa_list[i] = dev_ppa;
 		}
 	}
 
 	kref_get(&pad_rq->ref);
-	pblk_down_chunk(pblk, rqd->ppa_list[0]);
+	pblk_down_chunk(pblk, ppa_list[0]);
 
 	ret = pblk_submit_io(pblk, rqd);
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
-		pblk_up_chunk(pblk, rqd->ppa_list[0]);
+		pblk_up_chunk(pblk, ppa_list[0]);
 		kref_put(&pad_rq->ref, pblk_recov_complete);
 		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
 		bio_put(bio);
@@ -426,6 +428,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	rqd->ppa_list = ppa_list;
 	rqd->dma_ppa_list = dma_ppa_list;
 	rqd->dma_meta_list = dma_meta_list;
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 
 	if (pblk_io_aligned(pblk, rq_ppas))
 		rqd->is_seq = 1;
@@ -444,7 +447,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < pblk->min_write_pgs; j++, i++)
-			rqd->ppa_list[i] =
+			ppa_list[i] =
 				addr_to_gen_ppa(pblk, paddr + j, line->id);
 	}
 
@@ -500,7 +503,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 			continue;
 
 		line->nr_valid_lbas++;
-		pblk_update_map(pblk, lba, rqd->ppa_list[i]);
+		pblk_update_map(pblk, lba, ppa_list[i]);
 	}
 
 	left_ppas -= rq_ppas;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (15 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  2019-03-18  7:49   ` Javier González
  2019-03-18 10:28   ` Hans Holmberg
  2019-03-14 16:04 ` [PATCH 18/18] lightnvm: track inflight target creations Igor Konopko
  17 siblings, 2 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch adds the possibility to provide a full device path (like
/dev/nvme0n1) when specifying the device on top of which a pblk
instance should be created or removed.

This makes target creation from nvme-cli (or other ioctl-based tools)
more consistent with other commands: currently almost all commands use
the full device path, with the exception of the lightnvm
creation/removal parameter, which uses just the 'nvme0n1' naming
convention. After this change both approaches will be valid.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index c01f83b..838c3d8 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
 }
 EXPORT_SYMBOL(nvm_unregister);
 
+#define PREFIX_STR "/dev/"
+static void nvm_normalize_path(char *path)
+{
+	path[DISK_NAME_LEN - 1] = '\0';
+	if (!memcmp(PREFIX_STR, path,
+				sizeof(char) * strlen(PREFIX_STR))) {
+		/*
+		 * The user provided the name in '/dev/nvme0n1' format,
+		 * so we need to skip '/dev/' for the comparison
+		 */
+		memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
+			(DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
+	}
+}
+
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
 	struct nvm_dev *dev;
@@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
 		return -EINVAL;
 	}
 
-	create.dev[DISK_NAME_LEN - 1] = '\0';
+	nvm_normalize_path(create.dev);
+	nvm_normalize_path(create.tgtname);
 	create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
-	create.tgtname[DISK_NAME_LEN - 1] = '\0';
 
 	if (create.flags != 0) {
 		__u32 flags = create.flags;
@@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 	if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
 		return -EFAULT;
 
-	remove.tgtname[DISK_NAME_LEN - 1] = '\0';
+	nvm_normalize_path(remove.tgtname);
 
 	if (remove.flags != 0) {
 		pr_err("nvm: no flags supported\n");
@@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
 	if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
 		return -EFAULT;
 
-	fact.dev[DISK_NAME_LEN - 1] = '\0';
-
 	if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
 		return -EINVAL;
 
-- 
2.9.5



* [PATCH 18/18] lightnvm: track inflight target creations
  2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (16 preceding siblings ...)
  2019-03-14 16:04 ` [PATCH 17/18] lightnvm: allow to use full device path Igor Konopko
@ 2019-03-14 16:04 ` Igor Konopko
  17 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-14 16:04 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch adds a counter which tracks inflight target creations.

While the creation process is still in progress, the target is not yet
on the targets list. This opens a window in which the whole lightnvm
subsystem can be removed by a concurrent nvm_unregister() call,
finally causing a kernel panic inside the target init function.

With this patch we are able to track such scenarios and delay
completing nvm_unregister() and freeing memory until target creation
has completed.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c  | 19 ++++++++++++++++++-
 include/linux/lightnvm.h |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 838c3d8..490ec6e 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1083,6 +1083,7 @@ static int nvm_core_init(struct nvm_dev *dev)
 	INIT_LIST_HEAD(&dev->targets);
 	mutex_init(&dev->mlock);
 	spin_lock_init(&dev->lock);
+	dev->create_inflight = 0;
 
 	ret = nvm_register_map(dev);
 	if (ret)
@@ -1180,6 +1181,11 @@ void nvm_unregister(struct nvm_dev *dev)
 	struct nvm_target *t, *tmp;
 
 	mutex_lock(&dev->mlock);
+	while (dev->create_inflight > 0) {
+		mutex_unlock(&dev->mlock);
+		io_schedule();
+		mutex_lock(&dev->mlock);
+	}
 	list_for_each_entry_safe(t, tmp, &dev->targets, list) {
 		if (t->dev->parent != dev)
 			continue;
@@ -1213,6 +1219,7 @@ static void nvm_normalize_path(char *path)
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
 	struct nvm_dev *dev;
+	int ret;
 
 	down_write(&nvm_lock);
 	dev = nvm_find_nvm_dev(create->dev);
@@ -1223,7 +1230,17 @@ static int __nvm_configure_create(struct nvm_ioctl_create *create)
 		return -EINVAL;
 	}
 
-	return nvm_create_tgt(dev, create);
+	mutex_lock(&dev->mlock);
+	dev->create_inflight++;
+	mutex_unlock(&dev->mlock);
+
+	ret = nvm_create_tgt(dev, create);
+
+	mutex_lock(&dev->mlock);
+	dev->create_inflight--;
+	mutex_unlock(&dev->mlock);
+
+	return ret;
 }
 
 static long nvm_ioctl_info(struct file *file, void __user *arg)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index d3b0270..e462d1d 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -428,6 +428,8 @@ struct nvm_dev {
 	char name[DISK_NAME_LEN];
 	void *private_data;
 
+	int create_inflight;
+
 	void *rmap;
 
 	struct mutex mlock;
-- 
2.9.5



* Re: [PATCH 03/18] lightnvm: pblk: simplify partial read path
  2019-03-14 16:04 ` [PATCH 03/18] lightnvm: pblk: simplify partial read path Igor Konopko
@ 2019-03-14 21:35   ` Heiner Litz
  2019-03-15  9:52     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Heiner Litz @ 2019-03-14 21:35 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block

On Thu, Mar 14, 2019 at 9:07 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> This patch changes the approach to handling the partial read path.
>
> In the old approach, merging of data from the round buffer and the
> drive was fully done by the driver. This had some disadvantages: the
> code was complex and relied on bio internals, so it was hard to
> maintain and was strongly dependent on bio changes.
>
> In the new approach most of the handling is done by block layer
> functions such as bio_split(), bio_chain() and generic_make_request(),
> so it is generally less complex and easier to maintain. Some more
> details of the new approach follow below.
>
> When a read bio arrives, it is cloned for pblk internal purposes. All
> the L2P mapping, which includes copying data from the round buffer to
> the bio and thus the bio_advance() calls, is done on the cloned bio,
> so the original bio is untouched. Later, if we find that we have a
> partial read case, we still have the original bio untouched, so we
> can split it and continue to process only the first 4K of it in the
> current context, while the rest is submitted as a separate bio
> request passed to generic_make_request() for further processing.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-read.c | 242 ++++++++-----------------------------------
>  drivers/lightnvm/pblk.h      |  12 ---
>  2 files changed, 41 insertions(+), 213 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 6569746..54422a2 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -222,171 +222,6 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>         __pblk_end_io_read(pblk, rqd, true);
>  }
>
> -static void pblk_end_partial_read(struct nvm_rq *rqd)
> -{
> -       struct pblk *pblk = rqd->private;
> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
> -       struct pblk_sec_meta *meta;
> -       struct bio *new_bio = rqd->bio;
> -       struct bio *bio = pr_ctx->orig_bio;
> -       struct bio_vec src_bv, dst_bv;
> -       void *meta_list = rqd->meta_list;
> -       int bio_init_idx = pr_ctx->bio_init_idx;
> -       unsigned long *read_bitmap = pr_ctx->bitmap;
> -       int nr_secs = pr_ctx->orig_nr_secs;
> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
> -       void *src_p, *dst_p;
> -       int hole, i;
> -
> -       if (unlikely(nr_holes == 1)) {
> -               struct ppa_addr ppa;
> -
> -               ppa = rqd->ppa_addr;
> -               rqd->ppa_list = pr_ctx->ppa_ptr;
> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
> -               rqd->ppa_list[0] = ppa;
> -       }
> -
> -       for (i = 0; i < nr_secs; i++) {
> -               meta = pblk_get_meta(pblk, meta_list, i);
> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
> -       }
> -
> -       /* Fill the holes in the original bio */
> -       i = 0;
> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
> -       do {
> -               struct pblk_line *line;
> -
> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
> -               kref_put(&line->ref, pblk_line_put);
> -
> -               meta = pblk_get_meta(pblk, meta_list, hole);
> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
> -
> -               src_bv = new_bio->bi_io_vec[i++];
> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
> -
> -               src_p = kmap_atomic(src_bv.bv_page);
> -               dst_p = kmap_atomic(dst_bv.bv_page);
> -
> -               memcpy(dst_p + dst_bv.bv_offset,
> -                       src_p + src_bv.bv_offset,
> -                       PBLK_EXPOSED_PAGE_SIZE);
> -
> -               kunmap_atomic(src_p);
> -               kunmap_atomic(dst_p);
> -
> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
> -
> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
> -       } while (hole < nr_secs);
> -
> -       bio_put(new_bio);
> -       kfree(pr_ctx);
> -
> -       /* restore original request */
> -       rqd->bio = NULL;
> -       rqd->nr_ppas = nr_secs;
> -
> -       pblk_end_user_read(bio, rqd->error);
> -       __pblk_end_io_read(pblk, rqd, false);
> -}
> -
> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> -                           unsigned int bio_init_idx,
> -                           unsigned long *read_bitmap,
> -                           int nr_holes)
> -{
> -       void *meta_list = rqd->meta_list;
> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> -       struct pblk_pr_ctx *pr_ctx;
> -       struct bio *new_bio, *bio = r_ctx->private;
> -       int nr_secs = rqd->nr_ppas;
> -       int i;
> -
> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
> -
> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
> -               goto fail_bio_put;
> -
> -       if (nr_holes != new_bio->bi_vcnt) {
> -               WARN_ONCE(1, "pblk: malformed bio\n");
> -               goto fail_free_pages;
> -       }
> -
> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
> -       if (!pr_ctx)
> -               goto fail_free_pages;
> -
> -       for (i = 0; i < nr_secs; i++) {
> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
> -
> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
> -       }
> -
> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> -
> -       rqd->bio = new_bio;
> -       rqd->nr_ppas = nr_holes;
> -
> -       pr_ctx->orig_bio = bio;
> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
> -       pr_ctx->bio_init_idx = bio_init_idx;
> -       pr_ctx->orig_nr_secs = nr_secs;
> -       r_ctx->private = pr_ctx;
> -
> -       if (unlikely(nr_holes == 1)) {
> -               pr_ctx->ppa_ptr = rqd->ppa_list;
> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
> -               rqd->ppa_addr = rqd->ppa_list[0];
> -       }
> -       return 0;
> -
> -fail_free_pages:
> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
> -fail_bio_put:
> -       bio_put(new_bio);
> -
> -       return -ENOMEM;
> -}
> -
> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
> -                                unsigned int bio_init_idx,
> -                                unsigned long *read_bitmap, int nr_secs)
> -{
> -       int nr_holes;
> -       int ret;
> -
> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
> -
> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
> -                                   nr_holes))
> -               return NVM_IO_ERR;
> -
> -       rqd->end_io = pblk_end_partial_read;
> -
> -       ret = pblk_submit_io(pblk, rqd);
> -       if (ret) {
> -               bio_put(rqd->bio);
> -               pblk_err(pblk, "partial read IO submission failed\n");
> -               goto err;
> -       }
> -
> -       return NVM_IO_OK;
> -
> -err:
> -       pblk_err(pblk, "failed to perform partial read\n");
> -
> -       /* Free allocated pages in new bio */
> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
> -       __pblk_end_io_read(pblk, rqd, false);
> -       return NVM_IO_ERR;
> -}
> -
>  static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>                          sector_t lba, unsigned long *read_bitmap)
>  {
> @@ -432,11 +267,11 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>  {
>         struct nvm_tgt_dev *dev = pblk->dev;
>         struct request_queue *q = dev->q;
> +       struct bio *split_bio, *int_bio;
>         sector_t blba = pblk_get_lba(bio);
>         unsigned int nr_secs = pblk_get_secs(bio);
>         struct pblk_g_ctx *r_ctx;
>         struct nvm_rq *rqd;
> -       unsigned int bio_init_idx;
>         DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
>         int ret = NVM_IO_ERR;
>
> @@ -456,61 +291,66 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>         r_ctx = nvm_rq_to_pdu(rqd);
>         r_ctx->start_time = jiffies;
>         r_ctx->lba = blba;
> -       r_ctx->private = bio; /* original bio */
>
> -       /* Save the index for this bio's start. This is needed in case
> -        * we need to fill a partial read.
> +       /* Clone read bio to deal with:
> +        * -usage of bio_advance() when memcpy data from round buffer
> +        * -read errors in case of reading from device
>          */
> -       bio_init_idx = pblk_get_bi_idx(bio);
> +       int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
> +       if (!int_bio)
> +               return NVM_IO_ERR;
>
>         if (pblk_alloc_rqd_meta(pblk, rqd))
>                 goto fail_rqd_free;
>
>         if (nr_secs > 1)
> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
> +               pblk_read_ppalist_rq(pblk, rqd, int_bio, blba, read_bitmap);
>         else
> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
> +               pblk_read_rq(pblk, rqd, int_bio, blba, read_bitmap);
> +
> +split_retry:
> +       r_ctx->private = bio; /* original bio */
>
> -       if (bitmap_full(read_bitmap, nr_secs)) {
> +       if (bitmap_full(read_bitmap, rqd->nr_ppas)) {
> +               bio_put(int_bio);
>                 atomic_inc(&pblk->inflight_io);
>                 __pblk_end_io_read(pblk, rqd, false);
>                 return NVM_IO_DONE;
>         }
>
> -       /* All sectors are to be read from the device */
> -       if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
> -               struct bio *int_bio = NULL;
> +       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
> +               /* The read bio request could be partially filled by the write
> +                * buffer, but there are some holes that need to be read from
> +                * the drive. In order to handle this, we will use the
> +                * block layer mechanism to split this request into
> +                * smaller ones.
> +                */
> +               split_bio = bio_split(bio, NR_PHY_IN_LOG, GFP_KERNEL,
> +                                       &pblk_bio_set);

This is quite inefficient. If you have an rqd with 63 sectors on the device and
the 64th is cached, you are splitting 63 times whereas a single split
would be sufficient.

> +               bio_chain(split_bio, bio);

I am not sure if it's needed but in blk_queue_split() these flags are set
before making the request:
        split->bi_opf |= REQ_NOMERGE;
        bio_set_flag(*bio, BIO_QUEUE_ENTERED);

> +               generic_make_request(bio);

pblk_lookup_l2p_seq() increments line->ref. You have to release the krefs before
requeueing the request.

You might consider introducing a pblk_lookup_l2p_uncached() which
returns when it finds a cached sector and returns its index. Doing so,
you can avoid obtaining superfluous line->refs, and we could also get
rid of the read_bitmap entirely. This index could also be used to
split the bio at the right position.

> +
> +               /* Retry after split with new bio and existing rqd */
> +               bio = split_bio;
> +               rqd->nr_ppas = 1;
> +               rqd->ppa_addr = rqd->ppa_list[0];
>
> -               /* Clone read bio to deal with read errors internally */
> +               /* Recreate int_bio for the retry needs */
> +               bio_put(int_bio);
>                 int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
> -               if (!int_bio) {
> -                       pblk_err(pblk, "could not clone read bio\n");
> -                       goto fail_end_io;
> -               }
> -
> -               rqd->bio = int_bio;
> -
> -               if (pblk_submit_io(pblk, rqd)) {
> -                       pblk_err(pblk, "read IO submission failed\n");
> -                       ret = NVM_IO_ERR;
> -                       goto fail_end_io;
> -               }
> -
> -               return NVM_IO_OK;
> +               if (!int_bio)
> +                       return NVM_IO_ERR;
> +               goto split_retry;
>         }
>
> -       /* The read bio request could be partially filled by the write buffer,
> -        * but there are some holes that need to be read from the drive.
> -        */
> -       ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
> -                                   nr_secs);
> -       if (ret)
> -               goto fail_meta_free;
> -
> +       /* All sectors are to be read from the device */
> +       rqd->bio = int_bio;
> +       if (pblk_submit_io(pblk, rqd)) {
> +               pblk_err(pblk, "read IO submission failed\n");
> +               ret = NVM_IO_ERR;
> +               goto fail_end_io;
> +       }
>         return NVM_IO_OK;
>
> -fail_meta_free:
> -       nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
>  fail_rqd_free:
>         pblk_free_rqd(pblk, rqd, PBLK_READ);
>         return ret;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 381f074..0a85990 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -123,18 +123,6 @@ struct pblk_g_ctx {
>         u64 lba;
>  };
>
> -/* partial read context */
> -struct pblk_pr_ctx {
> -       struct bio *orig_bio;
> -       DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
> -       unsigned int orig_nr_secs;
> -       unsigned int bio_init_idx;
> -       void *ppa_ptr;
> -       dma_addr_t dma_ppa_list;
> -       u64 lba_list_mem[NVM_MAX_VLBA];
> -       u64 lba_list_media[NVM_MAX_VLBA];
> -};
> -
>  /* Pad context */
>  struct pblk_pad_rq {
>         struct pblk *pblk;
> --
> 2.9.5
>


* Re: [PATCH 03/18] lightnvm: pblk: simplify partial read path
  2019-03-14 21:35   ` Heiner Litz
@ 2019-03-15  9:52     ` Igor Konopko
  2019-03-16 22:28       ` Javier González
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-15  9:52 UTC (permalink / raw)
  To: Heiner Litz
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block



On 14.03.2019 22:35, Heiner Litz wrote:
> On Thu, Mar 14, 2019 at 9:07 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> This patch changes the approach to handling the partial read path.
>>
>> In the old approach, merging of data from the round buffer and the
>> drive was fully done by the driver. This had some disadvantages: the
>> code was complex and relied on bio internals, so it was hard to
>> maintain and was strongly dependent on bio changes.
>>
>> In the new approach most of the handling is done by block layer
>> functions such as bio_split(), bio_chain() and generic_make_request(),
>> so it is generally less complex and easier to maintain. Some more
>> details of the new approach follow below.
>>
>> When a read bio arrives, it is cloned for pblk internal purposes. All
>> the L2P mapping, which includes copying data from the round buffer to
>> the bio and thus the bio_advance() calls, is done on the cloned bio,
>> so the original bio is untouched. Later, if we find that we have a
>> partial read case, we still have the original bio untouched, so we
>> can split it and continue to process only the first 4K of it in the
>> current context, while the rest is submitted as a separate bio
>> request passed to generic_make_request() for further processing.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-read.c | 242 ++++++++-----------------------------------
>>   drivers/lightnvm/pblk.h      |  12 ---
>>   2 files changed, 41 insertions(+), 213 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 6569746..54422a2 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -222,171 +222,6 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>          __pblk_end_io_read(pblk, rqd, true);
>>   }
>>
>> -static void pblk_end_partial_read(struct nvm_rq *rqd)
>> -{
>> -       struct pblk *pblk = rqd->private;
>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
>> -       struct pblk_sec_meta *meta;
>> -       struct bio *new_bio = rqd->bio;
>> -       struct bio *bio = pr_ctx->orig_bio;
>> -       struct bio_vec src_bv, dst_bv;
>> -       void *meta_list = rqd->meta_list;
>> -       int bio_init_idx = pr_ctx->bio_init_idx;
>> -       unsigned long *read_bitmap = pr_ctx->bitmap;
>> -       int nr_secs = pr_ctx->orig_nr_secs;
>> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>> -       void *src_p, *dst_p;
>> -       int hole, i;
>> -
>> -       if (unlikely(nr_holes == 1)) {
>> -               struct ppa_addr ppa;
>> -
>> -               ppa = rqd->ppa_addr;
>> -               rqd->ppa_list = pr_ctx->ppa_ptr;
>> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
>> -               rqd->ppa_list[0] = ppa;
>> -       }
>> -
>> -       for (i = 0; i < nr_secs; i++) {
>> -               meta = pblk_get_meta(pblk, meta_list, i);
>> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
>> -       }
>> -
>> -       /* Fill the holes in the original bio */
>> -       i = 0;
>> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
>> -       do {
>> -               struct pblk_line *line;
>> -
>> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>> -               kref_put(&line->ref, pblk_line_put);
>> -
>> -               meta = pblk_get_meta(pblk, meta_list, hole);
>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
>> -
>> -               src_bv = new_bio->bi_io_vec[i++];
>> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>> -
>> -               src_p = kmap_atomic(src_bv.bv_page);
>> -               dst_p = kmap_atomic(dst_bv.bv_page);
>> -
>> -               memcpy(dst_p + dst_bv.bv_offset,
>> -                       src_p + src_bv.bv_offset,
>> -                       PBLK_EXPOSED_PAGE_SIZE);
>> -
>> -               kunmap_atomic(src_p);
>> -               kunmap_atomic(dst_p);
>> -
>> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
>> -
>> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
>> -       } while (hole < nr_secs);
>> -
>> -       bio_put(new_bio);
>> -       kfree(pr_ctx);
>> -
>> -       /* restore original request */
>> -       rqd->bio = NULL;
>> -       rqd->nr_ppas = nr_secs;
>> -
>> -       pblk_end_user_read(bio, rqd->error);
>> -       __pblk_end_io_read(pblk, rqd, false);
>> -}
>> -
>> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> -                           unsigned int bio_init_idx,
>> -                           unsigned long *read_bitmap,
>> -                           int nr_holes)
>> -{
>> -       void *meta_list = rqd->meta_list;
>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> -       struct pblk_pr_ctx *pr_ctx;
>> -       struct bio *new_bio, *bio = r_ctx->private;
>> -       int nr_secs = rqd->nr_ppas;
>> -       int i;
>> -
>> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>> -
>> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
>> -               goto fail_bio_put;
>> -
>> -       if (nr_holes != new_bio->bi_vcnt) {
>> -               WARN_ONCE(1, "pblk: malformed bio\n");
>> -               goto fail_free_pages;
>> -       }
>> -
>> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
>> -       if (!pr_ctx)
>> -               goto fail_free_pages;
>> -
>> -       for (i = 0; i < nr_secs; i++) {
>> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>> -
>> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
>> -       }
>> -
>> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
>> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>> -
>> -       rqd->bio = new_bio;
>> -       rqd->nr_ppas = nr_holes;
>> -
>> -       pr_ctx->orig_bio = bio;
>> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
>> -       pr_ctx->bio_init_idx = bio_init_idx;
>> -       pr_ctx->orig_nr_secs = nr_secs;
>> -       r_ctx->private = pr_ctx;
>> -
>> -       if (unlikely(nr_holes == 1)) {
>> -               pr_ctx->ppa_ptr = rqd->ppa_list;
>> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
>> -               rqd->ppa_addr = rqd->ppa_list[0];
>> -       }
>> -       return 0;
>> -
>> -fail_free_pages:
>> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
>> -fail_bio_put:
>> -       bio_put(new_bio);
>> -
>> -       return -ENOMEM;
>> -}
>> -
>> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>> -                                unsigned int bio_init_idx,
>> -                                unsigned long *read_bitmap, int nr_secs)
>> -{
>> -       int nr_holes;
>> -       int ret;
>> -
>> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>> -
>> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
>> -                                   nr_holes))
>> -               return NVM_IO_ERR;
>> -
>> -       rqd->end_io = pblk_end_partial_read;
>> -
>> -       ret = pblk_submit_io(pblk, rqd);
>> -       if (ret) {
>> -               bio_put(rqd->bio);
>> -               pblk_err(pblk, "partial read IO submission failed\n");
>> -               goto err;
>> -       }
>> -
>> -       return NVM_IO_OK;
>> -
>> -err:
>> -       pblk_err(pblk, "failed to perform partial read\n");
>> -
>> -       /* Free allocated pages in new bio */
>> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
>> -       __pblk_end_io_read(pblk, rqd, false);
>> -       return NVM_IO_ERR;
>> -}
>> -
>>   static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>                           sector_t lba, unsigned long *read_bitmap)
>>   {
>> @@ -432,11 +267,11 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>   {
>>          struct nvm_tgt_dev *dev = pblk->dev;
>>          struct request_queue *q = dev->q;
>> +       struct bio *split_bio, *int_bio;
>>          sector_t blba = pblk_get_lba(bio);
>>          unsigned int nr_secs = pblk_get_secs(bio);
>>          struct pblk_g_ctx *r_ctx;
>>          struct nvm_rq *rqd;
>> -       unsigned int bio_init_idx;
>>          DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
>>          int ret = NVM_IO_ERR;
>>
>> @@ -456,61 +291,66 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>          r_ctx = nvm_rq_to_pdu(rqd);
>>          r_ctx->start_time = jiffies;
>>          r_ctx->lba = blba;
>> -       r_ctx->private = bio; /* original bio */
>>
>> -       /* Save the index for this bio's start. This is needed in case
>> -        * we need to fill a partial read.
>> +       /* Clone read bio to deal with:
>> +        * -usage of bio_advance() when memcpy data from round buffer
>> +        * -read errors in case of reading from device
>>           */
>> -       bio_init_idx = pblk_get_bi_idx(bio);
>> +       int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>> +       if (!int_bio)
>> +               return NVM_IO_ERR;
>>
>>          if (pblk_alloc_rqd_meta(pblk, rqd))
>>                  goto fail_rqd_free;
>>
>>          if (nr_secs > 1)
>> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
>> +               pblk_read_ppalist_rq(pblk, rqd, int_bio, blba, read_bitmap);
>>          else
>> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
>> +               pblk_read_rq(pblk, rqd, int_bio, blba, read_bitmap);
>> +
>> +split_retry:
>> +       r_ctx->private = bio; /* original bio */
>>
>> -       if (bitmap_full(read_bitmap, nr_secs)) {
>> +       if (bitmap_full(read_bitmap, rqd->nr_ppas)) {
>> +               bio_put(int_bio);
>>                  atomic_inc(&pblk->inflight_io);
>>                  __pblk_end_io_read(pblk, rqd, false);
>>                  return NVM_IO_DONE;
>>          }
>>
>> -       /* All sectors are to be read from the device */
>> -       if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>> -               struct bio *int_bio = NULL;
>> +       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>> +               /* The read bio request could be partially filled by the write
>> +                * buffer, but there are some holes that need to be read from
>> +                * the drive. In order to handle this, we will use block layer
>> +                * mechanism to split this request in to smaller ones.
>> +                */
>> +               split_bio = bio_split(bio, NR_PHY_IN_LOG, GFP_KERNEL,
>> +                                       &pblk_bio_set);
> 
> This is quite inefficient. If you have an rqd with 63 sectors on the device and
> the 64th is cached, you are splitting 63 times whereas a single split
> would be sufficient.

Yeah, definitely, it would be better to find how many contiguous sectors 
are in cache/on the drive and split based on that. Will improve that in v2.

> 
>> +               bio_chain(split_bio, bio);
> 
> I am not sure if it's needed but in blk_queue_split() these flags are set
> before making the request:
>          split->bi_opf |= REQ_NOMERGE;
>          bio_set_flag(*bio, BIO_QUEUE_ENTERED);
> 
>> +               generic_make_request(bio);
> 
> pblk_lookup_l2p_seq() increments line->ref. You have to release the krefs before
> requeueing the request.
> 
I completely forgot about it, will fix that of course, thanks!

> You might consider introducing a pblk_lookup_l2p_uncached() which returns when
> it finds a cached sector and returns its index. Doing so you can avoid obtaining
> superfluous line->refs and we could also get rid of the read_bitmap
> entirely. This
> index could also be used to split the bio at the right position.
> 
>> +
>> +               /* Retry after split with new bio and existing rqd */
>> +               bio = split_bio;
>> +               rqd->nr_ppas = 1;
>> +               rqd->ppa_addr = rqd->ppa_list[0];
>>
>> -               /* Clone read bio to deal with read errors internally */
>> +               /* Recreate int_bio for the retry needs */
>> +               bio_put(int_bio);
>>                  int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>> -               if (!int_bio) {
>> -                       pblk_err(pblk, "could not clone read bio\n");
>> -                       goto fail_end_io;
>> -               }
>> -
>> -               rqd->bio = int_bio;
>> -
>> -               if (pblk_submit_io(pblk, rqd)) {
>> -                       pblk_err(pblk, "read IO submission failed\n");
>> -                       ret = NVM_IO_ERR;
>> -                       goto fail_end_io;
>> -               }
>> -
>> -               return NVM_IO_OK;
>> +               if (!int_bio)
>> +                       return NVM_IO_ERR;
>> +               goto split_retry;
>>          }
>>
>> -       /* The read bio request could be partially filled by the write buffer,
>> -        * but there are some holes that need to be read from the drive.
>> -        */
>> -       ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
>> -                                   nr_secs);
>> -       if (ret)
>> -               goto fail_meta_free;
>> -
>> +       /* All sectors are to be read from the device */
>> +       rqd->bio = int_bio;
>> +       if (pblk_submit_io(pblk, rqd)) {
>> +               pblk_err(pblk, "read IO submission failed\n");
>> +               ret = NVM_IO_ERR;
>> +               goto fail_end_io;
>> +       }
>>          return NVM_IO_OK;
>>
>> -fail_meta_free:
>> -       nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
>>   fail_rqd_free:
>>          pblk_free_rqd(pblk, rqd, PBLK_READ);
>>          return ret;
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 381f074..0a85990 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -123,18 +123,6 @@ struct pblk_g_ctx {
>>          u64 lba;
>>   };
>>
>> -/* partial read context */
>> -struct pblk_pr_ctx {
>> -       struct bio *orig_bio;
>> -       DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
>> -       unsigned int orig_nr_secs;
>> -       unsigned int bio_init_idx;
>> -       void *ppa_ptr;
>> -       dma_addr_t dma_ppa_list;
>> -       u64 lba_list_mem[NVM_MAX_VLBA];
>> -       u64 lba_list_media[NVM_MAX_VLBA];
>> -};
>> -
>>   /* Pad context */
>>   struct pblk_pad_rq {
>>          struct pblk *pblk;
>> --
>> 2.9.5
>>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 03/18] lightnvm: pblk: simplify partial read path
  2019-03-15  9:52     ` Igor Konopko
@ 2019-03-16 22:28       ` Javier González
  2019-03-18 12:44         ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-16 22:28 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Heiner Litz, Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 12566 bytes --]

> On 15 Mar 2019, at 02.52, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 14.03.2019 22:35, Heiner Litz wrote:
>> On Thu, Mar 14, 2019 at 9:07 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> This patch changes the approach to handling partial read path.
>>> 
>>> In the old approach, merging of data from the round buffer and the drive
>>> was done entirely by the driver. This had some disadvantages - the code
>>> was complex, relied on bio internals, and so was hard to maintain and
>>> strongly dependent on bio changes.
>>> 
>>> In the new approach most of the handling is done by block layer
>>> functions such as bio_split(), bio_chain() and generic_make_request(),
>>> and it is generally less complex and easier to maintain. More details of
>>> the new approach follow below.
>>> 
>>> When a read bio arrives, it is cloned for pblk internal purposes. All
>>> the L2P mapping, which includes copying data from the round buffer to the
>>> bio and thus the bio_advance() calls, is done on the cloned bio, so the
>>> original bio is untouched. Later, if we find that we have a partial read
>>> case, the original bio is still untouched, so we can split it and
>>> continue to process only the first 4K of it in the current context, while
>>> the rest is submitted as a separate bio request to generic_make_request()
>>> for further processing.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>>  drivers/lightnvm/pblk-read.c | 242 ++++++++-----------------------------------
>>>  drivers/lightnvm/pblk.h      |  12 ---
>>>  2 files changed, 41 insertions(+), 213 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>> index 6569746..54422a2 100644
>>> --- a/drivers/lightnvm/pblk-read.c
>>> +++ b/drivers/lightnvm/pblk-read.c
>>> @@ -222,171 +222,6 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>         __pblk_end_io_read(pblk, rqd, true);
>>>  }
>>> 
>>> -static void pblk_end_partial_read(struct nvm_rq *rqd)
>>> -{
>>> -       struct pblk *pblk = rqd->private;
>>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
>>> -       struct pblk_sec_meta *meta;
>>> -       struct bio *new_bio = rqd->bio;
>>> -       struct bio *bio = pr_ctx->orig_bio;
>>> -       struct bio_vec src_bv, dst_bv;
>>> -       void *meta_list = rqd->meta_list;
>>> -       int bio_init_idx = pr_ctx->bio_init_idx;
>>> -       unsigned long *read_bitmap = pr_ctx->bitmap;
>>> -       int nr_secs = pr_ctx->orig_nr_secs;
>>> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>>> -       void *src_p, *dst_p;
>>> -       int hole, i;
>>> -
>>> -       if (unlikely(nr_holes == 1)) {
>>> -               struct ppa_addr ppa;
>>> -
>>> -               ppa = rqd->ppa_addr;
>>> -               rqd->ppa_list = pr_ctx->ppa_ptr;
>>> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
>>> -               rqd->ppa_list[0] = ppa;
>>> -       }
>>> -
>>> -       for (i = 0; i < nr_secs; i++) {
>>> -               meta = pblk_get_meta(pblk, meta_list, i);
>>> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
>>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
>>> -       }
>>> -
>>> -       /* Fill the holes in the original bio */
>>> -       i = 0;
>>> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
>>> -       do {
>>> -               struct pblk_line *line;
>>> -
>>> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>> -               kref_put(&line->ref, pblk_line_put);
>>> -
>>> -               meta = pblk_get_meta(pblk, meta_list, hole);
>>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
>>> -
>>> -               src_bv = new_bio->bi_io_vec[i++];
>>> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>> -
>>> -               src_p = kmap_atomic(src_bv.bv_page);
>>> -               dst_p = kmap_atomic(dst_bv.bv_page);
>>> -
>>> -               memcpy(dst_p + dst_bv.bv_offset,
>>> -                       src_p + src_bv.bv_offset,
>>> -                       PBLK_EXPOSED_PAGE_SIZE);
>>> -
>>> -               kunmap_atomic(src_p);
>>> -               kunmap_atomic(dst_p);
>>> -
>>> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
>>> -
>>> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
>>> -       } while (hole < nr_secs);
>>> -
>>> -       bio_put(new_bio);
>>> -       kfree(pr_ctx);
>>> -
>>> -       /* restore original request */
>>> -       rqd->bio = NULL;
>>> -       rqd->nr_ppas = nr_secs;
>>> -
>>> -       pblk_end_user_read(bio, rqd->error);
>>> -       __pblk_end_io_read(pblk, rqd, false);
>>> -}
>>> -
>>> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> -                           unsigned int bio_init_idx,
>>> -                           unsigned long *read_bitmap,
>>> -                           int nr_holes)
>>> -{
>>> -       void *meta_list = rqd->meta_list;
>>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>> -       struct pblk_pr_ctx *pr_ctx;
>>> -       struct bio *new_bio, *bio = r_ctx->private;
>>> -       int nr_secs = rqd->nr_ppas;
>>> -       int i;
>>> -
>>> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>> -
>>> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
>>> -               goto fail_bio_put;
>>> -
>>> -       if (nr_holes != new_bio->bi_vcnt) {
>>> -               WARN_ONCE(1, "pblk: malformed bio\n");
>>> -               goto fail_free_pages;
>>> -       }
>>> -
>>> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
>>> -       if (!pr_ctx)
>>> -               goto fail_free_pages;
>>> -
>>> -       for (i = 0; i < nr_secs; i++) {
>>> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>>> -
>>> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
>>> -       }
>>> -
>>> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>> -
>>> -       rqd->bio = new_bio;
>>> -       rqd->nr_ppas = nr_holes;
>>> -
>>> -       pr_ctx->orig_bio = bio;
>>> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
>>> -       pr_ctx->bio_init_idx = bio_init_idx;
>>> -       pr_ctx->orig_nr_secs = nr_secs;
>>> -       r_ctx->private = pr_ctx;
>>> -
>>> -       if (unlikely(nr_holes == 1)) {
>>> -               pr_ctx->ppa_ptr = rqd->ppa_list;
>>> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
>>> -               rqd->ppa_addr = rqd->ppa_list[0];
>>> -       }
>>> -       return 0;
>>> -
>>> -fail_free_pages:
>>> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
>>> -fail_bio_put:
>>> -       bio_put(new_bio);
>>> -
>>> -       return -ENOMEM;
>>> -}
>>> -
>>> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>> -                                unsigned int bio_init_idx,
>>> -                                unsigned long *read_bitmap, int nr_secs)
>>> -{
>>> -       int nr_holes;
>>> -       int ret;
>>> -
>>> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>>> -
>>> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
>>> -                                   nr_holes))
>>> -               return NVM_IO_ERR;
>>> -
>>> -       rqd->end_io = pblk_end_partial_read;
>>> -
>>> -       ret = pblk_submit_io(pblk, rqd);
>>> -       if (ret) {
>>> -               bio_put(rqd->bio);
>>> -               pblk_err(pblk, "partial read IO submission failed\n");
>>> -               goto err;
>>> -       }
>>> -
>>> -       return NVM_IO_OK;
>>> -
>>> -err:
>>> -       pblk_err(pblk, "failed to perform partial read\n");
>>> -
>>> -       /* Free allocated pages in new bio */
>>> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
>>> -       __pblk_end_io_read(pblk, rqd, false);
>>> -       return NVM_IO_ERR;
>>> -}
>>> -
>>>  static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>                          sector_t lba, unsigned long *read_bitmap)
>>>  {
>>> @@ -432,11 +267,11 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>>  {
>>>         struct nvm_tgt_dev *dev = pblk->dev;
>>>         struct request_queue *q = dev->q;
>>> +       struct bio *split_bio, *int_bio;
>>>         sector_t blba = pblk_get_lba(bio);
>>>         unsigned int nr_secs = pblk_get_secs(bio);
>>>         struct pblk_g_ctx *r_ctx;
>>>         struct nvm_rq *rqd;
>>> -       unsigned int bio_init_idx;
>>>         DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
>>>         int ret = NVM_IO_ERR;
>>> 
>>> @@ -456,61 +291,66 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>>         r_ctx = nvm_rq_to_pdu(rqd);
>>>         r_ctx->start_time = jiffies;
>>>         r_ctx->lba = blba;
>>> -       r_ctx->private = bio; /* original bio */
>>> 
>>> -       /* Save the index for this bio's start. This is needed in case
>>> -        * we need to fill a partial read.
>>> +       /* Clone read bio to deal with:
>>> +        * -usage of bio_advance() when memcpy data from round buffer
>>> +        * -read errors in case of reading from device
>>>          */
>>> -       bio_init_idx = pblk_get_bi_idx(bio);
>>> +       int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>>> +       if (!int_bio)
>>> +               return NVM_IO_ERR;
>>> 
>>>         if (pblk_alloc_rqd_meta(pblk, rqd))
>>>                 goto fail_rqd_free;
>>> 
>>>         if (nr_secs > 1)
>>> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
>>> +               pblk_read_ppalist_rq(pblk, rqd, int_bio, blba, read_bitmap);
>>>         else
>>> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
>>> +               pblk_read_rq(pblk, rqd, int_bio, blba, read_bitmap);
>>> +
>>> +split_retry:
>>> +       r_ctx->private = bio; /* original bio */
>>> 
>>> -       if (bitmap_full(read_bitmap, nr_secs)) {
>>> +       if (bitmap_full(read_bitmap, rqd->nr_ppas)) {
>>> +               bio_put(int_bio);
>>>                 atomic_inc(&pblk->inflight_io);
>>>                 __pblk_end_io_read(pblk, rqd, false);
>>>                 return NVM_IO_DONE;
>>>         }
>>> 
>>> -       /* All sectors are to be read from the device */
>>> -       if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>>> -               struct bio *int_bio = NULL;
>>> +       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>>> +               /* The read bio request could be partially filled by the write
>>> +                * buffer, but there are some holes that need to be read from
>>> +                * the drive. In order to handle this, we will use block layer
>>> +                * mechanism to split this request in to smaller ones.
>>> +                */
>>> +               split_bio = bio_split(bio, NR_PHY_IN_LOG, GFP_KERNEL,
>>> +                                       &pblk_bio_set);
>> This is quite inefficient. If you have an rqd with 63 sectors on the device and
>> the 64th is cached, you are splitting 63 times whereas a single split
>> would be sufficient.
> 
> Yeah, definitely, it would be better to find how many contiguous sectors are in cache/on the drive and split based on that. Will improve that in v2.
> 
>>> +               bio_chain(split_bio, bio);
>> I am not sure if it's needed but in blk_queue_split() these flags are set
>> before making the request:
>>         split->bi_opf |= REQ_NOMERGE;
>>         bio_set_flag(*bio, BIO_QUEUE_ENTERED);
>>> +               generic_make_request(bio);
>> pblk_lookup_l2p_seq() increments line->ref. You have to release the krefs before
>> requeueing the request.
> I completely forgot about it, will fix that of course, thanks!
> 
>> You might consider introducing a pblk_lookup_l2p_uncached() which returns when
>> it finds a cached sector and returns its index. Doing so you can avoid obtaining
>> superfluous line->refs and we could also get rid of the read_bitmap
>> entirely. This
>> index could also be used to split the bio at the right position.


I think you can also get rid of the read_bitmap. This would help remove
one of the 64-LBA dependencies in pblk, which I think is useful as we
move forward.


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init()
  2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
@ 2019-03-16 22:29   ` Javier González
  2019-03-18 16:25   ` Matias Bjørling
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-16 22:29 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This patch fixes a compilation warning caused by improper format
> specifier provided in pblk_l2p_init().
> 
> Fixes: fe0c220 ("lightnvm: pblk: cleanly fail when there is not enough memory")
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 1940f89..1e227a0 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -159,7 +159,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
> 					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
> 					PAGE_KERNEL);
> 	if (!pblk->trans_map) {
> -		pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
> +		pblk_err(pblk, "failed to allocate L2P (need %zu of memory)\n",
> 				map_size);
> 		return -ENOMEM;
> 	}
> --
> 2.9.5

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks
  2019-03-14 16:04 ` [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks Igor Konopko
@ 2019-03-16 22:36   ` Javier González
  2019-03-17 19:39   ` Matias Bjørling
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-16 22:36 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3471 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In case of factory pblk init, we might have a situation where there are
> some opened chunks. Based on the OCSSD spec we are not allowed to
> transition chunks from the open state directly to the free state, so such
> a reset can lead to an IO error (even though most controllers will allow
> such an operation anyway), so we would at least like to warn users about
> such a situation.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-init.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 1e227a0..b7845f6 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -712,7 +712,7 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
> }
> 
> static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
> -				   struct nvm_chk_meta *meta)
> +				   struct nvm_chk_meta *meta, int *opened)
> {
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> @@ -748,6 +748,9 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
> 			continue;
> 		}
> 
> +		if (chunk->state & NVM_CHK_ST_OPEN)
> +			(*opened)++;
> +
> 		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
> 			continue;
> 
> @@ -759,7 +762,7 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
> }
> 
> static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
> -				 void *chunk_meta, int line_id)
> +				 void *chunk_meta, int line_id, int *opened)
> {
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct pblk_line_meta *lm = &pblk->lm;
> @@ -773,7 +776,7 @@ static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
> 	line->vsc = &l_mg->vsc_list[line_id];
> 	spin_lock_init(&line->lock);
> 
> -	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta);
> +	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta, opened);
> 
> 	chk_in_line = lm->blk_per_line - nr_bad_chks;
> 	if (nr_bad_chks < 0 || nr_bad_chks > lm->blk_per_line ||
> @@ -1019,12 +1022,12 @@ static int pblk_line_meta_init(struct pblk *pblk)
> 	return 0;
> }
> 
> -static int pblk_lines_init(struct pblk *pblk)
> +static int pblk_lines_init(struct pblk *pblk, bool factory)
> {
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct pblk_line *line;
> 	void *chunk_meta;
> -	int nr_free_chks = 0;
> +	int nr_free_chks = 0, nr_opened_chks = 0;
> 	int i, ret;
> 
> 	ret = pblk_line_meta_init(pblk);
> @@ -1059,7 +1062,8 @@ static int pblk_lines_init(struct pblk *pblk)
> 		if (ret)
> 			goto fail_free_lines;
> 
> -		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i);
> +		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i,
> +							&nr_opened_chks);
> 
> 		trace_pblk_line_state(pblk_disk_name(pblk), line->id,
> 								line->state);
> @@ -1071,6 +1075,11 @@ static int pblk_lines_init(struct pblk *pblk)
> 		goto fail_free_lines;
> 	}
> 
> +	if (factory && nr_opened_chks) {
> +		pblk_warn(pblk, "%d opened chunks during factory creation\n",
> +				nr_opened_chks);
> +	}
> +

nit: no need for braces here.

Otherwise it looks good.

Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix
  2019-03-14 16:04 ` [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix Igor Konopko
@ 2019-03-16 22:43   ` Javier González
  2019-03-17 19:24     ` Matias Bjørling
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-16 22:43 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In case of OOB recovery, when some of the chunks are in the closed state,
> we calculate the number of written sectors in the line incorrectly,
> because we always count the chunk WP, which for closed chunks no longer
> reflects the number of written sectors in particular chunks. For such
> chunks, this patch uses the clba field instead.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-recovery.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 83b467b..bcd3633 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -101,6 +101,8 @@ static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
> 
> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
> {
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	int nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
> 	u64 written_secs = 0;
> @@ -113,7 +115,11 @@ static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
> 		if (chunk->state & NVM_CHK_ST_OFFLINE)
> 			continue;
> 
> -		written_secs += chunk->wp;
> +		if (chunk->state & NVM_CHK_ST_OPEN)
> +			written_secs += chunk->wp;
> +		else if (chunk->state & NVM_CHK_ST_CLOSED)
> +			written_secs += geo->clba;
> +
> 		valid_chunks++;
> 	}
> 
> --
> 2.9.5

Mmmm. The change is correct, but can you elaborate on why the WP does not
reflect the written sectors in a closed chunk? As I understand it, the
WP reflects the last written sector; if it happens to be WP == clba, then
the chunk state machine transitions to closed, but the WP remains
untouched. It is only when we reset the chunk that the WP comes back to
0. Am I missing something?


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta
  2019-03-14 16:04 ` [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta Igor Konopko
@ 2019-03-16 22:48   ` Javier González
  2019-03-18 11:54   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-16 22:48 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2014 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, when smeta/emeta/OOB metadata is read, errors are not always
> propagated correctly. This patch changes that behaviour and propagates all
> error codes except for the high-ECC read warning status.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 9 +++++++--
> drivers/lightnvm/pblk-recovery.c | 2 +-
> 2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 39280c1..38e26fe 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -761,8 +761,10 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> 
> 	atomic_dec(&pblk->inflight_io);
> 
> -	if (rqd.error)
> +	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
> 		pblk_log_read_err(pblk, &rqd);
> +		ret = -EIO;
> +	}
> 
> clear_rqd:
> 	pblk_free_rqd_meta(pblk, &rqd);
> @@ -916,8 +918,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> 
> 	atomic_dec(&pblk->inflight_io);
> 
> -	if (rqd.error)
> +	if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
> 		pblk_log_read_err(pblk, &rqd);
> +		ret = -EIO;
> +		goto free_rqd_dma;
> +	}
> 
> 	emeta_buf += rq_len;
> 	left_ppas -= rq_ppas;
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index bcd3633..688fdeb 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -450,7 +450,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 	atomic_dec(&pblk->inflight_io);
> 
> 	/* If a read fails, do a best effort by padding the line and retrying */
> -	if (rqd->error) {
> +	if (rqd->error && rqd->error != NVM_RSP_WARN_HIGHECC) {
> 		int pad_distance, ret;
> 
> 		if (padded) {
> --
> 2.9.5


Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 06/18] lightnvm: pblk: recover only written metadata
  2019-03-14 16:04 ` [PATCH 06/18] lightnvm: pblk: recover only written metadata Igor Konopko
@ 2019-03-16 23:46   ` Javier González
  2019-03-18 12:54     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-16 23:46 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2821 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This patch ensures that smeta/emeta were actually written, based on the
> chunk table state and write pointer, before even trying to read them.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-recovery.c | 43 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 688fdeb..ba1691d 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -653,8 +653,42 @@ static int pblk_line_was_written(struct pblk_line *line,
> 	bppa = pblk->luns[smeta_blk].bppa;
> 	chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
> 
> -	if (chunk->state & NVM_CHK_ST_FREE)
> -		return 0;
> +	if (chunk->state & NVM_CHK_ST_CLOSED ||
> +	    (chunk->state & NVM_CHK_ST_OPEN
> +	     && chunk->wp >= lm->smeta_sec))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static int pblk_line_was_emeta_written(struct pblk_line *line,
> +				       struct pblk *pblk)
> +{
> +
> +	struct pblk_line_meta *lm = &pblk->lm;
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +	struct nvm_chk_meta *chunk;
> +	struct ppa_addr ppa;
> +	int i, pos;
> +	int min = pblk->min_write_pgs;
> +	u64 paddr = line->emeta_ssec;
> +
> +	for (i = 0; i < lm->emeta_sec[0]; i++, paddr++) {
> +		ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +		pos = pblk_ppa_to_pos(geo, ppa);
> +		while (test_bit(pos, line->blk_bitmap)) {
> +			paddr += min;
> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +			pos = pblk_ppa_to_pos(geo, ppa);
> +		}
> +		chunk = &line->chks[pos];
> +
> +		if (!(chunk->state & NVM_CHK_ST_CLOSED ||
> +		    (chunk->state & NVM_CHK_ST_OPEN
> +		     && chunk->wp > ppa.m.sec)))
> +			return 0;
> +	}
> 
> 	return 1;
> }
> @@ -788,6 +822,11 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
> 			goto next;
> 		}
> 
> +		if (!pblk_line_was_emeta_written(line, pblk)) {
> +			pblk_recov_l2p_from_oob(pblk, line);
> +			goto next;
> +		}
> +
> 		if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
> 			pblk_recov_l2p_from_oob(pblk, line);
> 			goto next;
> --
> 2.9.5

  I would like to avoid iterating over all chunks again and again to check
  for different things at boot time. What do you think about having this as
 something like PBLK_LINESTATE_OPEN_INVALID or
 PBLK_LINESTATE_OPEN_NONRECOV, which you populate when collecting the
 chunk information? Then this becomes a simple check as opposed to the
 extra chunk iteration?

On the check itself: Is this done for completeness or have you hit a
case that is not covered by the smeta/emeta CRC protection?



[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 08/18] lightnvm: pblk: fix spin_unlock order
  2019-03-14 16:04 ` [PATCH 08/18] lightnvm: pblk: fix spin_unlock order Igor Konopko
@ 2019-03-16 23:49   ` Javier González
  2019-03-18 11:55   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-16 23:49 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In pblk_rb_tear_down_check() spin_unlock() functions are not
> called in proper order. This patch fixes that.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-rb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 03c241b..3555014 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -799,8 +799,8 @@ int pblk_rb_tear_down_check(struct pblk_rb *rb)
> 	}
> 
> out:
> -	spin_unlock(&rb->w_lock);
> 	spin_unlock_irq(&rb->s_lock);
> +	spin_unlock(&rb->w_lock);
> 
> 	return ret;
> }
> --
> 2.9.5

Oops. Thanks for fixing this...

Reviewed-by: Javier González <javier@javigon.com>




* Re: [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path
  2019-03-14 16:04 ` [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path Igor Konopko
@ 2019-03-16 23:54   ` Javier González
  2019-03-18 11:58   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-16 23:54 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block


> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> On the write recovery path, there is a chance that the writer thread
> is not active, so for sanity it would be good to always kick it.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-write.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 6593dea..4e63f9b 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -228,6 +228,7 @@ static void pblk_submit_rec(struct work_struct *work)
> 	mempool_free(recovery, &pblk->rec_pool);
> 
> 	atomic_dec(&pblk->inflight_io);
> +	pblk_write_kick(pblk);
> }
> 
> 
> --
> 2.9.5

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix
  2019-03-16 22:43   ` Javier González
@ 2019-03-17 19:24     ` Matias Bjørling
  2019-03-18 12:50       ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Matias Bjørling @ 2019-03-17 19:24 UTC (permalink / raw)
  To: Javier González, Konopko, Igor J; +Cc: Hans Holmberg, linux-block

On 3/16/19 3:43 PM, Javier González wrote:
>> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> In case of OOB recovery, when some of the chunks are in the closed
>> state, we calculate the number of written sectors in the line
>> incorrectly, because we always count the chunk WP, which for closed
>> chunks no longer reflects the written sectors in that chunk. For
>> such chunks this patch uses the clba field instead.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-recovery.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index 83b467b..bcd3633 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -101,6 +101,8 @@ static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
>>
>> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
>> {
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> 	struct pblk_line_meta *lm = &pblk->lm;
>> 	int nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
>> 	u64 written_secs = 0;
>> @@ -113,7 +115,11 @@ static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
>> 		if (chunk->state & NVM_CHK_ST_OFFLINE)
>> 			continue;
>>
>> -		written_secs += chunk->wp;
>> +		if (chunk->state & NVM_CHK_ST_OPEN)
>> +			written_secs += chunk->wp;
>> +		else if (chunk->state & NVM_CHK_ST_CLOSED)
>> +			written_secs += geo->clba;
>> +
>> 		valid_chunks++;
>> 	}
>>
>> --
>> 2.9.5
> 
> Mmmm. The change is correct, but can you develop on why the WP does not
> reflect the written sectors in a closed chunk? As I understand it, the
> WP reflects the last written sector; if it happens to be WP == clba, then
> the chunk state machine transitions to closed, but the WP remains
> untouched. It is only when we reset the chunk that the WP comes back to
> 0. Am I missing something?
> 

I agree with Javier. In OCSSD, the write pointer shall always be valid.


* Re: [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery
  2019-03-14 16:04 ` [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery Igor Konopko
@ 2019-03-17 19:33   ` Matias Bjørling
  2019-03-18 12:58     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Matias Bjørling @ 2019-03-17 19:33 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 3/14/19 9:04 AM, Igor Konopko wrote:
> This patch changes the behaviour of recovery padding in order to
> support the case when some IOs were already submitted to the drive
> and subsequent ones are not submitted due to a returned error.
> 
> Currently in case of errors we simply exit the pad function without
> waiting for inflight IOs, which leads to panic on inflight IOs
> completion.
> 
> After the changes we always wait for all the inflight IOs before
> exiting the function.
> 
> Also, since NVMe has an internal timeout per IO, there is no need to
> introduce an additional one here.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-recovery.c | 32 +++++++++++++-------------------
>   1 file changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index ba1691d..73d5ead 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -200,7 +200,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	if (rq_ppas < pblk->min_write_pgs) {
>   		pblk_err(pblk, "corrupted pad line %d\n", line->id);
> -		goto fail_free_pad;
> +		goto fail_complete;
>   	}
>   
>   	rq_len = rq_ppas * geo->csecs;
> @@ -209,7 +209,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   						PBLK_VMALLOC_META, GFP_KERNEL);
>   	if (IS_ERR(bio)) {
>   		ret = PTR_ERR(bio);
> -		goto fail_free_pad;
> +		goto fail_complete;
>   	}
>   
>   	bio->bi_iter.bi_sector = 0; /* internal bio */
> @@ -218,8 +218,11 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   	rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
>   
>   	ret = pblk_alloc_rqd_meta(pblk, rqd);
> -	if (ret)
> -		goto fail_free_rqd;
> +	if (ret) {
> +		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
> +		bio_put(bio);
> +		goto fail_complete;
> +	}
>   
>   	rqd->bio = bio;
>   	rqd->opcode = NVM_OP_PWRITE;
> @@ -266,7 +269,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   	if (ret) {
>   		pblk_err(pblk, "I/O submission failed: %d\n", ret);
>   		pblk_up_chunk(pblk, rqd->ppa_list[0]);
> -		goto fail_free_rqd;
> +		kref_put(&pad_rq->ref, pblk_recov_complete);
> +		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
> +		bio_put(bio);
> +		goto fail_complete;
>   	}
>   
>   	left_line_ppas -= rq_ppas;
> @@ -274,13 +280,9 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   	if (left_ppas && left_line_ppas)
>   		goto next_pad_rq;
>   
> +fail_complete:
>   	kref_put(&pad_rq->ref, pblk_recov_complete);
> -
> -	if (!wait_for_completion_io_timeout(&pad_rq->wait,
> -				msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
> -		pblk_err(pblk, "pad write timed out\n");
> -		ret = -ETIME;
> -	}
> +	wait_for_completion(&pad_rq->wait);
>   
>   	if (!pblk_line_is_full(line))
>   		pblk_err(pblk, "corrupted padded line: %d\n", line->id);
> @@ -289,14 +291,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>   free_rq:
>   	kfree(pad_rq);
>   	return ret;
> -
> -fail_free_rqd:
> -	pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
> -	bio_put(bio);
> -fail_free_pad:
> -	kfree(pad_rq);
> -	vfree(data);
> -	return ret;
>   }
>   
>   static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line)
> 

Hi Igor,

Can you split this patch in two? One that removes the
wait_for_completion_io_timeout (and constant), and another that makes
sure it waits until all inflight IOs are completed?




* Re: [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks
  2019-03-14 16:04 ` [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks Igor Konopko
  2019-03-16 22:36   ` Javier González
@ 2019-03-17 19:39   ` Matias Bjørling
  1 sibling, 0 replies; 69+ messages in thread
From: Matias Bjørling @ 2019-03-17 19:39 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 3/14/19 9:04 AM, Igor Konopko wrote:
> In case of factory pblk init, we might have a situation where there
> are some opened chunks. Based on the OCSSD spec we are not allowed to
> transition chunks from the open state directly to the free state, so
> such a reset can lead to an IO error (even though most controllers
> will allow such an operation anyway), so we would at least like to
> warn users about such a situation.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-init.c | 23 ++++++++++++++++-------
>   1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 1e227a0..b7845f6 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -712,7 +712,7 @@ static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
>   }
>   
>   static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
> -				   struct nvm_chk_meta *meta)
> +				   struct nvm_chk_meta *meta, int *opened)
>   {
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
> @@ -748,6 +748,9 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
>   			continue;
>   		}
>   
> +		if (chunk->state & NVM_CHK_ST_OPEN)
> +			(*opened)++;
> +
>   		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
>   			continue;
>   
> @@ -759,7 +762,7 @@ static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
>   }
>   
>   static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
> -				 void *chunk_meta, int line_id)
> +				 void *chunk_meta, int line_id, int *opened)
>   {
>   	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>   	struct pblk_line_meta *lm = &pblk->lm;
> @@ -773,7 +776,7 @@ static long pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
>   	line->vsc = &l_mg->vsc_list[line_id];
>   	spin_lock_init(&line->lock);
>   
> -	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta);
> +	nr_bad_chks = pblk_setup_line_meta_chk(pblk, line, chunk_meta, opened);
>   
>   	chk_in_line = lm->blk_per_line - nr_bad_chks;
>   	if (nr_bad_chks < 0 || nr_bad_chks > lm->blk_per_line ||
> @@ -1019,12 +1022,12 @@ static int pblk_line_meta_init(struct pblk *pblk)
>   	return 0;
>   }
>   
> -static int pblk_lines_init(struct pblk *pblk)
> +static int pblk_lines_init(struct pblk *pblk, bool factory)
>   {
>   	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>   	struct pblk_line *line;
>   	void *chunk_meta;
> -	int nr_free_chks = 0;
> +	int nr_free_chks = 0, nr_opened_chks = 0;
>   	int i, ret;
>   
>   	ret = pblk_line_meta_init(pblk);
> @@ -1059,7 +1062,8 @@ static int pblk_lines_init(struct pblk *pblk)
>   		if (ret)
>   			goto fail_free_lines;
>   
> -		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i);
> +		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i,
> +							&nr_opened_chks);
>   
>   		trace_pblk_line_state(pblk_disk_name(pblk), line->id,
>   								line->state);
> @@ -1071,6 +1075,11 @@ static int pblk_lines_init(struct pblk *pblk)
>   		goto fail_free_lines;
>   	}
>   
> +	if (factory && nr_opened_chks) {
> +		pblk_warn(pblk, "%d opened chunks during factory creation\n",
> +				nr_opened_chks);
> +	}
> +
>   	ret = pblk_set_provision(pblk, nr_free_chks);
>   	if (ret)
>   		goto fail_free_lines;
> @@ -1235,7 +1244,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>   		goto fail;
>   	}
>   
> -	ret = pblk_lines_init(pblk);
> +	ret = pblk_lines_init(pblk, flags & NVM_TARGET_FACTORY);
>   	if (ret) {
>   		pblk_err(pblk, "could not initialize lines\n");
>   		goto fail_free_core;
> 

I've gone back and forth on this one. I think I'd rather just do an
ECN of the spec to say that this has always been allowed, and then add
a feature bit for those that like the reset protection.


* Re: [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-14 16:04 ` [PATCH 10/18] lightnvm: pblk: ensure that emeta is written Igor Konopko
@ 2019-03-17 19:44   ` Matias Bjørling
  2019-03-18 13:02     ` Igor Konopko
  2019-03-18  7:46   ` Javier González
  1 sibling, 1 reply; 69+ messages in thread
From: Matias Bjørling @ 2019-03-17 19:44 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 3/14/19 9:04 AM, Igor Konopko wrote:
> When we are trying to switch to a new line, we need to ensure that
> the emeta for the n-2 line is already written. Otherwise we can end
> up in a deadlock scenario, where the writer has no more requests to
> write and thus there is no way to trigger emeta writes from the
> writer thread. This is a corner case which occurs in the case of
> multiple write errors and thus an early line close due to lack of
> line space.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-core.c  |  2 ++
>   drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
>   drivers/lightnvm/pblk.h       |  1 +
>   3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 38e26fe..a683d1f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>   				     struct pblk_line_mgmt *l_mg,
>   				     struct pblk_line_meta *lm)
>   {
> +	struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
>   	int meta_line;
>   
>   	lockdep_assert_held(&l_mg->free_lock);
> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>   	meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
>   	if (meta_line == PBLK_DATA_LINES) {
>   		spin_unlock(&l_mg->free_lock);
> +		pblk_write_emeta_force(pblk);
>   		io_schedule();
>   		spin_lock(&l_mg->free_lock);
>   		goto retry_meta;
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 4e63f9b..4fbb9b2 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
>   	return meta_line;
>   }
>   
> +void pblk_write_emeta_force(struct pblk *pblk)
> +{
> +	struct pblk_line_meta *lm = &pblk->lm;
> +	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> +	struct pblk_line *meta_line;
> +
> +	while (true) {
> +		spin_lock(&l_mg->close_lock);
> +		if (list_empty(&l_mg->emeta_list)) {
> +			spin_unlock(&l_mg->close_lock);
> +			break;
> +		}
> +		meta_line = list_first_entry(&l_mg->emeta_list,
> +						struct pblk_line, list);
> +		if (meta_line->emeta->mem >= lm->emeta_len[0]) {
> +			spin_unlock(&l_mg->close_lock);
> +			io_schedule();
> +			continue;
> +		}
> +		spin_unlock(&l_mg->close_lock);
> +		pblk_submit_meta_io(pblk, meta_line);
> +	}
> +}
> +
>   static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>   {
>   	struct ppa_addr erase_ppa;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 0a85990..a42bbfb 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
>   void pblk_write_timer_fn(struct timer_list *t);
>   void pblk_write_should_kick(struct pblk *pblk);
>   void pblk_write_kick(struct pblk *pblk);
> +void pblk_write_emeta_force(struct pblk *pblk);
>   
>   /*
>    * pblk read path
> 

Hi Igor,

Is this an error that qemu can force pblk to expose? Can you provide a 
specific example of what is needed to force the error?


* Re: [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region
  2019-03-14 16:04 ` [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region Igor Konopko
@ 2019-03-17 19:56   ` Matias Bjørling
  2019-03-18 13:05     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Matias Bjørling @ 2019-03-17 19:56 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 3/14/19 9:04 AM, Igor Konopko wrote:
> The emeta region does not have a valid corresponding OOB metadata
> mapping, so there is no need to try to recover the L2P mapping from
> it.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-recovery.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 4764596..2132260 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -479,6 +479,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   		goto retry_rq;
>   	}
>   
> +	if (paddr >= line->emeta_ssec) {
> +		/*
> +		 * We have reached the emeta region and we don't
> +		 * want to recover OOB data from it.
> +		 */
> +		goto completed;

The bio needs to be put before going to completed?

> +	}
> +
>   	pblk_get_packed_meta(pblk, rqd);
>   	bio_put(bio);
>   
> @@ -499,6 +507,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   	if (left_ppas > 0)
>   		goto next_rq;
>   
> +completed:
>   #ifdef CONFIG_NVM_PBLK_DEBUG
>   	WARN_ON(padded && !pblk_line_is_full(line));
>   #endif
> 



* Re: [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery
  2019-03-14 16:04 ` [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery Igor Konopko
@ 2019-03-18  6:56   ` Javier González
  2019-03-18 13:06     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-18  6:56 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block


> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In case of OOB recovery, we can hit a scenario where all the data in
> the line was written and some part of emeta was written too. In such
> a case the pblk_update_line_wp() function will call the
> pblk_alloc_page() function, which will cause left_msecs to be set to
> a value below zero (since this field does not track the emeta region)
> and thus will lead to multiple kernel warnings. This patch fixes that
> issue.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-recovery.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 73d5ead..4764596 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -93,10 +93,24 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
> static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
> 				u64 written_secs)
> {
> -	int i;
> +	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> +	int i = 0;
> 
> -	for (i = 0; i < written_secs; i += pblk->min_write_pgs)
> -		pblk_alloc_page(pblk, line, pblk->min_write_pgs);
> +	for (; i < written_secs; i += pblk->min_write_pgs)

Why no i = 0 here?

> +		__pblk_alloc_page(pblk, line, pblk->min_write_pgs);
> +
> +	spin_lock(&l_mg->free_lock);
> +	if (written_secs > line->left_msecs) {
> +		/*
> +		 * We have all data sectors written
> +		 * and some emeta sectors written too.
> +		 */
> +		line->left_msecs = 0;
> +	} else {
> +		/* We have only some data sectors written. */
> +		line->left_msecs -= written_secs;
> +	}
> +	spin_unlock(&l_mg->free_lock);
> }
> 
> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
> --
> 2.9.5

Otherwise, it looks good.

Reviewed-by: Javier González <javier@javigon.com>






* Re: [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta
  2019-03-14 16:04 ` [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta Igor Konopko
@ 2019-03-18  7:33   ` Javier González
  2019-03-18 13:12     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-18  7:33 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block


> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently there is only one copy of emeta stored per line in pblk. This
                                                           ^^^^^^
smeta?

> is risky, because in case of a read error on such a chunk, we lose
> all the data from the whole line, which leads to silent data
> corruption.
> 
> This patch changes this behaviour and stores 2 copies of smeta (a
> number which can easily be changed to a different value via a kernel
> parameter) in order to provide higher reliability by storing mirrored
> copies of the smeta struct and providing the possibility to fail over
> to another copy of that struct in case of a read error. Such an
> approach ensures that copies of this critical structure will be
> stored on different dies and thus the predicted UBER is multiple
> times higher.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 125 ++++++++++++++++++++++++++++++++-------
> drivers/lightnvm/pblk-init.c     |  23 +++++--
> drivers/lightnvm/pblk-recovery.c |   2 +-
> drivers/lightnvm/pblk.h          |   1 +
> 4 files changed, 123 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a683d1f..4d5cd99 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -720,13 +720,14 @@ u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
> 	return bit * geo->ws_opt;
> }
> 
> -int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> +static int pblk_line_smeta_read_copy(struct pblk *pblk,
> +				     struct pblk_line *line, u64 paddr)
> {
> 	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct bio *bio;
> 	struct nvm_rq rqd;
> -	u64 paddr = pblk_line_smeta_start(pblk, line);
> 	int i, ret;
> 
> 	memset(&rqd, 0, sizeof(struct nvm_rq));
> @@ -735,7 +736,8 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> 	if (ret)
> 		return ret;
> 
> -	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> +	bio = bio_map_kern(dev->q, line->smeta,
> +			   lm->smeta_len / lm->smeta_copies, GFP_KERNEL);
> 	if (IS_ERR(bio)) {
> 		ret = PTR_ERR(bio);
> 		goto clear_rqd;
> @@ -746,11 +748,23 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> 
> 	rqd.bio = bio;
> 	rqd.opcode = NVM_OP_PREAD;
> -	rqd.nr_ppas = lm->smeta_sec;
> +	rqd.nr_ppas = lm->smeta_sec / lm->smeta_copies;
> 	rqd.is_seq = 1;
> 
> -	for (i = 0; i < lm->smeta_sec; i++, paddr++)
> -		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> +	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
> +		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +		int pos = pblk_ppa_to_pos(geo, ppa);
> +
> +		while (test_bit(pos, line->blk_bitmap)) {
> +			paddr += pblk->min_write_pgs;
> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +			pos = pblk_ppa_to_pos(geo, ppa);
> +		}
> +
> +		rqd.ppa_list[i] = ppa;
> +		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
> +				  cpu_to_le64(ADDR_EMPTY);
> +	}
> 
> 	ret = pblk_submit_io_sync(pblk, &rqd);
> 	if (ret) {
> @@ -771,16 +785,63 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> 	return ret;
> }
> 
> -static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> -				 u64 paddr)
> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> +{
> +	struct pblk_line_meta *lm = &pblk->lm;
> +	int i, ret = 0, smeta_sec = lm->smeta_sec / lm->smeta_copies;
> +	u64 paddr = pblk_line_smeta_start(pblk, line);
> +
> +	for (i = 0; i < lm->smeta_copies; i++) {
> +		ret = pblk_line_smeta_read_copy(pblk, line,
> +						paddr + (i * smeta_sec));
> +		if (!ret) {
> +			/*
> +			 * Just one successfully read copy of smeta is
> +			 * enough for us for recovery, don't need to
> +			 * read another one.
> +			 */
> +			return ret;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
> {
> 	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct bio *bio;
> 	struct nvm_rq rqd;
> 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
> -	int i, ret;
> +	u64 paddr = 0;
> +	int smeta_cpy_len = lm->smeta_len / lm->smeta_copies;
> +	int smeta_cpy_sec = lm->smeta_sec / lm->smeta_copies;
> +	int i, ret, rq_writes;
> +
> +	/*
> +	 * Check if we can write all the smeta copies with
> +	 * a single write command.
> +	 * If yes -> copy smeta sector into multiple copies
> +	 * in buffer to write.
> +	 * If no -> issue writes one by one using the same
> +	 * buffer space.
> +	 * Only if all the copies are written correctly
> +	 * we are treating this line as valid for proper
> +	 * UBER reliability.
> +	 */
> +	if (lm->smeta_sec > pblk->max_write_pgs) {
> +		rq_writes = lm->smeta_copies;
> +	} else {
> +		rq_writes = 1;
> +		for (i = 1; i < lm->smeta_copies; i++) {
> +			memcpy(line->smeta + i * smeta_cpy_len,
> +			       line->smeta, smeta_cpy_len);
> +		}
> +		smeta_cpy_len = lm->smeta_len;
> +		smeta_cpy_sec = lm->smeta_sec;
> +	}

smeta writes are synchronous, so you can just populate 2 entries in the
vector command. This will help you minimize the BW spikes when storing
multiple copies. When doing this, you can remove the comment too.

In fact, smeta’s length is calculated so that it takes whole writable
pages to avoid mixing it with user data, so with this logic you will
always send one command per smeta copy.

> 
> 	memset(&rqd, 0, sizeof(struct nvm_rq));
> 
> @@ -788,7 +849,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> 	if (ret)
> 		return ret;
> 
> -	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> +next_rq:
> +	bio = bio_map_kern(dev->q, line->smeta, smeta_cpy_len, GFP_KERNEL);
> 	if (IS_ERR(bio)) {
> 		ret = PTR_ERR(bio);
> 		goto clear_rqd;
> @@ -799,15 +861,23 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> 
> 	rqd.bio = bio;
> 	rqd.opcode = NVM_OP_PWRITE;
> -	rqd.nr_ppas = lm->smeta_sec;
> +	rqd.nr_ppas = smeta_cpy_sec;
> 	rqd.is_seq = 1;
> 
> -	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> -		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
> -							   rqd.meta_list, i);
> +	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
> +		void *meta_list = rqd.meta_list;
> +		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +		int pos = pblk_ppa_to_pos(geo, ppa);
> +
> +		while (test_bit(pos, line->blk_bitmap)) {
> +			paddr += pblk->min_write_pgs;
> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> +			pos = pblk_ppa_to_pos(geo, ppa);
> +		}
> 
> -		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> -		meta->lba = lba_list[paddr] = addr_empty;
> +		rqd.ppa_list[i] = ppa;
> +		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
> +		lba_list[paddr] = addr_empty;
> 	}
> 
> 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
> @@ -822,8 +892,13 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> 	if (rqd.error) {
> 		pblk_log_write_err(pblk, &rqd);
> 		ret = -EIO;
> +		goto clear_rqd;
> 	}
> 
> +	rq_writes--;
> +	if (rq_writes > 0)
> +		goto next_rq;
> +
> clear_rqd:
> 	pblk_free_rqd_meta(pblk, &rqd);
> 	return ret;
> @@ -1149,7 +1224,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	u64 off;
> 	int bit = -1;
> -	int emeta_secs;
> +	int emeta_secs, smeta_secs;
> 
> 	line->sec_in_line = lm->sec_per_line;
> 
> @@ -1165,13 +1240,19 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
> 	}
> 
> 	/* Mark smeta metadata sectors as bad sectors */
> -	bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
> -	off = bit * geo->ws_opt;
> -	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
> +	smeta_secs = lm->smeta_sec;
> +	bit = -1;
> +	while (smeta_secs) {
> +		bit = find_next_zero_bit(line->blk_bitmap, lm->blk_per_line,
> +					bit + 1);
> +		off = bit * geo->ws_opt;
> +		bitmap_set(line->map_bitmap, off, geo->ws_opt);
> +		line->cur_sec = off + geo->ws_opt;
> +		smeta_secs -= geo->ws_opt;
> +	}

The idea with lm->smeta_sec was to abstract the sectors used for smeta
from ws_opt, as this could change in the future.

What do you think about leaving lm->smeta_sec as the sectors needed for
each copy of smeta and then using lm->smeta_copies to calculate the
total? (This is done in pblk_line_meta_init(), not here.) This way, you
can keep the code here without using ws_opt.

> 	line->sec_in_line -= lm->smeta_sec;
> -	line->cur_sec = off + lm->smeta_sec;
> 
> -	if (init && pblk_line_smeta_write(pblk, line, off)) {
> +	if (init && pblk_line_smeta_write(pblk, line)) {
> 		pblk_debug(pblk, "line smeta I/O failed. Retry\n");
> 		return 0;
> 	}
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index b7845f6..e771df6 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -27,6 +27,11 @@ static unsigned int write_buffer_size;
> module_param(write_buffer_size, uint, 0644);
> MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
> 
> +static unsigned int smeta_copies = 2;

Can we default to 1?

> […]



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 14/18] lightnvm: pblk: GC error handling
  2019-03-14 16:04 ` [PATCH 14/18] lightnvm: pblk: GC error handling Igor Konopko
@ 2019-03-18  7:39   ` Javier González
  2019-03-18 12:14   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-18  7:39 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3517 bytes --]

> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently, when there is an I/O error (or similar) on the GC read
> path, pblk still moves the line to the free state, which leads to a
> data mismatch issue.
> 
> This patch adds handling for such an error: after this change the line
> is returned to the closed state, so it can still be used for reading.
> At minimum we will then be able to return an error status to the user
> when the user tries to read such data.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c | 8 ++++++++
> drivers/lightnvm/pblk-gc.c   | 5 ++---
> drivers/lightnvm/pblk-read.c | 1 -
> drivers/lightnvm/pblk.h      | 2 ++
> 4 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 4d5cd99..6817f8f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -1786,6 +1786,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
> 
> 	spin_lock(&line->lock);
> 	WARN_ON(line->state != PBLK_LINESTATE_GC);
> +	if (line->w_err_gc->has_gc_err) {
> +		spin_unlock(&line->lock);
> +		pblk_err(pblk, "line %d had errors during GC\n", line->id);
> +		pblk_put_line_back(pblk, line);
> +		line->w_err_gc->has_gc_err = 0;
> +		return;
> +	}
> +
> 	line->state = PBLK_LINESTATE_FREE;
> 	trace_pblk_line_state(pblk_disk_name(pblk), line->id,
> 					line->state);
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index e23b192..63ee205 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
> 	wake_up_process(gc->gc_writer_ts);
> }
> 
> -static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> {
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct list_head *move_list;
> @@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
> 	/* Read from GC victim block */
> 	ret = pblk_submit_read_gc(pblk, gc_rq);
> 	if (ret) {
> -		pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
> -								line->id, ret);
> +		line->w_err_gc->has_gc_err = 1;
> 		goto out;
> 	}
> 
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 54422a2..6a77c24 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -475,7 +475,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
> 
> 	if (pblk_submit_io_sync(pblk, &rqd)) {
> 		ret = -EIO;
> -		pblk_err(pblk, "GC read request failed\n");
> 		goto err_free_bio;
> 	}
> 
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 5d1040a..52002f5 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -427,6 +427,7 @@ struct pblk_smeta {
> 
> struct pblk_w_err_gc {
> 	int has_write_err;
> +	int has_gc_err;
> 	__le64 *lba_list;
> };
> 
> @@ -909,6 +910,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
> void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
> 			      int *gc_active);
> int pblk_gc_sysfs_force(struct pblk *pblk, int force);
> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
> 
> /*
>  * pblk rate limiter
> --
> 2.9.5

Looks good to me.


Reviewed-by: Javier González <javier@javigon.com>



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-14 16:04 ` [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines Igor Konopko
@ 2019-03-18  7:42   ` Javier González
  2019-03-18 13:28     ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-18  7:42 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 1188 bytes --]

> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> When mapping fails due to a lack of lines (called from the writer
> thread), we currently call pblk_pipeline_stop(), which waits for
> pending write I/Os and thus leads to a deadlock. Switching to
> __pblk_pipeline_stop() in that case fixes this.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-map.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index 5408e32..afc10306 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> 		pblk_line_close_meta(pblk, prev_line);
> 
> 		if (!line) {
> -			pblk_pipeline_stop(pblk);
> +			__pblk_pipeline_stop(pblk);
> 			return -ENOSPC;
> 		}
> 
> --
> 2.9.5

Have you seen this problem?

Before checking if there is a line, we are closing metadata for the
previous line, so all inflight I/Os should be clear. Can you elaborate
on the case in which this would happen?



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-14 16:04 ` [PATCH 10/18] lightnvm: pblk: ensure that emeta is written Igor Konopko
  2019-03-17 19:44   ` Matias Bjørling
@ 2019-03-18  7:46   ` Javier González
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-18  7:46 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3559 bytes --]

> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> When we are trying to switch to a new line, we need to ensure that
> emeta for the n-2 line is already written. Otherwise we can end up in
> a deadlock scenario, where the writer has no more requests to write
> and thus there is no way to trigger emeta writes from the writer
> thread. This is a corner case which occurs after multiple write errors
> and the resulting early line close due to lack of line space.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c  |  2 ++
> drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
> drivers/lightnvm/pblk.h       |  1 +
> 3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 38e26fe..a683d1f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
> 				     struct pblk_line_mgmt *l_mg,
> 				     struct pblk_line_meta *lm)
> {
> +	struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
> 	int meta_line;
> 
> 	lockdep_assert_held(&l_mg->free_lock);
> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
> 	meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
> 	if (meta_line == PBLK_DATA_LINES) {
> 		spin_unlock(&l_mg->free_lock);
> +		pblk_write_emeta_force(pblk);
> 		io_schedule();
> 		spin_lock(&l_mg->free_lock);
> 		goto retry_meta;
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 4e63f9b..4fbb9b2 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
> 	return meta_line;
> }
> 
> +void pblk_write_emeta_force(struct pblk *pblk)
> +{
> +	struct pblk_line_meta *lm = &pblk->lm;
> +	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> +	struct pblk_line *meta_line;
> +
> +	while (true) {
> +		spin_lock(&l_mg->close_lock);
> +		if (list_empty(&l_mg->emeta_list)) {
> +			spin_unlock(&l_mg->close_lock);
> +			break;
> +		}
> +		meta_line = list_first_entry(&l_mg->emeta_list,
> +						struct pblk_line, list);
> +		if (meta_line->emeta->mem >= lm->emeta_len[0]) {
> +			spin_unlock(&l_mg->close_lock);
> +			io_schedule();
> +			continue;
> +		}
> +		spin_unlock(&l_mg->close_lock);
> +		pblk_submit_meta_io(pblk, meta_line);
> +	}
> +}
> +
> static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
> {
> 	struct ppa_addr erase_ppa;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 0a85990..a42bbfb 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
> void pblk_write_timer_fn(struct timer_list *t);
> void pblk_write_should_kick(struct pblk *pblk);
> void pblk_write_kick(struct pblk *pblk);
> +void pblk_write_emeta_force(struct pblk *pblk);
> 
> /*
>  * pblk read path
> --
> 2.9.5

This should not be possible. In pblk_map_page_data() we wait to close
emeta for the previous line before starting to write to the new line
(look at pblk_line_close_meta()). Note that we schedule emeta writes for
the previous line as we write the current line, so this function is only
called in extreme cases. Otherwise, this would show jittering.

Do you have a different case in mind?



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list()
  2019-03-14 16:04 ` [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
@ 2019-03-18  7:48   ` Javier González
  0 siblings, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-18  7:48 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 7526 bytes --]

> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This patch replaces a few remaining usages of rqd->ppa_list[] with the
> existing nvm_rq_to_ppa_list() helper. This is needed for theoretical
> devices with ws_min/ws_opt equal to 1.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 26 ++++++++++++++------------
> drivers/lightnvm/pblk-recovery.c | 13 ++++++++-----
> 2 files changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 6817f8f..7338a44 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -562,11 +562,9 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
> 
> int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
> {
> -	struct ppa_addr *ppa_list;
> +	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
> 	int ret;
> 
> -	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
> -
> 	pblk_down_chunk(pblk, ppa_list[0]);
> 	ret = pblk_submit_io_sync(pblk, rqd);
> 	pblk_up_chunk(pblk, ppa_list[0]);
> @@ -727,6 +725,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
> 	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct bio *bio;
> +	struct ppa_addr *ppa_list;
> 	struct nvm_rq rqd;
> 	int i, ret;
> 
> @@ -750,6 +749,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
> 	rqd.opcode = NVM_OP_PREAD;
> 	rqd.nr_ppas = lm->smeta_sec / lm->smeta_copies;
> 	rqd.is_seq = 1;
> +	ppa_list = nvm_rq_to_ppa_list(&rqd);
> 
> 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
> 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
> @@ -761,7 +761,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
> 			pos = pblk_ppa_to_pos(geo, ppa);
> 		}
> 
> -		rqd.ppa_list[i] = ppa;
> +		ppa_list[i] = ppa;
> 		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
> 				  cpu_to_le64(ADDR_EMPTY);
> 	}
> @@ -812,6 +812,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
> 	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct bio *bio;
> +	struct ppa_addr *ppa_list;
> 	struct nvm_rq rqd;
> 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
> @@ -863,6 +864,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
> 	rqd.opcode = NVM_OP_PWRITE;
> 	rqd.nr_ppas = smeta_cpy_sec;
> 	rqd.is_seq = 1;
> +	ppa_list = nvm_rq_to_ppa_list(&rqd);
> 
> 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
> 		void *meta_list = rqd.meta_list;
> @@ -875,7 +877,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
> 			pos = pblk_ppa_to_pos(geo, ppa);
> 		}
> 
> -		rqd.ppa_list[i] = ppa;
> +		ppa_list[i] = ppa;
> 		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
> 		lba_list[paddr] = addr_empty;
> 	}
> @@ -911,8 +913,9 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> 	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct pblk_line_meta *lm = &pblk->lm;
> -	void *ppa_list, *meta_list;
> +	void *ppa_list_buf, *meta_list;
> 	struct bio *bio;
> +	struct ppa_addr *ppa_list;
> 	struct nvm_rq rqd;
> 	u64 paddr = line->emeta_ssec;
> 	dma_addr_t dma_ppa_list, dma_meta_list;
> @@ -928,7 +931,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> 	if (!meta_list)
> 		return -ENOMEM;
> 
> -	ppa_list = meta_list + pblk_dma_meta_size(pblk);
> +	ppa_list_buf = meta_list + pblk_dma_meta_size(pblk);
> 	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
> 
> next_rq:
> @@ -949,11 +952,12 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> 
> 	rqd.bio = bio;
> 	rqd.meta_list = meta_list;
> -	rqd.ppa_list = ppa_list;
> +	rqd.ppa_list = ppa_list_buf;
> 	rqd.dma_meta_list = dma_meta_list;
> 	rqd.dma_ppa_list = dma_ppa_list;
> 	rqd.opcode = NVM_OP_PREAD;
> 	rqd.nr_ppas = rq_ppas;
> +	ppa_list = nvm_rq_to_ppa_list(&rqd);
> 
> 	for (i = 0; i < rqd.nr_ppas; ) {
> 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
> @@ -981,7 +985,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> 		}
> 
> 		for (j = 0; j < min; j++, i++, paddr++)
> -			rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
> +			ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
> 	}
> 
> 	ret = pblk_submit_io_sync(pblk, &rqd);
> @@ -1608,11 +1612,9 @@ void pblk_ppa_to_line_put(struct pblk *pblk, struct ppa_addr ppa)
> 
> void pblk_rq_to_line_put(struct pblk *pblk, struct nvm_rq *rqd)
> {
> -	struct ppa_addr *ppa_list;
> +	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
> 	int i;
> 
> -	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
> -
> 	for (i = 0; i < rqd->nr_ppas; i++)
> 		pblk_ppa_to_line_put(pblk, ppa_list[i]);
> }
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 4e4db38..4051b93 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -185,6 +185,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> 	struct pblk_pad_rq *pad_rq;
> 	struct nvm_rq *rqd;
> 	struct bio *bio;
> +	struct ppa_addr *ppa_list;
> 	void *data;
> 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> 	u64 w_ptr = line->cur_sec;
> @@ -245,6 +246,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> 	rqd->end_io = pblk_end_io_recov;
> 	rqd->private = pad_rq;
> 
> +	ppa_list = nvm_rq_to_ppa_list(rqd);
> 	meta_list = rqd->meta_list;
> 
> 	for (i = 0; i < rqd->nr_ppas; ) {
> @@ -272,17 +274,17 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> 			lba_list[w_ptr] = addr_empty;
> 			meta = pblk_get_meta(pblk, meta_list, i);
> 			meta->lba = addr_empty;
> -			rqd->ppa_list[i] = dev_ppa;
> +			ppa_list[i] = dev_ppa;
> 		}
> 	}
> 
> 	kref_get(&pad_rq->ref);
> -	pblk_down_chunk(pblk, rqd->ppa_list[0]);
> +	pblk_down_chunk(pblk, ppa_list[0]);
> 
> 	ret = pblk_submit_io(pblk, rqd);
> 	if (ret) {
> 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
> -		pblk_up_chunk(pblk, rqd->ppa_list[0]);
> +		pblk_up_chunk(pblk, ppa_list[0]);
> 		kref_put(&pad_rq->ref, pblk_recov_complete);
> 		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
> 		bio_put(bio);
> @@ -426,6 +428,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 	rqd->ppa_list = ppa_list;
> 	rqd->dma_ppa_list = dma_ppa_list;
> 	rqd->dma_meta_list = dma_meta_list;
> +	ppa_list = nvm_rq_to_ppa_list(rqd);
> 
> 	if (pblk_io_aligned(pblk, rq_ppas))
> 		rqd->is_seq = 1;
> @@ -444,7 +447,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 		}
> 
> 		for (j = 0; j < pblk->min_write_pgs; j++, i++)
> -			rqd->ppa_list[i] =
> +			ppa_list[i] =
> 				addr_to_gen_ppa(pblk, paddr + j, line->id);
> 	}
> 
> @@ -500,7 +503,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 			continue;
> 
> 		line->nr_valid_lbas++;
> -		pblk_update_map(pblk, lba, rqd->ppa_list[i]);
> +		pblk_update_map(pblk, lba, ppa_list[i]);
> 	}
> 
> 	left_ppas -= rq_ppas;
> --
> 2.9.5

This is a good fix. Thanks!

Reviewed-by: Javier González <javier@javigon.com>



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-14 16:04 ` [PATCH 17/18] lightnvm: allow to use full device path Igor Konopko
@ 2019-03-18  7:49   ` Javier González
  2019-03-18 10:28   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-18  7:49 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]

> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This patch adds the possibility to provide the full device path (like
> /dev/nvme0n1) when specifying the device on top of which a pblk
> instance should be created/removed.
> 
> This makes target creation from nvme-cli (or other ioctl-based tools)
> more consistent with other commands: currently almost all commands use
> the full device path, except for the lightnvm create/remove parameters,
> which use just the 'nvme0n1' naming convention. After this change both
> approaches are valid.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index c01f83b..838c3d8 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
> }
> EXPORT_SYMBOL(nvm_unregister);
> 
> +#define PREFIX_STR "/dev/"
> +static void nvm_normalize_path(char *path)
> +{
> +	path[DISK_NAME_LEN - 1] = '\0';
> +	if (!memcmp(PREFIX_STR, path,
> +				sizeof(char) * strlen(PREFIX_STR))) {
> +		/*
> +		 * User provide name in '/dev/nvme0n1' format,
> +		 * so we need to skip '/dev/' for comparison
> +		 */
> +		memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
> +			(DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
> +	}
> +}
> +
> static int __nvm_configure_create(struct nvm_ioctl_create *create)
> {
> 	struct nvm_dev *dev;
> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
> 		return -EINVAL;
> 	}
> 
> -	create.dev[DISK_NAME_LEN - 1] = '\0';
> +	nvm_normalize_path(create.dev);
> +	nvm_normalize_path(create.tgtname);
> 	create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
> -	create.tgtname[DISK_NAME_LEN - 1] = '\0';
> 
> 	if (create.flags != 0) {
> 		__u32 flags = create.flags;
> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
> 	if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
> 		return -EFAULT;
> 
> -	remove.tgtname[DISK_NAME_LEN - 1] = '\0';
> +	nvm_normalize_path(remove.tgtname);
> 
> 	if (remove.flags != 0) {
> 		pr_err("nvm: no flags supported\n");
> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
> 	if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
> 		return -EFAULT;
> 
> -	fact.dev[DISK_NAME_LEN - 1] = '\0';
> -
> 	if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
> 		return -EINVAL;
> 
> --
> 2.9.5

Looks good to me.

Reviewed-by: Javier González <javier@javigon.com>



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-14 16:04 ` [PATCH 17/18] lightnvm: allow to use full device path Igor Konopko
  2019-03-18  7:49   ` Javier González
@ 2019-03-18 10:28   ` Hans Holmberg
  2019-03-18 13:18     ` Igor Konopko
  1 sibling, 1 reply; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 10:28 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Thu, Mar 14, 2019 at 5:11 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> This patch adds the possibility to provide the full device path (like
> /dev/nvme0n1) when specifying the device on top of which a pblk
> instance should be created/removed.
>
> This makes target creation from nvme-cli (or other ioctl-based tools)
> more consistent with other commands: currently almost all commands use
> the full device path, except for the lightnvm create/remove parameters,
> which use just the 'nvme0n1' naming convention. After this change both
> approaches are valid.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index c01f83b..838c3d8 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
>  }
>  EXPORT_SYMBOL(nvm_unregister);
>
> +#define PREFIX_STR "/dev/"
> +static void nvm_normalize_path(char *path)
> +{
> +       path[DISK_NAME_LEN - 1] = '\0';
> +       if (!memcmp(PREFIX_STR, path,
> +                               sizeof(char) * strlen(PREFIX_STR))) {
> +               /*
> +                * User provide name in '/dev/nvme0n1' format,
> +                * so we need to skip '/dev/' for comparison
> +                */
> +               memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
> +                       (DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
> +       }
> +}
> +

I don't like this. Why add string parsing to the kernel? Can't this
feature be added to the nvme tool?

>  static int __nvm_configure_create(struct nvm_ioctl_create *create)
>  {
>         struct nvm_dev *dev;
> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
>                 return -EINVAL;
>         }
>
> -       create.dev[DISK_NAME_LEN - 1] = '\0';
> +       nvm_normalize_path(create.dev);
> +       nvm_normalize_path(create.tgtname);
>         create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
> -       create.tgtname[DISK_NAME_LEN - 1] = '\0';
>
>         if (create.flags != 0) {
>                 __u32 flags = create.flags;
> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
>         if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
>                 return -EFAULT;
>
> -       remove.tgtname[DISK_NAME_LEN - 1] = '\0';
> +       nvm_normalize_path(remove.tgtname);
>
>         if (remove.flags != 0) {
>                 pr_err("nvm: no flags supported\n");
> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
>         if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
>                 return -EFAULT;
>
> -       fact.dev[DISK_NAME_LEN - 1] = '\0';
> -
>         if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
>                 return -EINVAL;
>
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta
  2019-03-14 16:04 ` [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta Igor Konopko
  2019-03-16 22:48   ` Javier González
@ 2019-03-18 11:54   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 11:54 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

Nice!

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

On Thu, Mar 14, 2019 at 5:07 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> Currently, when smeta/emeta/OOB is read, errors are not always
> propagated correctly. This patch changes that behaviour and propagates
> all error codes except the high-ECC read warning status.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-core.c     | 9 +++++++--
>  drivers/lightnvm/pblk-recovery.c | 2 +-
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 39280c1..38e26fe 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -761,8 +761,10 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>
>         atomic_dec(&pblk->inflight_io);
>
> -       if (rqd.error)
> +       if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
>                 pblk_log_read_err(pblk, &rqd);
> +               ret = -EIO;
> +       }
>
>  clear_rqd:
>         pblk_free_rqd_meta(pblk, &rqd);
> @@ -916,8 +918,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>
>         atomic_dec(&pblk->inflight_io);
>
> -       if (rqd.error)
> +       if (rqd.error && rqd.error != NVM_RSP_WARN_HIGHECC) {
>                 pblk_log_read_err(pblk, &rqd);
> +               ret = -EIO;
> +               goto free_rqd_dma;
> +       }
>
>         emeta_buf += rq_len;
>         left_ppas -= rq_ppas;
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index bcd3633..688fdeb 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -450,7 +450,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>         atomic_dec(&pblk->inflight_io);
>
>         /* If a read fails, do a best effort by padding the line and retrying */
> -       if (rqd->error) {
> +       if (rqd->error && rqd->error != NVM_RSP_WARN_HIGHECC) {
>                 int pad_distance, ret;
>
>                 if (padded) {
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 08/18] lightnvm: pblk: fix spin_unlock order
  2019-03-14 16:04 ` [PATCH 08/18] lightnvm: pblk: fix spin_unlock order Igor Konopko
  2019-03-16 23:49   ` Javier González
@ 2019-03-18 11:55   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 11:55 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Thu, Mar 14, 2019 at 5:08 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> In pblk_rb_tear_down_check() the spin_unlock() functions are not
> called in the proper order. This patch fixes that.

Can you add a Fixes: ?

Thanks.

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
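For reference, the tag Hans is asking for follows the kernel's standard
format (the hash and subject below are placeholders, not the actual
commit that introduced the ordering bug):

```
Fixes: 123456789abc ("lightnvm: pblk: <subject of the offending commit>")
```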

>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-rb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 03c241b..3555014 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -799,8 +799,8 @@ int pblk_rb_tear_down_check(struct pblk_rb *rb)
>         }
>
>  out:
> -       spin_unlock(&rb->w_lock);
>         spin_unlock_irq(&rb->s_lock);
> +       spin_unlock(&rb->w_lock);
>
>         return ret;
>  }
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path
  2019-03-14 16:04 ` [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path Igor Konopko
  2019-03-16 23:54   ` Javier González
@ 2019-03-18 11:58   ` Hans Holmberg
  1 sibling, 0 replies; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 11:58 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Thu, Mar 14, 2019 at 5:08 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> On the write recovery path, there is a chance that the writer thread
> is not active, so for sanity it is good to always kick it.

Makes sense.

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-write.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 6593dea..4e63f9b 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -228,6 +228,7 @@ static void pblk_submit_rec(struct work_struct *work)
>         mempool_free(recovery, &pblk->rec_pool);
>
>         atomic_dec(&pblk->inflight_io);
> +       pblk_write_kick(pblk);
>  }
>
>
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 14/18] lightnvm: pblk: GC error handling
  2019-03-14 16:04 ` [PATCH 14/18] lightnvm: pblk: GC error handling Igor Konopko
  2019-03-18  7:39   ` Javier González
@ 2019-03-18 12:14   ` Hans Holmberg
  2019-03-18 13:22     ` Igor Konopko
  1 sibling, 1 reply; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 12:14 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Thu, Mar 14, 2019 at 5:09 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> Currently, when there is an I/O error (or similar) on the GC read
> path, pblk still moves the line to the free state, which leads to a
> data mismatch issue.
>
> This patch adds handling for such an error: after this change the line
> is returned to the closed state, so it can still be used for reading.
> At minimum we will then be able to return an error status to the user
> when the user tries to read such data.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-core.c | 8 ++++++++
>  drivers/lightnvm/pblk-gc.c   | 5 ++---
>  drivers/lightnvm/pblk-read.c | 1 -
>  drivers/lightnvm/pblk.h      | 2 ++
>  4 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 4d5cd99..6817f8f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -1786,6 +1786,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
>
>         spin_lock(&line->lock);
>         WARN_ON(line->state != PBLK_LINESTATE_GC);
> +       if (line->w_err_gc->has_gc_err) {
> +               spin_unlock(&line->lock);
> +               pblk_err(pblk, "line %d had errors during GC\n", line->id);
> +               pblk_put_line_back(pblk, line);
> +               line->w_err_gc->has_gc_err = 0;

In a real bummer corner case, the line might have had a write error as well
(line->w_err_gc->has_write_err == true)

In that case we need to inform the rate limiter:
pblk_rl_werr_line_out(&pblk->rl);
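For reference, the combined condition can be expressed as a standalone model (plain C with the kernel types and locking stripped out; `has_write_err`/`has_gc_err` mirror the fields of `struct pblk_w_err_gc`, and the `pblk_rl_werr_line_out()` call is reduced to a flag — this is a sketch, not the eventual patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the error flags carried by a line under GC. */
struct w_err_gc {
	int has_write_err;	/* line saw a write error before GC */
	int has_gc_err;		/* GC read of the line failed */
};

/* Returns true when the line must go back to the closed state instead
 * of being freed; *notify_rl is set when the rate limiter must also be
 * told that a write-error line left GC (the corner case above). */
static bool gc_put_line_back(const struct w_err_gc *e, bool *notify_rl)
{
	*notify_rl = false;
	if (e->has_gc_err) {
		/* The same line may also carry a write error, so the
		 * rate limiter accounting has to be balanced. */
		if (e->has_write_err)
			*notify_rl = true;
		return true;
	}
	return false;
}
```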

> +               return;
> +       }
> +
>         line->state = PBLK_LINESTATE_FREE;
>         trace_pblk_line_state(pblk_disk_name(pblk), line->id,
>                                         line->state);
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index e23b192..63ee205 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
>         wake_up_process(gc->gc_writer_ts);
>  }
>
> -static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
>  {
>         struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>         struct list_head *move_list;
> @@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
>         /* Read from GC victim block */
>         ret = pblk_submit_read_gc(pblk, gc_rq);
>         if (ret) {
> -               pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
> -                                                               line->id, ret);
> +               line->w_err_gc->has_gc_err = 1;
>                 goto out;
>         }
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 54422a2..6a77c24 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -475,7 +475,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
>
>         if (pblk_submit_io_sync(pblk, &rqd)) {
>                 ret = -EIO;
> -               pblk_err(pblk, "GC read request failed\n");
>                 goto err_free_bio;
>         }
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 5d1040a..52002f5 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -427,6 +427,7 @@ struct pblk_smeta {
>
>  struct pblk_w_err_gc {
>         int has_write_err;
> +       int has_gc_err;
>         __le64 *lba_list;
>  };
>
> @@ -909,6 +910,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
>  void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
>                               int *gc_active);
>  int pblk_gc_sysfs_force(struct pblk *pblk, int force);
> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
>
>  /*
>   * pblk rate limiter
> --
> 2.9.5
>


* Re: [PATCH 03/18] lightnvm: pblk: simplify partial read path
  2019-03-16 22:28       ` Javier González
@ 2019-03-18 12:44         ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 12:44 UTC (permalink / raw)
  To: Javier González
  Cc: Heiner Litz, Matias Bjørling, Hans Holmberg, linux-block



On 16.03.2019 23:28, Javier González wrote:
>> On 15 Mar 2019, at 02.52, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>>
>>
>> On 14.03.2019 22:35, Heiner Litz wrote:
>>> On Thu, Mar 14, 2019 at 9:07 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>> This patch changes the approach to handling the partial read path.
>>>>
>>>> In the old approach, merging data from the round buffer and the
>>>> drive was done entirely by pblk. This had some disadvantages: the
>>>> code was complex, relied on bio internals, and was therefore hard to
>>>> maintain and strongly dependent on bio changes.
>>>>
>>>> In the new approach, most of the handling is done by block layer
>>>> functions such as bio_split(), bio_chain() and generic_make_request(),
>>>> so it is generally less complex and easier to maintain. Some more
>>>> details of the new approach follow.
>>>>
>>>> When a read bio arrives, it is cloned for pblk internal purposes.
>>>> All the L2P mapping, which includes copying data from the round
>>>> buffer to the bio and thus the bio_advance() calls, is done on the
>>>> cloned bio, so the original bio is untouched. If we later find that
>>>> we have a partial read case, we still have the original bio
>>>> untouched, so we can split it and continue to process only its first
>>>> 4K in the current context, while the rest is submitted as a separate
>>>> bio request passed to generic_make_request() for further processing.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>>   drivers/lightnvm/pblk-read.c | 242 ++++++++-----------------------------------
>>>>   drivers/lightnvm/pblk.h      |  12 ---
>>>>   2 files changed, 41 insertions(+), 213 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>> index 6569746..54422a2 100644
>>>> --- a/drivers/lightnvm/pblk-read.c
>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>> @@ -222,171 +222,6 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>>>          __pblk_end_io_read(pblk, rqd, true);
>>>>   }
>>>>
>>>> -static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> -{
>>>> -       struct pblk *pblk = rqd->private;
>>>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
>>>> -       struct pblk_sec_meta *meta;
>>>> -       struct bio *new_bio = rqd->bio;
>>>> -       struct bio *bio = pr_ctx->orig_bio;
>>>> -       struct bio_vec src_bv, dst_bv;
>>>> -       void *meta_list = rqd->meta_list;
>>>> -       int bio_init_idx = pr_ctx->bio_init_idx;
>>>> -       unsigned long *read_bitmap = pr_ctx->bitmap;
>>>> -       int nr_secs = pr_ctx->orig_nr_secs;
>>>> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>>>> -       void *src_p, *dst_p;
>>>> -       int hole, i;
>>>> -
>>>> -       if (unlikely(nr_holes == 1)) {
>>>> -               struct ppa_addr ppa;
>>>> -
>>>> -               ppa = rqd->ppa_addr;
>>>> -               rqd->ppa_list = pr_ctx->ppa_ptr;
>>>> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
>>>> -               rqd->ppa_list[0] = ppa;
>>>> -       }
>>>> -
>>>> -       for (i = 0; i < nr_secs; i++) {
>>>> -               meta = pblk_get_meta(pblk, meta_list, i);
>>>> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
>>>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
>>>> -       }
>>>> -
>>>> -       /* Fill the holes in the original bio */
>>>> -       i = 0;
>>>> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
>>>> -       do {
>>>> -               struct pblk_line *line;
>>>> -
>>>> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>> -               kref_put(&line->ref, pblk_line_put);
>>>> -
>>>> -               meta = pblk_get_meta(pblk, meta_list, hole);
>>>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
>>>> -
>>>> -               src_bv = new_bio->bi_io_vec[i++];
>>>> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>> -
>>>> -               src_p = kmap_atomic(src_bv.bv_page);
>>>> -               dst_p = kmap_atomic(dst_bv.bv_page);
>>>> -
>>>> -               memcpy(dst_p + dst_bv.bv_offset,
>>>> -                       src_p + src_bv.bv_offset,
>>>> -                       PBLK_EXPOSED_PAGE_SIZE);
>>>> -
>>>> -               kunmap_atomic(src_p);
>>>> -               kunmap_atomic(dst_p);
>>>> -
>>>> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
>>>> -
>>>> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
>>>> -       } while (hole < nr_secs);
>>>> -
>>>> -       bio_put(new_bio);
>>>> -       kfree(pr_ctx);
>>>> -
>>>> -       /* restore original request */
>>>> -       rqd->bio = NULL;
>>>> -       rqd->nr_ppas = nr_secs;
>>>> -
>>>> -       pblk_end_user_read(bio, rqd->error);
>>>> -       __pblk_end_io_read(pblk, rqd, false);
>>>> -}
>>>> -
>>>> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> -                           unsigned int bio_init_idx,
>>>> -                           unsigned long *read_bitmap,
>>>> -                           int nr_holes)
>>>> -{
>>>> -       void *meta_list = rqd->meta_list;
>>>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>> -       struct pblk_pr_ctx *pr_ctx;
>>>> -       struct bio *new_bio, *bio = r_ctx->private;
>>>> -       int nr_secs = rqd->nr_ppas;
>>>> -       int i;
>>>> -
>>>> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>> -
>>>> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
>>>> -               goto fail_bio_put;
>>>> -
>>>> -       if (nr_holes != new_bio->bi_vcnt) {
>>>> -               WARN_ONCE(1, "pblk: malformed bio\n");
>>>> -               goto fail_free_pages;
>>>> -       }
>>>> -
>>>> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
>>>> -       if (!pr_ctx)
>>>> -               goto fail_free_pages;
>>>> -
>>>> -       for (i = 0; i < nr_secs; i++) {
>>>> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>>>> -
>>>> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
>>>> -       }
>>>> -
>>>> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>> -
>>>> -       rqd->bio = new_bio;
>>>> -       rqd->nr_ppas = nr_holes;
>>>> -
>>>> -       pr_ctx->orig_bio = bio;
>>>> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
>>>> -       pr_ctx->bio_init_idx = bio_init_idx;
>>>> -       pr_ctx->orig_nr_secs = nr_secs;
>>>> -       r_ctx->private = pr_ctx;
>>>> -
>>>> -       if (unlikely(nr_holes == 1)) {
>>>> -               pr_ctx->ppa_ptr = rqd->ppa_list;
>>>> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
>>>> -               rqd->ppa_addr = rqd->ppa_list[0];
>>>> -       }
>>>> -       return 0;
>>>> -
>>>> -fail_free_pages:
>>>> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
>>>> -fail_bio_put:
>>>> -       bio_put(new_bio);
>>>> -
>>>> -       return -ENOMEM;
>>>> -}
>>>> -
>>>> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>> -                                unsigned int bio_init_idx,
>>>> -                                unsigned long *read_bitmap, int nr_secs)
>>>> -{
>>>> -       int nr_holes;
>>>> -       int ret;
>>>> -
>>>> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>>>> -
>>>> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
>>>> -                                   nr_holes))
>>>> -               return NVM_IO_ERR;
>>>> -
>>>> -       rqd->end_io = pblk_end_partial_read;
>>>> -
>>>> -       ret = pblk_submit_io(pblk, rqd);
>>>> -       if (ret) {
>>>> -               bio_put(rqd->bio);
>>>> -               pblk_err(pblk, "partial read IO submission failed\n");
>>>> -               goto err;
>>>> -       }
>>>> -
>>>> -       return NVM_IO_OK;
>>>> -
>>>> -err:
>>>> -       pblk_err(pblk, "failed to perform partial read\n");
>>>> -
>>>> -       /* Free allocated pages in new bio */
>>>> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
>>>> -       __pblk_end_io_read(pblk, rqd, false);
>>>> -       return NVM_IO_ERR;
>>>> -}
>>>> -
>>>>   static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>                           sector_t lba, unsigned long *read_bitmap)
>>>>   {
>>>> @@ -432,11 +267,11 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>>>   {
>>>>          struct nvm_tgt_dev *dev = pblk->dev;
>>>>          struct request_queue *q = dev->q;
>>>> +       struct bio *split_bio, *int_bio;
>>>>          sector_t blba = pblk_get_lba(bio);
>>>>          unsigned int nr_secs = pblk_get_secs(bio);
>>>>          struct pblk_g_ctx *r_ctx;
>>>>          struct nvm_rq *rqd;
>>>> -       unsigned int bio_init_idx;
>>>>          DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
>>>>          int ret = NVM_IO_ERR;
>>>>
>>>> @@ -456,61 +291,66 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>>>          r_ctx = nvm_rq_to_pdu(rqd);
>>>>          r_ctx->start_time = jiffies;
>>>>          r_ctx->lba = blba;
>>>> -       r_ctx->private = bio; /* original bio */
>>>>
>>>> -       /* Save the index for this bio's start. This is needed in case
>>>> -        * we need to fill a partial read.
>>>> +       /* Clone read bio to deal with:
>>>> +        * -usage of bio_advance() when memcpy data from round buffer
>>>> +        * -read errors in case of reading from device
>>>>           */
>>>> -       bio_init_idx = pblk_get_bi_idx(bio);
>>>> +       int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>>>> +       if (!int_bio)
>>>> +               return NVM_IO_ERR;
>>>>
>>>>          if (pblk_alloc_rqd_meta(pblk, rqd))
>>>>                  goto fail_rqd_free;
>>>>
>>>>          if (nr_secs > 1)
>>>> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
>>>> +               pblk_read_ppalist_rq(pblk, rqd, int_bio, blba, read_bitmap);
>>>>          else
>>>> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
>>>> +               pblk_read_rq(pblk, rqd, int_bio, blba, read_bitmap);
>>>> +
>>>> +split_retry:
>>>> +       r_ctx->private = bio; /* original bio */
>>>>
>>>> -       if (bitmap_full(read_bitmap, nr_secs)) {
>>>> +       if (bitmap_full(read_bitmap, rqd->nr_ppas)) {
>>>> +               bio_put(int_bio);
>>>>                  atomic_inc(&pblk->inflight_io);
>>>>                  __pblk_end_io_read(pblk, rqd, false);
>>>>                  return NVM_IO_DONE;
>>>>          }
>>>>
>>>> -       /* All sectors are to be read from the device */
>>>> -       if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>>>> -               struct bio *int_bio = NULL;
>>>> +       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>>>> +               /* The read bio request could be partially filled by the write
>>>> +                * buffer, but there are some holes that need to be read from
>>>> +                * the drive. In order to handle this, we will use block layer
>>>> +                * mechanism to split this request in to smaller ones.
>>>> +                */
>>>> +               split_bio = bio_split(bio, NR_PHY_IN_LOG, GFP_KERNEL,
>>>> +                                       &pblk_bio_set);
>>> This is quite inefficient. If you have an rqd with 63 sectors on the device and
>>> the 64th is cached, you are splitting 63 times whereas a single split
>>> would be sufficient.
>>
>> Yeah, definitely, it would be better to find how many contiguous sectors are in the cache/on the drive and split based on that. Will improve that in v2.
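The single-split idea can be sketched as a standalone model (hypothetical helper name; `cached` models the read bitmap for up to 64 sectors, bit i set when sector i was served from the write buffer):

```c
#include <assert.h>

/* Return the number of leading sectors that share the same source
 * (all cached or all on-device). Splitting the bio once at this
 * boundary replaces the per-4K splits of the original patch: for 63
 * device sectors followed by 1 cached sector, one split suffices. */
static int first_run_len(unsigned long cached, int nr_secs)
{
	int first = cached & 1;
	int i;

	for (i = 1; i < nr_secs; i++)
		if (((cached >> i) & 1) != first)
			break;
	return i;
}
```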
>>
>>>> +               bio_chain(split_bio, bio);
>>> I am not sure if it's needed but in blk_queue_split() these flags are set
>>> before making the request:
>>>          split->bi_opf |= REQ_NOMERGE;
>>>          bio_set_flag(*bio, BIO_QUEUE_ENTERED);
>>>> +               generic_make_request(bio);
>>> pblk_lookup_l2p_seq() increments line->ref. You have to release the krefs before
>>> requeueing the request.
>> I completely forgot about it, will fix that of course, thanks!
>>
>>> You might consider introducing a pblk_lookup_l2p_uncached() which returns when
>>> it finds a cached sector and returns its index. Doing so you can avoid obtaining
>>> superfluous line->refs and we could also get rid of the read_bitmap
>>> entirely. This
>>> index could also be used to split the bio at the right position.
> 
> 
> I think you can also get rid of the read_bitmap. This would help
> removing one of the 64 lba dependencies in pblk, which I think is useful
> as we move forward.

Sure, makes sense - I'll post this as a part of v2.
> 


* Re: [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix
  2019-03-17 19:24     ` Matias Bjørling
@ 2019-03-18 12:50       ` Igor Konopko
  2019-03-18 19:25         ` Javier González
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 12:50 UTC (permalink / raw)
  To: Matias Bjørling, Javier González; +Cc: Hans Holmberg, linux-block



On 17.03.2019 20:24, Matias Bjørling wrote:
> On 3/16/19 3:43 PM, Javier González wrote:
>>> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> In case of OOB recovery, when some of the chunks are in the closed
>>> state, we calculate the number of written sectors in the line
>>> incorrectly, because we always count the chunk WP, which for closed
>>> chunks no longer reflects the number of written sectors. For such
>>> chunks this patch uses the clba field instead.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-recovery.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-recovery.c 
>>> b/drivers/lightnvm/pblk-recovery.c
>>> index 83b467b..bcd3633 100644
>>> --- a/drivers/lightnvm/pblk-recovery.c
>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>> @@ -101,6 +101,8 @@ static void pblk_update_line_wp(struct pblk 
>>> *pblk, struct pblk_line *line,
>>>
>>> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line 
>>> *line)
>>> {
>>> +    struct nvm_tgt_dev *dev = pblk->dev;
>>> +    struct nvm_geo *geo = &dev->geo;
>>>     struct pblk_line_meta *lm = &pblk->lm;
>>>     int nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
>>>     u64 written_secs = 0;
>>> @@ -113,7 +115,11 @@ static u64 pblk_sec_in_open_line(struct pblk 
>>> *pblk, struct pblk_line *line)
>>>         if (chunk->state & NVM_CHK_ST_OFFLINE)
>>>             continue;
>>>
>>> -        written_secs += chunk->wp;
>>> +        if (chunk->state & NVM_CHK_ST_OPEN)
>>> +            written_secs += chunk->wp;
>>> +        else if (chunk->state & NVM_CHK_ST_CLOSED)
>>> +            written_secs += geo->clba;
>>> +
>>>         valid_chunks++;
>>>     }
>>>
>>> -- 
>>> 2.9.5
>>
>> Mmmm. The change is correct, but can you elaborate on why the WP does
>> not reflect the written sectors in a closed chunk? As I understand it, the
>> WP reflects the last written sector; if it happens to be WP == clba, then
>> the chunk state machine transitions to closed, but the WP remains
>> untouched. It is only when we reset the chunk that the WP comes back to
>> 0. Am I missing something?
>>
> 
> I agree with Javier. In OCSSD, the write pointer shall always be valid.

So based on the OCSSD 2.0 spec, "If WP = SLBA + NLB, the chunk is closed" 
(also per "Figure 5: Chunk State Diagram" in the spec: WP = SLBA + NLB for 
a closed chunk), my understanding is that the WP is not usable for this 
particular case, since for written_secs we want the NLB value (relative 
within the chunk, without the starting LBA). That is why I used the fixed 
clba here: the WP of a closed chunk is an absolute value rather than a 
relative one.
Did I misunderstand something in the spec?
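Igor's reading can be expressed as a small standalone model (illustrative only; `clba` stands for `geo->clba`, the number of sectors per chunk, and the enum is a stand-in for the NVM_CHK_ST_* flags):

```c
#include <assert.h>

enum chk_state { CHK_FREE, CHK_OPEN, CHK_CLOSED, CHK_OFFLINE };

/* Per-chunk accounting as in the patch: an open chunk has written wp
 * sectors (treated as relative), while a closed chunk has written all
 * clba sectors regardless of how its WP is reported ("If WP = SLBA +
 * NLB, the chunk is closed" makes the raw WP absolute there). */
static unsigned long chunk_written_secs(enum chk_state state,
					unsigned long wp,
					unsigned long clba)
{
	switch (state) {
	case CHK_OPEN:
		return wp;
	case CHK_CLOSED:
		return clba;
	default:	/* free or offline chunks contribute nothing */
		return 0;
	}
}
```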


* Re: [PATCH 06/18] lightnvm: pblk: recover only written metadata
  2019-03-16 23:46   ` Javier González
@ 2019-03-18 12:54     ` Igor Konopko
  2019-03-18 15:04       ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 12:54 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 17.03.2019 00:46, Javier González wrote:
>> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> Based on the chunk table state and write pointer, this patch ensures
>> that smeta/emeta was properly written before even trying to read it.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-recovery.c | 43 ++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index 688fdeb..ba1691d 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -653,8 +653,42 @@ static int pblk_line_was_written(struct pblk_line *line,
>> 	bppa = pblk->luns[smeta_blk].bppa;
>> 	chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
>>
>> -	if (chunk->state & NVM_CHK_ST_FREE)
>> -		return 0;
>> +	if (chunk->state & NVM_CHK_ST_CLOSED ||
>> +	    (chunk->state & NVM_CHK_ST_OPEN
>> +	     && chunk->wp >= lm->smeta_sec))
>> +		return 1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int pblk_line_was_emeta_written(struct pblk_line *line,
>> +				       struct pblk *pblk)
>> +{
>> +
>> +	struct pblk_line_meta *lm = &pblk->lm;
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +	struct nvm_chk_meta *chunk;
>> +	struct ppa_addr ppa;
>> +	int i, pos;
>> +	int min = pblk->min_write_pgs;
>> +	u64 paddr = line->emeta_ssec;
>> +
>> +	for (i = 0; i < lm->emeta_sec[0]; i++, paddr++) {
>> +		ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +		pos = pblk_ppa_to_pos(geo, ppa);
>> +		while (test_bit(pos, line->blk_bitmap)) {
>> +			paddr += min;
>> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +			pos = pblk_ppa_to_pos(geo, ppa);
>> +		}
>> +		chunk = &line->chks[pos];
>> +
>> +		if (!(chunk->state & NVM_CHK_ST_CLOSED ||
>> +		    (chunk->state & NVM_CHK_ST_OPEN
>> +		     && chunk->wp > ppa.m.sec)))
>> +			return 0;
>> +	}
>>
>> 	return 1;
>> }
>> @@ -788,6 +822,11 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
>> 			goto next;
>> 		}
>>
>> +		if (!pblk_line_was_emeta_written(line, pblk)) {
>> +			pblk_recov_l2p_from_oob(pblk, line);
>> +			goto next;
>> +		}
>> +
>> 		if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
>> 			pblk_recov_l2p_from_oob(pblk, line);
>> 			goto next;
>> --
>> 2.9.5
> 
>   I would like to avoid iterating on all chunks again and again to check
>   for different things at boot time. What do you think having this as
>   something like PBLK_LINESTATE_OPEN_INVALID or
>   PBLK_LINESTATE_OPEN_NONRECOV, which you populate when collecting the
>   chunk information? Then this becomes a simple check as opposed to the
>   extra chunk iteration?
> 

Ok, will work on something different.
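Javier's flag-at-collection idea could look roughly like this standalone model (hypothetical flag and helper names, not existing pblk symbols; it assumes smeta occupies the line's first usable chunk, as in the patch):

```c
#include <assert.h>

#define LINE_F_SMETA_WRITTEN	(1 << 0)
#define LINE_F_EMETA_WRITTEN	(1 << 1)

struct chk_model {
	int open;
	int closed;
	unsigned long wp;
};

/* Fold the per-chunk conditions from the patch into per-line flags
 * during the single boot-time chunk scan, so the recovery path can
 * test a bit instead of re-iterating all chunks. The emeta check is
 * reduced to the write pointer of the last chunk holding emeta. */
static int collect_line_flags(const struct chk_model *smeta_chk,
			      unsigned long smeta_sec,
			      const struct chk_model *emeta_last_chk,
			      unsigned long emeta_last_sec)
{
	int flags = 0;

	if (smeta_chk->closed ||
	    (smeta_chk->open && smeta_chk->wp >= smeta_sec))
		flags |= LINE_F_SMETA_WRITTEN;
	if (emeta_last_chk->closed ||
	    (emeta_last_chk->open && emeta_last_chk->wp > emeta_last_sec))
		flags |= LINE_F_EMETA_WRITTEN;
	return flags;
}
```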

> On the check itself: Is this done for completeness or have you hit a
> case that is not covered by the smeta/emeta CRC protection?
> 

So generally my goal here was to avoid reading blank sectors if not needed.

> 


* Re: [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery
  2019-03-17 19:33   ` Matias Bjørling
@ 2019-03-18 12:58     ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 12:58 UTC (permalink / raw)
  To: Matias Bjørling, javier, hans.holmberg; +Cc: linux-block



On 17.03.2019 20:33, Matias Bjørling wrote:
> On 3/14/19 9:04 AM, Igor Konopko wrote:
>> This patch changes the behaviour of recovery padding in order to
>> support the case when some IOs were already submitted to the drive and
>> subsequent ones were not submitted due to a returned error.
>>
>> Currently, in case of errors we simply exit the pad function without
>> waiting for inflight IOs, which leads to a panic when the inflight IOs
>> complete.
>>
>> After the changes we always wait for all the inflight IOs before
>> exiting the function.
>>
>> Also, since NVMe has an internal per-IO timeout, there is no need to
>> introduce an additional one here.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-recovery.c | 32 +++++++++++++-------------------
>>   1 file changed, 13 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-recovery.c 
>> b/drivers/lightnvm/pblk-recovery.c
>> index ba1691d..73d5ead 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -200,7 +200,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>       rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       if (rq_ppas < pblk->min_write_pgs) {
>>           pblk_err(pblk, "corrupted pad line %d\n", line->id);
>> -        goto fail_free_pad;
>> +        goto fail_complete;
>>       }
>>       rq_len = rq_ppas * geo->csecs;
>> @@ -209,7 +209,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>                           PBLK_VMALLOC_META, GFP_KERNEL);
>>       if (IS_ERR(bio)) {
>>           ret = PTR_ERR(bio);
>> -        goto fail_free_pad;
>> +        goto fail_complete;
>>       }
>>       bio->bi_iter.bi_sector = 0; /* internal bio */
>> @@ -218,8 +218,11 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>       rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
>>       ret = pblk_alloc_rqd_meta(pblk, rqd);
>> -    if (ret)
>> -        goto fail_free_rqd;
>> +    if (ret) {
>> +        pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
>> +        bio_put(bio);
>> +        goto fail_complete;
>> +    }
>>       rqd->bio = bio;
>>       rqd->opcode = NVM_OP_PWRITE;
>> @@ -266,7 +269,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>       if (ret) {
>>           pblk_err(pblk, "I/O submission failed: %d\n", ret);
>>           pblk_up_chunk(pblk, rqd->ppa_list[0]);
>> -        goto fail_free_rqd;
>> +        kref_put(&pad_rq->ref, pblk_recov_complete);
>> +        pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
>> +        bio_put(bio);
>> +        goto fail_complete;
>>       }
>>       left_line_ppas -= rq_ppas;
>> @@ -274,13 +280,9 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>       if (left_ppas && left_line_ppas)
>>           goto next_pad_rq;
>> +fail_complete:
>>       kref_put(&pad_rq->ref, pblk_recov_complete);
>> -
>> -    if (!wait_for_completion_io_timeout(&pad_rq->wait,
>> -                msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
>> -        pblk_err(pblk, "pad write timed out\n");
>> -        ret = -ETIME;
>> -    }
>> +    wait_for_completion(&pad_rq->wait);
>>       if (!pblk_line_is_full(line))
>>           pblk_err(pblk, "corrupted padded line: %d\n", line->id);
>> @@ -289,14 +291,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, 
>> struct pblk_line *line,
>>   free_rq:
>>       kfree(pad_rq);
>>       return ret;
>> -
>> -fail_free_rqd:
>> -    pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
>> -    bio_put(bio);
>> -fail_free_pad:
>> -    kfree(pad_rq);
>> -    vfree(data);
>> -    return ret;
>>   }
>>   static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line)
>>
> 
> Hi Igor,
> 
> Can you split this patch in two. One that removes the 
> wait_for_completion_io_timeout (and constant), and another that makes 
> sure it waits until all inflight IOs are completed?
> 

Sure, will split this into two patches for v2.
> 


* Re: [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-17 19:44   ` Matias Bjørling
@ 2019-03-18 13:02     ` Igor Konopko
  2019-03-18 18:26       ` Javier González
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:02 UTC (permalink / raw)
  To: Matias Bjørling, javier, hans.holmberg; +Cc: linux-block



On 17.03.2019 20:44, Matias Bjørling wrote:
> On 3/14/19 9:04 AM, Igor Konopko wrote:
>> When we are trying to switch to a new line, we need to ensure that
>> the emeta for the n-2 line is already written. Otherwise we can end up
>> in a deadlock scenario in which the writer has no more requests to
>> write and thus there is no way to trigger emeta writes from the writer
>> thread. This is a corner case which occurs after multiple write errors
>> and the resulting early line close due to lack of line space.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-core.c  |  2 ++
>>   drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
>>   drivers/lightnvm/pblk.h       |  1 +
>>   3 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 38e26fe..a683d1f 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct 
>> pblk_line *line,
>>                        struct pblk_line_mgmt *l_mg,
>>                        struct pblk_line_meta *lm)
>>   {
>> +    struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
>>       int meta_line;
>>       lockdep_assert_held(&l_mg->free_lock);
>> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct 
>> pblk_line *line,
>>       meta_line = find_first_zero_bit(&l_mg->meta_bitmap, 
>> PBLK_DATA_LINES);
>>       if (meta_line == PBLK_DATA_LINES) {
>>           spin_unlock(&l_mg->free_lock);
>> +        pblk_write_emeta_force(pblk);
>>           io_schedule();
>>           spin_lock(&l_mg->free_lock);
>>           goto retry_meta;
>> diff --git a/drivers/lightnvm/pblk-write.c 
>> b/drivers/lightnvm/pblk-write.c
>> index 4e63f9b..4fbb9b2 100644
>> --- a/drivers/lightnvm/pblk-write.c
>> +++ b/drivers/lightnvm/pblk-write.c
>> @@ -505,6 +505,30 @@ static struct pblk_line 
>> *pblk_should_submit_meta_io(struct pblk *pblk,
>>       return meta_line;
>>   }
>> +void pblk_write_emeta_force(struct pblk *pblk)
>> +{
>> +    struct pblk_line_meta *lm = &pblk->lm;
>> +    struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> +    struct pblk_line *meta_line;
>> +
>> +    while (true) {
>> +        spin_lock(&l_mg->close_lock);
>> +        if (list_empty(&l_mg->emeta_list)) {
>> +            spin_unlock(&l_mg->close_lock);
>> +            break;
>> +        }
>> +        meta_line = list_first_entry(&l_mg->emeta_list,
>> +                        struct pblk_line, list);
>> +        if (meta_line->emeta->mem >= lm->emeta_len[0]) {
>> +            spin_unlock(&l_mg->close_lock);
>> +            io_schedule();
>> +            continue;
>> +        }
>> +        spin_unlock(&l_mg->close_lock);
>> +        pblk_submit_meta_io(pblk, meta_line);
>> +    }
>> +}
>> +
>>   static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>   {
>>       struct ppa_addr erase_ppa;
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 0a85990..a42bbfb 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
>>   void pblk_write_timer_fn(struct timer_list *t);
>>   void pblk_write_should_kick(struct pblk *pblk);
>>   void pblk_write_kick(struct pblk *pblk);
>> +void pblk_write_emeta_force(struct pblk *pblk);
>>   /*
>>    * pblk read path
>>
> 
> Hi Igor,
> 
> Is this an error that qemu can force pblk to expose? Can you provide a 
> specific example of what is needed to force the error?

So I hit this error on pblk instances with a low number of LUNs and 
multiple write IO errors (it should be reproducible with error 
injection). pblk_map_remaining() then quickly mapped all the sectors in 
the line, and thus the writer thread was not able to issue all the 
necessary emeta IO writes, so it got stuck when trying to switch to a 
new line. So this is definitely an error/corner case scenario.


* Re: [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region
  2019-03-17 19:56   ` Matias Bjørling
@ 2019-03-18 13:05     ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:05 UTC (permalink / raw)
  To: Matias Bjørling, javier, hans.holmberg; +Cc: linux-block



On 17.03.2019 20:56, Matias Bjørling wrote:
> On 3/14/19 9:04 AM, Igor Konopko wrote:
>> The emeta region does not have a valid corresponding OOB metadata
>> mapping, so there is no need to try to recover the L2P mapping from it.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-recovery.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/lightnvm/pblk-recovery.c 
>> b/drivers/lightnvm/pblk-recovery.c
>> index 4764596..2132260 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -479,6 +479,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, 
>> struct pblk_line *line,
>>           goto retry_rq;
>>       }
>> +    if (paddr >= line->emeta_ssec) {
>> +        /*
>> +         * We reach emeta region and we don't want
>> +         * to recover oob from emeta region.
>> +         */
>> +        goto completed;
> 
> The bio needs to be put before going to completed?

Yes, definitely, thanks.
> 
>> +    }
>> +
>>       pblk_get_packed_meta(pblk, rqd);
>>       bio_put(bio);
>> @@ -499,6 +507,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, 
>> struct pblk_line *line,
>>       if (left_ppas > 0)
>>           goto next_rq;
>> +completed:
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>>       WARN_ON(padded && !pblk_line_is_full(line));
>>   #endif
>>
> 


* Re: [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery
  2019-03-18  6:56   ` Javier González
@ 2019-03-18 13:06     ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:06 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 18.03.2019 07:56, Javier González wrote:
>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> In case of OOB recovery, we can hit the scenario when all the data in
>> line were written and some part of emeta was written too. In such
>> a case pblk_update_line_wp() function will call pblk_alloc_page()
>> function which will case to set left_msecs to value below zero
>> (since this field does not track emeta region) and thus will lead to
>> multiple kernel warnings. This patch fixes that issue.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-recovery.c | 20 +++++++++++++++++---
>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index 73d5ead..4764596 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -93,10 +93,24 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
>> static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
>> 				u64 written_secs)
>> {
>> -	int i;
>> +	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> +	int i = 0;
>>
>> -	for (i = 0; i < written_secs; i += pblk->min_write_pgs)
>> -		pblk_alloc_page(pblk, line, pblk->min_write_pgs);
>> +	for (; i < written_secs; i += pblk->min_write_pgs)
> 
> Why no i = 0 here?

Accidentally changed - will revert it.

> 
>> +		__pblk_alloc_page(pblk, line, pblk->min_write_pgs);
>> +
>> +	spin_lock(&l_mg->free_lock);
>> +	if (written_secs > line->left_msecs) {
>> +		/*
>> +		 * We have all data sectors written
>> +		 * and some emeta sectors written too.
>> +		 */
>> +		line->left_msecs = 0;
>> +	} else {
>> +		/* We have only some data sectors written. */
>> +		line->left_msecs -= written_secs;
>> +	}
>> +	spin_unlock(&l_mg->free_lock);
>> }
>>
>> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
>> --
>> 2.9.5
> 
> Otherwise, it looks good.
> 
> Reviewed-by: Javier González <javier@javigon.com>
> 
> 
> 


* Re: [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta
  2019-03-18  7:33   ` Javier González
@ 2019-03-18 13:12     ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:12 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 18.03.2019 08:33, Javier González wrote:
>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> Currently there is only one copy of emeta stored per line in pblk. This
>                                                             ^^^^^^
> smeta?

Sure, typo.

> 
>> is risky, because in case of read error on such a chunk, we are losing
>> all the data from whole line, what leads to silent data corruption.
>>
>> This patch changes this behaviour and stores 2 copies of smeta (but
>> can be easily increased with kernel parameter to different value) in
>> order to provide higher reliability by storing mirrored copies of
>> smeta struct and providing possibility to failover to another copy of
>> that struct in case of read error. Such an approach ensures that copies
>> of that critical structures will be stored on different dies and thus
>> predicted UBER is multiple times higher
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-core.c     | 125 ++++++++++++++++++++++++++++++++-------
>> drivers/lightnvm/pblk-init.c     |  23 +++++--
>> drivers/lightnvm/pblk-recovery.c |   2 +-
>> drivers/lightnvm/pblk.h          |   1 +
>> 4 files changed, 123 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index a683d1f..4d5cd99 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -720,13 +720,14 @@ u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
>> 	return bit * geo->ws_opt;
>> }
>>
>> -int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> +static int pblk_line_smeta_read_copy(struct pblk *pblk,
>> +				     struct pblk_line *line, u64 paddr)
>> {
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> 	struct pblk_line_meta *lm = &pblk->lm;
>> 	struct bio *bio;
>> 	struct nvm_rq rqd;
>> -	u64 paddr = pblk_line_smeta_start(pblk, line);
>> 	int i, ret;
>>
>> 	memset(&rqd, 0, sizeof(struct nvm_rq));
>> @@ -735,7 +736,8 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> 	if (ret)
>> 		return ret;
>>
>> -	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
>> +	bio = bio_map_kern(dev->q, line->smeta,
>> +			   lm->smeta_len / lm->smeta_copies, GFP_KERNEL);
>> 	if (IS_ERR(bio)) {
>> 		ret = PTR_ERR(bio);
>> 		goto clear_rqd;
>> @@ -746,11 +748,23 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>>
>> 	rqd.bio = bio;
>> 	rqd.opcode = NVM_OP_PREAD;
>> -	rqd.nr_ppas = lm->smeta_sec;
>> +	rqd.nr_ppas = lm->smeta_sec / lm->smeta_copies;
>> 	rqd.is_seq = 1;
>>
>> -	for (i = 0; i < lm->smeta_sec; i++, paddr++)
>> -		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>> +	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
>> +		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +		int pos = pblk_ppa_to_pos(geo, ppa);
>> +
>> +		while (test_bit(pos, line->blk_bitmap)) {
>> +			paddr += pblk->min_write_pgs;
>> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +			pos = pblk_ppa_to_pos(geo, ppa);
>> +		}
>> +
>> +		rqd.ppa_list[i] = ppa;
>> +		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
>> +				  cpu_to_le64(ADDR_EMPTY);
>> +	}
>>
>> 	ret = pblk_submit_io_sync(pblk, &rqd);
>> 	if (ret) {
>> @@ -771,16 +785,63 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> 	return ret;
>> }
>>
>> -static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>> -				 u64 paddr)
>> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> +{
>> +	struct pblk_line_meta *lm = &pblk->lm;
>> +	int i, ret = 0, smeta_sec = lm->smeta_sec / lm->smeta_copies;
>> +	u64 paddr = pblk_line_smeta_start(pblk, line);
>> +
>> +	for (i = 0; i < lm->smeta_copies; i++) {
>> +		ret = pblk_line_smeta_read_copy(pblk, line,
>> +						paddr + (i * smeta_sec));
>> +		if (!ret) {
>> +			/*
>> +			 * Just one successfully read copy of smeta is
>> +			 * enough for us for recovery, don't need to
>> +			 * read another one.
>> +			 */
>> +			return ret;
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
>> {
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> 	struct pblk_line_meta *lm = &pblk->lm;
>> 	struct bio *bio;
>> 	struct nvm_rq rqd;
>> 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
>> 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>> -	int i, ret;
>> +	u64 paddr = 0;
>> +	int smeta_cpy_len = lm->smeta_len / lm->smeta_copies;
>> +	int smeta_cpy_sec = lm->smeta_sec / lm->smeta_copies;
>> +	int i, ret, rq_writes;
>> +
>> +	/*
>> +	 * Check if we can write all the smeta copies with
>> +	 * a single write command.
>> +	 * If yes -> copy smeta sector into multiple copies
>> +	 * in buffer to write.
>> +	 * If no -> issue writes one by one using the same
>> +	 * buffer space.
>> +	 * Only if all the copies are written correctly
>> +	 * we are treating this line as valid for proper
>> +	 * UBER reliability.
>> +	 */
>> +	if (lm->smeta_sec > pblk->max_write_pgs) {
>> +		rq_writes = lm->smeta_copies;
>> +	} else {
>> +		rq_writes = 1;
>> +		for (i = 1; i < lm->smeta_copies; i++) {
>> +			memcpy(line->smeta + i * smeta_cpy_len,
>> +			       line->smeta, smeta_cpy_len);
>> +		}
>> +		smeta_cpy_len = lm->smeta_len;
>> +		smeta_cpy_sec = lm->smeta_sec;
>> +	}
> 
> smeta writes are synchronous, so you can just populate 2 entries in the
> vector command. This will help you minimizing the BW spikes with storing
> multiple copies. When doing this, you can remove the comment too.
> 
> In fact, smeta’s length is calculated so that it takes whole writable
> pages to avoid mixing it with user data, so with this logic you will
> always send 1 commands per smeta copy.

Generally, in most cases this is true (and we will fall into the second 
case), but when max_write_pgs == ws_opt we need to submit two IOs, and my 
intention was to cover that in the first case.

> 
>>
>> 	memset(&rqd, 0, sizeof(struct nvm_rq));
>>
>> @@ -788,7 +849,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>> 	if (ret)
>> 		return ret;
>>
>> -	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
>> +next_rq:
>> +	bio = bio_map_kern(dev->q, line->smeta, smeta_cpy_len, GFP_KERNEL);
>> 	if (IS_ERR(bio)) {
>> 		ret = PTR_ERR(bio);
>> 		goto clear_rqd;
>> @@ -799,15 +861,23 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>
>> 	rqd.bio = bio;
>> 	rqd.opcode = NVM_OP_PWRITE;
>> -	rqd.nr_ppas = lm->smeta_sec;
>> +	rqd.nr_ppas = smeta_cpy_sec;
>> 	rqd.is_seq = 1;
>>
>> -	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>> -		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
>> -							   rqd.meta_list, i);
>> +	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
>> +		void *meta_list = rqd.meta_list;
>> +		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +		int pos = pblk_ppa_to_pos(geo, ppa);
>> +
>> +		while (test_bit(pos, line->blk_bitmap)) {
>> +			paddr += pblk->min_write_pgs;
>> +			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>> +			pos = pblk_ppa_to_pos(geo, ppa);
>> +		}
>>
>> -		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>> -		meta->lba = lba_list[paddr] = addr_empty;
>> +		rqd.ppa_list[i] = ppa;
>> +		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
>> +		lba_list[paddr] = addr_empty;
>> 	}
>>
>> 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
>> @@ -822,8 +892,13 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>> 	if (rqd.error) {
>> 		pblk_log_write_err(pblk, &rqd);
>> 		ret = -EIO;
>> +		goto clear_rqd;
>> 	}
>>
>> +	rq_writes--;
>> +	if (rq_writes > 0)
>> +		goto next_rq;
>> +
>> clear_rqd:
>> 	pblk_free_rqd_meta(pblk, &rqd);
>> 	return ret;
>> @@ -1149,7 +1224,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
>> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> 	u64 off;
>> 	int bit = -1;
>> -	int emeta_secs;
>> +	int emeta_secs, smeta_secs;
>>
>> 	line->sec_in_line = lm->sec_per_line;
>>
>> @@ -1165,13 +1240,19 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
>> 	}
>>
>> 	/* Mark smeta metadata sectors as bad sectors */
>> -	bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
>> -	off = bit * geo->ws_opt;
>> -	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
>> +	smeta_secs = lm->smeta_sec;
>> +	bit = -1;
>> +	while (smeta_secs) {
>> +		bit = find_next_zero_bit(line->blk_bitmap, lm->blk_per_line,
>> +					bit + 1);
>> +		off = bit * geo->ws_opt;
>> +		bitmap_set(line->map_bitmap, off, geo->ws_opt);
>> +		line->cur_sec = off + geo->ws_opt;
>> +		smeta_secs -= geo->ws_opt;
>> +	}
> 
> The idea with lm->smeta_sec was to abstract the sectors used for smeta
> from ws_opt, as this could change in the future.
> 
> What do you think about leaving lm->smeta_sec as the sectors needed for
> each copy of smeta and then used the lm->smeta_copies to calculate the
> total? (This is done in pblk_line_meta_init(), not here). This way, you
> can keep the code here without using ws_opt.

Sure, can change that.

> 
>> 	line->sec_in_line -= lm->smeta_sec;
>> -	line->cur_sec = off + lm->smeta_sec;
>>
>> -	if (init && pblk_line_smeta_write(pblk, line, off)) {
>> +	if (init && pblk_line_smeta_write(pblk, line)) {
>> 		pblk_debug(pblk, "line smeta I/O failed. Retry\n");
>> 		return 0;
>> 	}
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index b7845f6..e771df6 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -27,6 +27,11 @@ static unsigned int write_buffer_size;
>> module_param(write_buffer_size, uint, 0644);
>> MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
>>
>> +static unsigned int smeta_copies = 2;
> 
> Can we default to 1?
> 

Sure

>> […]
> 


* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-18 10:28   ` Hans Holmberg
@ 2019-03-18 13:18     ` Igor Konopko
  2019-03-18 14:41       ` Hans Holmberg
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:18 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block



On 18.03.2019 11:28, Hans Holmberg wrote:
> On Thu, Mar 14, 2019 at 5:11 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> This patch adds the possibility to provide full device path (like
>> /dev/nvme0n1) when specifying device on top of which pblk instance
>> should be created/removed.
>>
>> This makes creation of targets from nvme-cli (or other ioctl based
>> tools) more unified with other commands in comparison with current
>> situation where almost all commands uses full device path with except
>> of lightnvm creation/removal parameter which uses just 'nvme0n1'
>> naming convention. After this changes both approach will be valid.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
>>   1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index c01f83b..838c3d8 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
>>   }
>>   EXPORT_SYMBOL(nvm_unregister);
>>
>> +#define PREFIX_STR "/dev/"
>> +static void nvm_normalize_path(char *path)
>> +{
>> +       path[DISK_NAME_LEN - 1] = '\0';
>> +       if (!memcmp(PREFIX_STR, path,
>> +                               sizeof(char) * strlen(PREFIX_STR))) {
>> +               /*
>> +                * User provide name in '/dev/nvme0n1' format,
>> +                * so we need to skip '/dev/' for comparison
>> +                */
>> +               memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
>> +                       (DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
>> +       }
>> +}
>> +
> 
> I don't like this. Why add string parsing to the kernel? Can't this
> feature be added to the nvme tool?

Since we already operate on strings multiple times during target 
creation/removal in the kernel (strcmp calls for target types, the nvme 
device, target names), my idea was to keep this in the same layer too.

> 
>>   static int __nvm_configure_create(struct nvm_ioctl_create *create)
>>   {
>>          struct nvm_dev *dev;
>> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
>>                  return -EINVAL;
>>          }
>>
>> -       create.dev[DISK_NAME_LEN - 1] = '\0';
>> +       nvm_normalize_path(create.dev);
>> +       nvm_normalize_path(create.tgtname);
>>          create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
>> -       create.tgtname[DISK_NAME_LEN - 1] = '\0';
>>
>>          if (create.flags != 0) {
>>                  __u32 flags = create.flags;
>> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
>>          if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
>>                  return -EFAULT;
>>
>> -       remove.tgtname[DISK_NAME_LEN - 1] = '\0';
>> +       nvm_normalize_path(remove.tgtname);
>>
>>          if (remove.flags != 0) {
>>                  pr_err("nvm: no flags supported\n");
>> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
>>          if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
>>                  return -EFAULT;
>>
>> -       fact.dev[DISK_NAME_LEN - 1] = '\0';
>> -
>>          if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
>>                  return -EINVAL;
>>
>> --
>> 2.9.5
>>


* Re: [PATCH 14/18] lightnvm: pblk: GC error handling
  2019-03-18 12:14   ` Hans Holmberg
@ 2019-03-18 13:22     ` Igor Konopko
  2019-03-18 14:14       ` Hans Holmberg
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:22 UTC (permalink / raw)
  To: Hans Holmberg
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block



On 18.03.2019 13:14, Hans Holmberg wrote:
> On Thu, Mar 14, 2019 at 5:09 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> Currently when there is an IO error (or similar) on GC read path, pblk
>> still moves this line to free state, what leads to data mismatch issue.
>>
>> This patch adds a handling for such an error - after that changes this
>> line will be returned to closed state so it can be still in use
>> for reading - at least we will be able to return error status to user
>> when user tries to read such a data.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-core.c | 8 ++++++++
>>   drivers/lightnvm/pblk-gc.c   | 5 ++---
>>   drivers/lightnvm/pblk-read.c | 1 -
>>   drivers/lightnvm/pblk.h      | 2 ++
>>   4 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 4d5cd99..6817f8f 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -1786,6 +1786,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
>>
>>          spin_lock(&line->lock);
>>          WARN_ON(line->state != PBLK_LINESTATE_GC);
>> +       if (line->w_err_gc->has_gc_err) {
>> +               spin_unlock(&line->lock);
>> +               pblk_err(pblk, "line %d had errors during GC\n", line->id);
>> +               pblk_put_line_back(pblk, line);
>> +               line->w_err_gc->has_gc_err = 0;
> 
> In a real bummer corner case, the line might have had a write error as well
> (line->w_err_gc->has_write_err == true)
> 
> In that case we need to inform the rate limiter:
> pblk_rl_werr_line_out(&pblk->rl);

Is that needed, or rather preferred?

I'm not clearing the w_err_gc->has_write_err value in that case (and thus 
not informing the rate limiter), so the line will go back to exactly the 
same state as before GC (still with the write error marked).

> 
>> +               return;
>> +       }
>> +
>>          line->state = PBLK_LINESTATE_FREE;
>>          trace_pblk_line_state(pblk_disk_name(pblk), line->id,
>>                                          line->state);
>> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
>> index e23b192..63ee205 100644
>> --- a/drivers/lightnvm/pblk-gc.c
>> +++ b/drivers/lightnvm/pblk-gc.c
>> @@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
>>          wake_up_process(gc->gc_writer_ts);
>>   }
>>
>> -static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
>> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
>>   {
>>          struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>          struct list_head *move_list;
>> @@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
>>          /* Read from GC victim block */
>>          ret = pblk_submit_read_gc(pblk, gc_rq);
>>          if (ret) {
>> -               pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
>> -                                                               line->id, ret);
>> +               line->w_err_gc->has_gc_err = 1;
>>                  goto out;
>>          }
>>
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 54422a2..6a77c24 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -475,7 +475,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
>>
>>          if (pblk_submit_io_sync(pblk, &rqd)) {
>>                  ret = -EIO;
>> -               pblk_err(pblk, "GC read request failed\n");
>>                  goto err_free_bio;
>>          }
>>
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 5d1040a..52002f5 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -427,6 +427,7 @@ struct pblk_smeta {
>>
>>   struct pblk_w_err_gc {
>>          int has_write_err;
>> +       int has_gc_err;
>>          __le64 *lba_list;
>>   };
>>
>> @@ -909,6 +910,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
>>   void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
>>                                int *gc_active);
>>   int pblk_gc_sysfs_force(struct pblk *pblk, int force);
>> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
>>
>>   /*
>>    * pblk rate limiter
>> --
>> 2.9.5
>>


* Re: [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-18  7:42   ` Javier González
@ 2019-03-18 13:28     ` Igor Konopko
  2019-03-18 19:21       ` Javier González
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 13:28 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 18.03.2019 08:42, Javier González wrote:
>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> In case when mapping fails (called from writer thread) due to lack of
>> lines, currently we are calling pblk_pipeline_stop(), which waits
>> for pending write IOs, so it will lead to the deadlock. Switching
>> to __pblk_pipeline_stop() in that case instead will fix that.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-map.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>> index 5408e32..afc10306 100644
>> --- a/drivers/lightnvm/pblk-map.c
>> +++ b/drivers/lightnvm/pblk-map.c
>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> 		pblk_line_close_meta(pblk, prev_line);
>>
>> 		if (!line) {
>> -			pblk_pipeline_stop(pblk);
>> +			__pblk_pipeline_stop(pblk);
>> 			return -ENOSPC;
>> 		}
>>
>> --
>> 2.9.5
> 
> Have you seeing this problem?
> 
> Before checking if there is a line, we are closing metadata for the
> previous line, so all inflight I/Os should be clear. Can you develop on
> the case in which this would happen?

So we have the following sequence: pblk_pipeline_stop() -> 
__pblk_pipeline_flush() -> pblk_flush_writer() -> wait for the ring 
buffer to empty.
This will never complete, since we still have some RB entries which 
cannot be written, because the writer thread itself is blocked waiting 
inside pblk_flush_writer().

Am I missing something?

> 


* Re: [PATCH 14/18] lightnvm: pblk: GC error handling
  2019-03-18 13:22     ` Igor Konopko
@ 2019-03-18 14:14       ` Hans Holmberg
  0 siblings, 0 replies; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 14:14 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Mon, Mar 18, 2019 at 2:22 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 18.03.2019 13:14, Hans Holmberg wrote:
> > On Thu, Mar 14, 2019 at 5:09 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>
> >> Currently when there is an IO error (or similar) on GC read path, pblk
> >> still moves this line to free state, what leads to data mismatch issue.
> >>
> >> This patch adds a handling for such an error - after that changes this
> >> line will be returned to closed state so it can be still in use
> >> for reading - at least we will be able to return error status to user
> >> when user tries to read such a data.
> >>
> >> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >> ---
> >>   drivers/lightnvm/pblk-core.c | 8 ++++++++
> >>   drivers/lightnvm/pblk-gc.c   | 5 ++---
> >>   drivers/lightnvm/pblk-read.c | 1 -
> >>   drivers/lightnvm/pblk.h      | 2 ++
> >>   4 files changed, 12 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> >> index 4d5cd99..6817f8f 100644
> >> --- a/drivers/lightnvm/pblk-core.c
> >> +++ b/drivers/lightnvm/pblk-core.c
> >> @@ -1786,6 +1786,14 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
> >>
> >>          spin_lock(&line->lock);
> >>          WARN_ON(line->state != PBLK_LINESTATE_GC);
> >> +       if (line->w_err_gc->has_gc_err) {
> >> +               spin_unlock(&line->lock);
> >> +               pblk_err(pblk, "line %d had errors during GC\n", line->id);
> >> +               pblk_put_line_back(pblk, line);
> >> +               line->w_err_gc->has_gc_err = 0;
> >
> > In a real bummer corner case, the line might have had a write error as well
> > (line->w_err_gc->has_write_err == true)
> >
> > In that case we need to inform the rate limiter:
> > pblk_rl_werr_line_out(&pblk->rl);
>
> Is that needed or rather preferred?
>
> I'm not clearing w_err_gc->has_write_err value in that case (and thus
> not informing the rate limiter), so the line will go back to exactly the
> same state as before GC (still with write error marked).

You're right, we will try to gc the line again. Thanks!

Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>

>
> >
> >> +               return;
> >> +       }
> >> +
> >>          line->state = PBLK_LINESTATE_FREE;
> >>          trace_pblk_line_state(pblk_disk_name(pblk), line->id,
> >>                                          line->state);
> >> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> >> index e23b192..63ee205 100644
> >> --- a/drivers/lightnvm/pblk-gc.c
> >> +++ b/drivers/lightnvm/pblk-gc.c
> >> @@ -59,7 +59,7 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
> >>          wake_up_process(gc->gc_writer_ts);
> >>   }
> >>
> >> -static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> >> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
> >>   {
> >>          struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> >>          struct list_head *move_list;
> >> @@ -98,8 +98,7 @@ static void pblk_gc_line_ws(struct work_struct *work)
> >>          /* Read from GC victim block */
> >>          ret = pblk_submit_read_gc(pblk, gc_rq);
> >>          if (ret) {
> >> -               pblk_err(pblk, "failed GC read in line:%d (err:%d)\n",
> >> -                                                               line->id, ret);
> >> +               line->w_err_gc->has_gc_err = 1;
> >>                  goto out;
> >>          }
> >>
> >> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> >> index 54422a2..6a77c24 100644
> >> --- a/drivers/lightnvm/pblk-read.c
> >> +++ b/drivers/lightnvm/pblk-read.c
> >> @@ -475,7 +475,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
> >>
> >>          if (pblk_submit_io_sync(pblk, &rqd)) {
> >>                  ret = -EIO;
> >> -               pblk_err(pblk, "GC read request failed\n");
> >>                  goto err_free_bio;
> >>          }
> >>
> >> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> >> index 5d1040a..52002f5 100644
> >> --- a/drivers/lightnvm/pblk.h
> >> +++ b/drivers/lightnvm/pblk.h
> >> @@ -427,6 +427,7 @@ struct pblk_smeta {
> >>
> >>   struct pblk_w_err_gc {
> >>          int has_write_err;
> >> +       int has_gc_err;
> >>          __le64 *lba_list;
> >>   };
> >>
> >> @@ -909,6 +910,7 @@ void pblk_gc_free_full_lines(struct pblk *pblk);
> >>   void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
> >>                                int *gc_active);
> >>   int pblk_gc_sysfs_force(struct pblk *pblk, int force);
> >> +void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line);
> >>
> >>   /*
> >>    * pblk rate limiter
> >> --
> >> 2.9.5
> >>


* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-18 13:18     ` Igor Konopko
@ 2019-03-18 14:41       ` Hans Holmberg
  2019-03-21 13:18         ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Hans Holmberg @ 2019-03-18 14:41 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjorling, Javier González, Hans Holmberg, linux-block

On Mon, Mar 18, 2019 at 2:18 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 18.03.2019 11:28, Hans Holmberg wrote:
> > On Thu, Mar 14, 2019 at 5:11 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>
> >> This patch adds the possibility to provide full device path (like
> >> /dev/nvme0n1) when specifying device on top of which pblk instance
> >> should be created/removed.
> >>
> >> This makes creation of targets from nvme-cli (or other ioctl based
> >> tools) more unified with other commands in comparison with current
> >> situation where almost all commands uses full device path with except
> >> of lightnvm creation/removal parameter which uses just 'nvme0n1'
> >> naming convention. After this changes both approach will be valid.
> >>
> >> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >> ---
> >>   drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
> >>   1 file changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >> index c01f83b..838c3d8 100644
> >> --- a/drivers/lightnvm/core.c
> >> +++ b/drivers/lightnvm/core.c
> >> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
> >>   }
> >>   EXPORT_SYMBOL(nvm_unregister);
> >>
> >> +#define PREFIX_STR "/dev/"
> >> +static void nvm_normalize_path(char *path)
> >> +{
> >> +       path[DISK_NAME_LEN - 1] = '\0';
> >> +       if (!memcmp(PREFIX_STR, path,
> >> +                               sizeof(char) * strlen(PREFIX_STR))) {
> >> +               /*
> >> +                * User provide name in '/dev/nvme0n1' format,
> >> +                * so we need to skip '/dev/' for comparison
> >> +                */
> >> +               memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
> >> +                       (DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
> >> +       }
> >> +}
> >> +
> >
> > I don't like this. Why add string parsing to the kernel? Can't this
> > feature be added to the nvme tool?
>
> Since during target creation/removal in kernel, we already operate on
> strings multiple times (strcmp calls for target types, nvme device,
> target names) my idea was to keep this in the same layer too.

Oh, pardon the terse and rather grumpy review. Let me elaborate:

String parsing is best avoided when possible, and I don't think it's
worth increasing the kernel code size and changing the behavior of the
IOCTL when it's fully doable in userspace.
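To illustrate the alternative, the prefix stripping could live in the
userspace tool before the ioctl is issued. A minimal sketch in C — the
helper name is hypothetical, not actual nvme-cli code, and DISK_NAME_LEN
mirrors the kernel's 32-byte device name field:

```c
#include <string.h>

#define DISK_NAME_LEN 32
#define DEV_PREFIX "/dev/"

/* Hypothetical userspace helper: strip a leading "/dev/" so the kernel
 * only ever sees the bare "nvme0n1" name in struct nvm_ioctl_create.
 */
static void normalize_dev_path(char *path)
{
	size_t plen = strlen(DEV_PREFIX);

	path[DISK_NAME_LEN - 1] = '\0';
	if (!strncmp(path, DEV_PREFIX, plen))
		memmove(path, path + plen, strlen(path + plen) + 1);
}
```

With this in the tool, the kernel IOCTL behavior stays unchanged and no
string parsing is added to core.c.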

Thanks,
Hans

>
> >
> >>   static int __nvm_configure_create(struct nvm_ioctl_create *create)
> >>   {
> >>          struct nvm_dev *dev;
> >> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
> >>                  return -EINVAL;
> >>          }
> >>
> >> -       create.dev[DISK_NAME_LEN - 1] = '\0';
> >> +       nvm_normalize_path(create.dev);
> >> +       nvm_normalize_path(create.tgtname);
> >>          create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
> >> -       create.tgtname[DISK_NAME_LEN - 1] = '\0';
> >>
> >>          if (create.flags != 0) {
> >>                  __u32 flags = create.flags;
> >> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
> >>          if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
> >>                  return -EFAULT;
> >>
> >> -       remove.tgtname[DISK_NAME_LEN - 1] = '\0';
> >> +       nvm_normalize_path(remove.tgtname);
> >>
> >>          if (remove.flags != 0) {
> >>                  pr_err("nvm: no flags supported\n");
> >> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
> >>          if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
> >>                  return -EFAULT;
> >>
> >> -       fact.dev[DISK_NAME_LEN - 1] = '\0';
> >> -
> >>          if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
> >>                  return -EINVAL;
> >>
> >> --
> >> 2.9.5
> >>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 06/18] lightnvm: pblk: recover only written metadata
  2019-03-18 12:54     ` Igor Konopko
@ 2019-03-18 15:04       ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-18 15:04 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 18.03.2019 13:54, Igor Konopko wrote:
> 
> 
> On 17.03.2019 00:46, Javier González wrote:
>>> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>
>>> This patch ensures that smeta/emeta was written properly before even
>>> trying to read it based on chunk table state and write pointer.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-recovery.c | 43 
>>> ++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 41 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-recovery.c 
>>> b/drivers/lightnvm/pblk-recovery.c
>>> index 688fdeb..ba1691d 100644
>>> --- a/drivers/lightnvm/pblk-recovery.c
>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>> @@ -653,8 +653,42 @@ static int pblk_line_was_written(struct 
>>> pblk_line *line,
>>>     bppa = pblk->luns[smeta_blk].bppa;
>>>     chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
>>>
>>> -    if (chunk->state & NVM_CHK_ST_FREE)
>>> -        return 0;
>>> +    if (chunk->state & NVM_CHK_ST_CLOSED ||
>>> +        (chunk->state & NVM_CHK_ST_OPEN
>>> +         && chunk->wp >= lm->smeta_sec))
>>> +        return 1;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int pblk_line_was_emeta_written(struct pblk_line *line,
>>> +                       struct pblk *pblk)
>>> +{
>>> +
>>> +    struct pblk_line_meta *lm = &pblk->lm;
>>> +    struct nvm_tgt_dev *dev = pblk->dev;
>>> +    struct nvm_geo *geo = &dev->geo;
>>> +    struct nvm_chk_meta *chunk;
>>> +    struct ppa_addr ppa;
>>> +    int i, pos;
>>> +    int min = pblk->min_write_pgs;
>>> +    u64 paddr = line->emeta_ssec;
>>> +
>>> +    for (i = 0; i < lm->emeta_sec[0]; i++, paddr++) {
>>> +        ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>>> +        pos = pblk_ppa_to_pos(geo, ppa);
>>> +        while (test_bit(pos, line->blk_bitmap)) {
>>> +            paddr += min;
>>> +            ppa = addr_to_gen_ppa(pblk, paddr, line->id);
>>> +            pos = pblk_ppa_to_pos(geo, ppa);
>>> +        }
>>> +        chunk = &line->chks[pos];
>>> +
>>> +        if (!(chunk->state & NVM_CHK_ST_CLOSED ||
>>> +            (chunk->state & NVM_CHK_ST_OPEN
>>> +             && chunk->wp > ppa.m.sec)))
>>> +            return 0;
>>> +    }
>>>
>>>     return 1;
>>> }
>>> @@ -788,6 +822,11 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
>>>             goto next;
>>>         }
>>>
>>> +        if (!pblk_line_was_emeta_written(line, pblk)) {
>>> +            pblk_recov_l2p_from_oob(pblk, line);
>>> +            goto next;
>>> +        }
>>> +
>>>         if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
>>>             pblk_recov_l2p_from_oob(pblk, line);
>>>             goto next;
>>> -- 
>>> 2.9.5
>>
>>   I would like to avoid iterating over all chunks again and again to check
>>   for different things at boot time. What do you think about having this as
>>   something like PBLK_LINESTATE_OPEN_INVALID or
>>   PBLK_LINESTATE_OPEN_NONRECOV, which you populate when collecting the
>>   chunk information? Then this becomes a simple check as opposed to the
>>   extra chunk iteration?
>>
> 
> Ok, I will work on something different.

I rethought my changes and realized I messed up. The emeta check is 
redundant, since pblk_line_is_open() will handle the scenario where emeta 
was not fully written (some chunk will still be in the open state).
So I'll drop the emeta part of this patch from v2 and keep only the 
smeta fix.
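The surviving smeta part of the fix reduces to a small predicate over the
chunk state and write pointer. A sketch, assuming the NVM_CHK_ST_* bit
values from include/linux/lightnvm.h (the helper name is hypothetical; the
real check lives in pblk_line_was_written()):

```c
/* Chunk state bits, as defined in include/linux/lightnvm.h */
#define NVM_CHK_ST_FREE   (1 << 0)
#define NVM_CHK_ST_CLOSED (1 << 1)
#define NVM_CHK_ST_OPEN   (1 << 2)

/* smeta is only worth reading when its chunk is closed, or is open
 * with the write pointer already past the smeta region.
 */
static int smeta_was_written(int state, unsigned long long wp,
			     unsigned long long smeta_sec)
{
	if (state & NVM_CHK_ST_CLOSED)
		return 1;
	if ((state & NVM_CHK_ST_OPEN) && wp >= smeta_sec)
		return 1;
	return 0;
}
```

This avoids issuing reads to blank sectors of a free or barely-written
chunk during recovery.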

> 
>> On the check itself: Is this done for completeness or have you hit a
>> case that is not covered by the smeta/emeta CRC protection?
>>
> 
> So generally my goal here was to avoid reading blank sectors if not needed.
> 
>>


* Re: [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init()
  2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
  2019-03-16 22:29   ` Javier González
@ 2019-03-18 16:25   ` Matias Bjørling
  1 sibling, 0 replies; 69+ messages in thread
From: Matias Bjørling @ 2019-03-18 16:25 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 3/14/19 9:04 AM, Igor Konopko wrote:
> This patch fixes a compilation warning caused by improper format
> specifier provided in pblk_l2p_init().
> 
> Fixes: fe0c220 ("lightnvm: pblk: cleanly fail when there is not enough memory")
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-init.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 1940f89..1e227a0 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -159,7 +159,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
>   					| __GFP_RETRY_MAYFAIL | __GFP_HIGHMEM,
>   					PAGE_KERNEL);
>   	if (!pblk->trans_map) {
> -		pblk_err(pblk, "failed to allocate L2P (need %ld of memory)\n",
> +		pblk_err(pblk, "failed to allocate L2P (need %zu of memory)\n",
>   				map_size);
>   		return -ENOMEM;
>   	}
> 

Thanks Igor. I pulled it into the original patch.


* Re: [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-18 13:02     ` Igor Konopko
@ 2019-03-18 18:26       ` Javier González
  2019-03-21 13:34         ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-18 18:26 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 4317 bytes --]

> On 18 Mar 2019, at 14.02, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 17.03.2019 20:44, Matias Bjørling wrote:
>> On 3/14/19 9:04 AM, Igor Konopko wrote:
>>> When we are trying to switch to the new line, we need to ensure that
>>> emeta for n-2 line is already written. In other case we can end with
>>> deadlock scenario, when the writer has no more requests to write and
>>> thus there is no way to trigger emeta writes from writer thread. This
>>> is a corner case scenario which occurs in a case of multiple writes
>>> error and thus kind of early line close due to lack of line space.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>>   drivers/lightnvm/pblk-core.c  |  2 ++
>>>   drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
>>>   drivers/lightnvm/pblk.h       |  1 +
>>>   3 files changed, 27 insertions(+)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 38e26fe..a683d1f 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>                        struct pblk_line_mgmt *l_mg,
>>>                        struct pblk_line_meta *lm)
>>>   {
>>> +    struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
>>>       int meta_line;
>>>       lockdep_assert_held(&l_mg->free_lock);
>>> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>       meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
>>>       if (meta_line == PBLK_DATA_LINES) {
>>>           spin_unlock(&l_mg->free_lock);
>>> +        pblk_write_emeta_force(pblk);
>>>           io_schedule();
>>>           spin_lock(&l_mg->free_lock);
>>>           goto retry_meta;
>>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>>> index 4e63f9b..4fbb9b2 100644
>>> --- a/drivers/lightnvm/pblk-write.c
>>> +++ b/drivers/lightnvm/pblk-write.c
>>> @@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
>>>       return meta_line;
>>>   }
>>> +void pblk_write_emeta_force(struct pblk *pblk)
>>> +{
>>> +    struct pblk_line_meta *lm = &pblk->lm;
>>> +    struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> +    struct pblk_line *meta_line;
>>> +
>>> +    while (true) {
>>> +        spin_lock(&l_mg->close_lock);
>>> +        if (list_empty(&l_mg->emeta_list)) {
>>> +            spin_unlock(&l_mg->close_lock);
>>> +            break;
>>> +        }
>>> +        meta_line = list_first_entry(&l_mg->emeta_list,
>>> +                        struct pblk_line, list);
>>> +        if (meta_line->emeta->mem >= lm->emeta_len[0]) {
>>> +            spin_unlock(&l_mg->close_lock);
>>> +            io_schedule();
>>> +            continue;
>>> +        }
>>> +        spin_unlock(&l_mg->close_lock);
>>> +        pblk_submit_meta_io(pblk, meta_line);
>>> +    }
>>> +}
>>> +
>>>   static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>>   {
>>>       struct ppa_addr erase_ppa;
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index 0a85990..a42bbfb 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
>>>   void pblk_write_timer_fn(struct timer_list *t);
>>>   void pblk_write_should_kick(struct pblk *pblk);
>>>   void pblk_write_kick(struct pblk *pblk);
>>> +void pblk_write_emeta_force(struct pblk *pblk);
>>>   /*
>>>    * pblk read path
>> Hi Igor,
>> Is this an error that qemu can force pblk to expose? Can you provide a specific example on what is needed to force the error?
> 
> So I hit this error on PBLKs with a low number of LUNs and multiple
> write IO errors (should be reproducible with error injection). Then
> pblk_map_remaining() quickly mapped all the sectors in the line and thus
> the writer thread was not able to issue all the necessary emeta IO writes,
> so it got stuck when trying to replace the line with a new one. So this is
> definitely an error/corner case scenario.

If the cause is emeta writes, then there is a bug in
pblk_line_close_meta(), as the logic to prevent this case is in place.


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-18 13:28     ` Igor Konopko
@ 2019-03-18 19:21       ` Javier González
  2019-03-21 13:21         ` Igor Konopko
  0 siblings, 1 reply; 69+ messages in thread
From: Javier González @ 2019-03-18 19:21 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]


> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 18.03.2019 08:42, Javier González wrote:
>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> In case when mapping fails (called from writer thread) due to lack of
>>> lines, currently we are calling pblk_pipeline_stop(), which waits
>>> for pending write IOs, so it will lead to the deadlock. Switching
>>> to __pblk_pipeline_stop() in that case instead will fix that.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-map.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>> index 5408e32..afc10306 100644
>>> --- a/drivers/lightnvm/pblk-map.c
>>> +++ b/drivers/lightnvm/pblk-map.c
>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>> 		pblk_line_close_meta(pblk, prev_line);
>>> 
>>> 		if (!line) {
>>> -			pblk_pipeline_stop(pblk);
>>> +			__pblk_pipeline_stop(pblk);
>>> 			return -ENOSPC;
>>> 		}
>>> 
>>> --
>>> 2.9.5
>>> Have you seen this problem?
>> Before checking if there is a line, we are closing metadata for the
>> previous line, so all inflight I/Os should be clear. Can you develop on
>> the case in which this would happen?
> 
> So we have the following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for the ring buffer to empty.
> This will never complete, since we still have some RB entries which cannot be written, because the writer thread is blocked waiting inside pblk_flush_writer().
> 
> Am I missing something?

So this will be the case in which we are on the last line and
pblk_flush_writer() needs to allocate an extra line to persist the write
buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
implemented some logic to guarantee that at least one line is always
available for GC, which in turn will free a line for user data. When we
hit this limit, performance will drop dramatically, but it should not
stall.

The reason I want to understand the real case behind this fix is that by
calling __pblk_pipeline_stop() we are basically stopping all other
inflight I/Os; we should be able to serve all inflight I/Os before a
mapping error triggers the pipeline to get into read-only mode.

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix
  2019-03-18 12:50       ` Igor Konopko
@ 2019-03-18 19:25         ` Javier González
  0 siblings, 0 replies; 69+ messages in thread
From: Javier González @ 2019-03-18 19:25 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3360 bytes --]


> On 18 Mar 2019, at 13.50, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 17.03.2019 20:24, Matias Bjørling wrote:
>> On 3/16/19 3:43 PM, Javier González wrote:
>>>> On 14 Mar 2019, at 09.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>> 
>>>> In case of OOB recovery, when some of the chunks are in closed state,
>>>> we are calculating number of written sectors in line incorrectly,
>>>> because we are always counting chunk WP, which for closed chunks
>>>> does not longer reflects written sectors in particular chunks. This
>>>> patch for such a chunks takes clba field instead.
>>>> 
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>> drivers/lightnvm/pblk-recovery.c | 8 +++++++-
>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>> index 83b467b..bcd3633 100644
>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>> @@ -101,6 +101,8 @@ static void pblk_update_line_wp(struct pblk *pblk, struct pblk_line *line,
>>>> 
>>>> static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
>>>> {
>>>> +    struct nvm_tgt_dev *dev = pblk->dev;
>>>> +    struct nvm_geo *geo = &dev->geo;
>>>>     struct pblk_line_meta *lm = &pblk->lm;
>>>>     int nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
>>>>     u64 written_secs = 0;
>>>> @@ -113,7 +115,11 @@ static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
>>>>         if (chunk->state & NVM_CHK_ST_OFFLINE)
>>>>             continue;
>>>> 
>>>> -        written_secs += chunk->wp;
>>>> +        if (chunk->state & NVM_CHK_ST_OPEN)
>>>> +            written_secs += chunk->wp;
>>>> +        else if (chunk->state & NVM_CHK_ST_CLOSED)
>>>> +            written_secs += geo->clba;
>>>> +
>>>>         valid_chunks++;
>>>>     }
>>>> 
>>>> --
>>>> 2.9.5
>>> 
>>> Mmmm. The change is correct, but can you develop on why the WP does not
>>> reflect the written sectors in a closed chunk? As I understand it, the
>>> WP reflects the last written sector; if it happens to be WP == clba, then
>>> the chunk state machine transitions to closed, but the WP remains
>>> untouched. It is only when we reset the chunk that the WP comes back to
>>> 0. Am I missing something?
>> I agree with Javier. In OCSSD, the write pointer shall always be valid.
> 
> So based on the OCSSD 2.0 spec, "If WP = SLBA + NLB, the chunk is closed" (also in "Figure 5: Chunk State Diagram" in the spec: WP = SLBA + NLB for a closed chunk), my understanding is that the WP is not usable for this particular case, since in written_secs we want to obtain the NLB field (relative within the chunk, without the starting LBA value). That's why I used the fixed CLBA here, since the WP for a closed chunk is an absolute value instead of a relative one.
> Did I misunderstand something in the spec?

I see where the confusion comes from. I have always understood it in the
way that WP points to the next available sector in the chunk. When WP =
SLBA + NLB, then the WP remains in the last valid sector (as there are
no more available sectors).

Both pblk and QEMU are implemented this way, but it might be worth a
clarification in the spec (to be one way or the other)? Matias?
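Under the reading above (the WP stays at the last valid sector once the
chunk closes), the per-chunk contribution to written_secs in the recovery
fix becomes a small case analysis. A sketch under that assumption — the
helper name is hypothetical, the NVM_CHK_ST_* bit values are taken from
include/linux/lightnvm.h, and clba is the sectors-per-chunk geometry
field:

```c
#define NVM_CHK_ST_CLOSED  (1 << 1)
#define NVM_CHK_ST_OPEN    (1 << 2)
#define NVM_CHK_ST_OFFLINE (1 << 3)

/* Sectors written in one chunk, as counted by the OOB recovery fix:
 * open chunks contribute their write pointer, closed chunks the full
 * chunk capacity, and offline chunks nothing.
 */
static unsigned long long chunk_written_secs(int state,
					     unsigned long long wp,
					     unsigned long long clba)
{
	if (state & NVM_CHK_ST_OFFLINE)
		return 0;
	if (state & NVM_CHK_ST_CLOSED)
		return clba;
	if (state & NVM_CHK_ST_OPEN)
		return wp;
	return 0;	/* free chunk: nothing written yet */
}
```

Whether the closed-chunk branch is needed at all is exactly what hinges on
the WP-semantics question raised here.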


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-18 14:41       ` Hans Holmberg
@ 2019-03-21 13:18         ` Igor Konopko
  2019-03-25 11:40           ` Matias Bjørling
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-21 13:18 UTC (permalink / raw)
  To: Matias Bjorling
  Cc: Hans Holmberg, Javier González, Hans Holmberg, linux-block

Matias,
any opinion from your side on whether you would like to do such a change 
in the userspace tool or in the lightnvm core? I can go both ways.
Thanks
Igor

On 18.03.2019 15:41, Hans Holmberg wrote:
> On Mon, Mar 18, 2019 at 2:18 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>>
>>
>> On 18.03.2019 11:28, Hans Holmberg wrote:
>>> On Thu, Mar 14, 2019 at 5:11 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>
>>>> This patch adds the possibility to provide full device path (like
>>>> /dev/nvme0n1) when specifying device on top of which pblk instance
>>>> should be created/removed.
>>>>
>>>> This makes creation of targets from nvme-cli (or other ioctl based
>>>> tools) more unified with other commands in comparison with current
>>>> situation where almost all commands uses full device path with except
>>>> of lightnvm creation/removal parameter which uses just 'nvme0n1'
>>>> naming convention. After this changes both approach will be valid.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>>    drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
>>>>    1 file changed, 18 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>> index c01f83b..838c3d8 100644
>>>> --- a/drivers/lightnvm/core.c
>>>> +++ b/drivers/lightnvm/core.c
>>>> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
>>>>    }
>>>>    EXPORT_SYMBOL(nvm_unregister);
>>>>
>>>> +#define PREFIX_STR "/dev/"
>>>> +static void nvm_normalize_path(char *path)
>>>> +{
>>>> +       path[DISK_NAME_LEN - 1] = '\0';
>>>> +       if (!memcmp(PREFIX_STR, path,
>>>> +                               sizeof(char) * strlen(PREFIX_STR))) {
>>>> +               /*
>>>> +                * User provide name in '/dev/nvme0n1' format,
>>>> +                * so we need to skip '/dev/' for comparison
>>>> +                */
>>>> +               memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
>>>> +                       (DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
>>>> +       }
>>>> +}
>>>> +
>>>
>>> I don't like this. Why add string parsing to the kernel? Can't this
>>> feature be added to the nvme tool?
>>
>> Since during target creation/removal in kernel, we already operate on
>> strings multiple times (strcmp calls for target types, nvme device,
>> target names) my idea was to keep this in the same layer too.
> 
> Oh, pardon the terse and rather grumpy review. Let me elaborate:
> 
> String parsing is best avoided when possible, and i don't think it's
> worth increasing the kernel code size and changing the behavior of the
> IOCTL when its fully doable to do this in userspace.
> 
> Thanks,
> Hans
> 
>>
>>>
>>>>    static int __nvm_configure_create(struct nvm_ioctl_create *create)
>>>>    {
>>>>           struct nvm_dev *dev;
>>>> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
>>>>                   return -EINVAL;
>>>>           }
>>>>
>>>> -       create.dev[DISK_NAME_LEN - 1] = '\0';
>>>> +       nvm_normalize_path(create.dev);
>>>> +       nvm_normalize_path(create.tgtname);
>>>>           create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
>>>> -       create.tgtname[DISK_NAME_LEN - 1] = '\0';
>>>>
>>>>           if (create.flags != 0) {
>>>>                   __u32 flags = create.flags;
>>>> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
>>>>           if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
>>>>                   return -EFAULT;
>>>>
>>>> -       remove.tgtname[DISK_NAME_LEN - 1] = '\0';
>>>> +       nvm_normalize_path(remove.tgtname);
>>>>
>>>>           if (remove.flags != 0) {
>>>>                   pr_err("nvm: no flags supported\n");
>>>> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file *file, void __user *arg)
>>>>           if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
>>>>                   return -EFAULT;
>>>>
>>>> -       fact.dev[DISK_NAME_LEN - 1] = '\0';
>>>> -
>>>>           if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
>>>>                   return -EINVAL;
>>>>
>>>> --
>>>> 2.9.5
>>>>


* Re: [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-18 19:21       ` Javier González
@ 2019-03-21 13:21         ` Igor Konopko
  2019-03-22 12:17           ` Hans Holmberg
  0 siblings, 1 reply; 69+ messages in thread
From: Igor Konopko @ 2019-03-21 13:21 UTC (permalink / raw)
  To: Javier González, Hans Holmberg; +Cc: Matias Bjørling, linux-block



On 18.03.2019 20:21, Javier González wrote:
> 
>> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>>
>>
>> On 18.03.2019 08:42, Javier González wrote:
>>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>
>>>> In case when mapping fails (called from writer thread) due to lack of
>>>> lines, currently we are calling pblk_pipeline_stop(), which waits
>>>> for pending write IOs, so it will lead to the deadlock. Switching
>>>> to __pblk_pipeline_stop() in that case instead will fix that.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>> drivers/lightnvm/pblk-map.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>> index 5408e32..afc10306 100644
>>>> --- a/drivers/lightnvm/pblk-map.c
>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> 		pblk_line_close_meta(pblk, prev_line);
>>>>
>>>> 		if (!line) {
>>>> -			pblk_pipeline_stop(pblk);
>>>> +			__pblk_pipeline_stop(pblk);
>>>> 			return -ENOSPC;
>>>> 		}
>>>>
>>>> --
>>>> 2.9.5
>>> Have you seeing this problem?
>>> Before checking if there is a line, we are closing metadata for the
>>> previous line, so all inflight I/Os should be clear. Can you develop on
>>> the case in which this would happen?
>>
>> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer.
>> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer().
>>
>> Am I missing sth?
> 
> So this will be the case in which we are in the last line and
> pblk_flush_writer() needs to allocate an extra line persist the write
> buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
> implemented some logic to guarantee that at least one line is always
> available for GC, which in turn will free a line for user data. When we
> hit this limit, performance will drop dramatically, but it should not
> stall.
> 
> The reason I want to understand the real case behind this fix is that by
> calling __pblk_pipeline_stop() we are basically stopping all other
> inflight I/Os; we should be able to serve all inflight I/Os before a
> mapping error triggers the pipeline to get into read-only mode.
> 

Javier,
my understanding was that if we hit that particular case, we are simply 
"done" with pblk and there is no way to recover it, so I made these 
changes based on code analysis. If that is not true, then this patch does 
not make sense anymore.

Hans,
could you help me understand how the rate limiter ensures what Javier 
mentioned above? Thanks


* Re: [PATCH 10/18] lightnvm: pblk: ensure that emeta is written
  2019-03-18 18:26       ` Javier González
@ 2019-03-21 13:34         ` Igor Konopko
  0 siblings, 0 replies; 69+ messages in thread
From: Igor Konopko @ 2019-03-21 13:34 UTC (permalink / raw)
  To: Javier González; +Cc: Matias Bjørling, Hans Holmberg, linux-block



On 18.03.2019 19:26, Javier González wrote:
>> On 18 Mar 2019, at 14.02, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>>
>>
>> On 17.03.2019 20:44, Matias Bjørling wrote:
>>> On 3/14/19 9:04 AM, Igor Konopko wrote:
>>>> When we are trying to switch to the new line, we need to ensure that
>>>> emeta for n-2 line is already written. In other case we can end with
>>>> deadlock scenario, when the writer has no more requests to write and
>>>> thus there is no way to trigger emeta writes from writer thread. This
>>>> is a corner case scenario which occurs in a case of multiple writes
>>>> error and thus kind of early line close due to lack of line space.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>>    drivers/lightnvm/pblk-core.c  |  2 ++
>>>>    drivers/lightnvm/pblk-write.c | 24 ++++++++++++++++++++++++
>>>>    drivers/lightnvm/pblk.h       |  1 +
>>>>    3 files changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index 38e26fe..a683d1f 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -1001,6 +1001,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>>                         struct pblk_line_mgmt *l_mg,
>>>>                         struct pblk_line_meta *lm)
>>>>    {
>>>> +    struct pblk *pblk = container_of(l_mg, struct pblk, l_mg);
>>>>        int meta_line;
>>>>        lockdep_assert_held(&l_mg->free_lock);
>>>> @@ -1009,6 +1010,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
>>>>        meta_line = find_first_zero_bit(&l_mg->meta_bitmap, PBLK_DATA_LINES);
>>>>        if (meta_line == PBLK_DATA_LINES) {
>>>>            spin_unlock(&l_mg->free_lock);
>>>> +        pblk_write_emeta_force(pblk);
>>>>            io_schedule();
>>>>            spin_lock(&l_mg->free_lock);
>>>>            goto retry_meta;
>>>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>>>> index 4e63f9b..4fbb9b2 100644
>>>> --- a/drivers/lightnvm/pblk-write.c
>>>> +++ b/drivers/lightnvm/pblk-write.c
>>>> @@ -505,6 +505,30 @@ static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
>>>>        return meta_line;
>>>>    }
>>>> +void pblk_write_emeta_force(struct pblk *pblk)
>>>> +{
>>>> +    struct pblk_line_meta *lm = &pblk->lm;
>>>> +    struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>> +    struct pblk_line *meta_line;
>>>> +
>>>> +    while (true) {
>>>> +        spin_lock(&l_mg->close_lock);
>>>> +        if (list_empty(&l_mg->emeta_list)) {
>>>> +            spin_unlock(&l_mg->close_lock);
>>>> +            break;
>>>> +        }
>>>> +        meta_line = list_first_entry(&l_mg->emeta_list,
>>>> +                        struct pblk_line, list);
>>>> +        if (meta_line->emeta->mem >= lm->emeta_len[0]) {
>>>> +            spin_unlock(&l_mg->close_lock);
>>>> +            io_schedule();
>>>> +            continue;
>>>> +        }
>>>> +        spin_unlock(&l_mg->close_lock);
>>>> +        pblk_submit_meta_io(pblk, meta_line);
>>>> +    }
>>>> +}
>>>> +
>>>>    static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>>>    {
>>>>        struct ppa_addr erase_ppa;
>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>> index 0a85990..a42bbfb 100644
>>>> --- a/drivers/lightnvm/pblk.h
>>>> +++ b/drivers/lightnvm/pblk.h
>>>> @@ -877,6 +877,7 @@ int pblk_write_ts(void *data);
>>>>    void pblk_write_timer_fn(struct timer_list *t);
>>>>    void pblk_write_should_kick(struct pblk *pblk);
>>>>    void pblk_write_kick(struct pblk *pblk);
>>>> +void pblk_write_emeta_force(struct pblk *pblk);
>>>>    /*
>>>>     * pblk read path
>>> Hi Igor,
>>> Is this an error that qemu can force pblk to expose? Can you provide a specific example on what is needed to force the error?
>>
>> So I hit this error on PBLKs with low number of LUNs and multiple
>> write IO errors (should be reproducible with error injection). Then
>> pblk_map_remaining() quickly mapped all the sectors in line and thus
>> writer thread was not able to issue all the necessary emeta IO writes,
>> so it stucks when trying to replace line to new one. So this is
>> definitely an error/corner case scenario.
> 
> If the cause if emeta writes, then there is a bug in
> pblk_line_close_meta(), as the logic to prevent this case is in place.
> 

So I definitely saw this function being called a few times in corner-case 
scenarios, but I will drop this patch for now and try to find out the 
reason for this behavior, since after the discussion this patch looks to 
me more like a workaround than a real fix.
Thanks
Igor


* Re: [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines
  2019-03-21 13:21         ` Igor Konopko
@ 2019-03-22 12:17           ` Hans Holmberg
  0 siblings, 0 replies; 69+ messages in thread
From: Hans Holmberg @ 2019-03-22 12:17 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Javier González, Hans Holmberg, Matias Bjørling, linux-block

On Thu, Mar 21, 2019 at 2:21 PM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
>
>
> On 18.03.2019 20:21, Javier González wrote:
> >
> >> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>
> >>
> >>
> >> On 18.03.2019 08:42, Javier González wrote:
> >>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >>>>
> >>>> In case mapping fails (when called from the writer thread) due to a
> >>>> lack of free lines, we currently call pblk_pipeline_stop(), which
> >>>> waits for pending write IOs and thus leads to a deadlock. Switching
> >>>> to __pblk_pipeline_stop() in that case fixes this.
> >>>>
> >>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>> ---
> >>>> drivers/lightnvm/pblk-map.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> >>>> index 5408e32..afc10306 100644
> >>>> --- a/drivers/lightnvm/pblk-map.c
> >>>> +++ b/drivers/lightnvm/pblk-map.c
> >>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> >>>>            pblk_line_close_meta(pblk, prev_line);
> >>>>
> >>>>            if (!line) {
> >>>> -                  pblk_pipeline_stop(pblk);
> >>>> +                  __pblk_pipeline_stop(pblk);
> >>>>                    return -ENOSPC;
> >>>>            }
> >>>>
> >>>> --
> >>>> 2.9.5
> >>> Have you seen this problem?
> >>> Before checking if there is a line, we close the metadata for the
> >>> previous line, so all inflight I/Os should be clear. Can you
> >>> elaborate on the case in which this would happen?
> >>
> >> So we have the following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for the write ring buffer to empty.
> >> This will never complete, since we still have some rb entries which cannot be written, because the writer thread is blocked waiting inside pblk_flush_writer().
> >>
> >> Am I missing something?
> >
> > So this will be the case in which we are in the last line and
> > pblk_flush_writer() needs to allocate an extra line to persist the write
> > buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans
> > implemented some logic to guarantee that at least one line is always
> > available for GC, which in turn will free a line for user data. When we
> > hit this limit, performance will drop dramatically, but it should not
> > stall.
> >
> > The reason I want to understand the real case behind this fix is that by
> > calling __pblk_pipeline_stop() we are basically stopping all other
> > inflight I/Os; we should be able to serve all inflight I/Os before a
> > mapping error triggers the pipeline to get into read-only mode.
> >
>
> Javier,
> my understanding was that if we hit that particular case, we are simply
> "done" with pblk and there is no way to recover it, so I made these
> changes based on code analysis. If that is not true, then this patch
> does not make sense anymore.
>
> Hans,
> could you help me understand how the rate limiter ensures what Javier
> mentioned above? Thanks

I think Javier refers to: '3bcebc5bac09 ("lightnvm: pblk: set
conservative threshold for user writes")'

See the commit message. Let me know if something is unclear :)

Hans
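
The self-deadlock described earlier in this thread can be modelled in a
few lines of C. This is a toy model under assumed semantics: toy_rb,
toy_flush_completes, and the caller enum are illustrative names, and the
real interaction involves kernel threads and pblk's rb subsystem rather
than a boolean check.

```c
#include <assert.h>
#include <stdbool.h>

/* Who is asking for the flush. */
enum caller { CALLER_WRITER_THREAD, CALLER_OTHER };

/* Toy stand-in for the write ring buffer. */
struct toy_rb {
	int entries;	/* outstanding entries waiting to be written */
};

/* Returns true if a flush from this context can complete, false if the
 * wait would never be satisfied (the modelled deadlock): only the
 * writer thread drains entries, so if the writer thread itself blocks
 * waiting for the buffer to empty, nothing can make progress. */
static bool toy_flush_completes(struct toy_rb *rb, enum caller who)
{
	/* An empty buffer flushes trivially from any context. */
	if (rb->entries == 0)
		return true;

	/* The writer thread cannot drain the buffer while it waits. */
	if (who == CALLER_WRITER_THREAD)
		return false;

	/* Any other caller just waits while the writer thread drains. */
	rb->entries = 0;
	return true;
}
```

This is why the sequence pblk_pipeline_stop() -> pblk_flush_writer(),
when entered from the writer thread with rb entries outstanding, cannot
complete, and why the conservative rate limiter matters: it aims to keep
the "non-empty buffer, no free line" state from arising in the first place.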

* Re: [PATCH 17/18] lightnvm: allow to use full device path
  2019-03-21 13:18         ` Igor Konopko
@ 2019-03-25 11:40           ` Matias Bjørling
  0 siblings, 0 replies; 69+ messages in thread
From: Matias Bjørling @ 2019-03-25 11:40 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Hans Holmberg, Javier González, Hans Holmberg, linux-block

On 3/21/19 2:18 PM, Igor Konopko wrote:
> Matias,
> any opinion from your side on whether you would like to make such a
> change in the userspace tool or in the lightnvm core? I can go both ways.
> Thanks
> Igor

I'm a user-space freak. Let's fix it up in nvme-cli.

> 
> On 18.03.2019 15:41, Hans Holmberg wrote:
>> On Mon, Mar 18, 2019 at 2:18 PM Igor Konopko 
>> <igor.j.konopko@intel.com> wrote:
>>>
>>>
>>>
>>> On 18.03.2019 11:28, Hans Holmberg wrote:
>>>> On Thu, Mar 14, 2019 at 5:11 PM Igor Konopko 
>>>> <igor.j.konopko@intel.com> wrote:
>>>>>
> >>>>> This patch adds the possibility to provide a full device path (like
> >>>>> /dev/nvme0n1) when specifying the device on top of which a pblk
> >>>>> instance should be created/removed.
> >>>>>
> >>>>> This makes creation of targets from nvme-cli (or other ioctl-based
> >>>>> tools) more unified with other commands, in contrast to the current
> >>>>> situation where almost all commands use the full device path, except
> >>>>> for the lightnvm creation/removal parameters, which use just the
> >>>>> 'nvme0n1' naming convention. After this change both approaches will
> >>>>> be valid.
>>>>>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>>    drivers/lightnvm/core.c | 23 ++++++++++++++++++-----
>>>>>    1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>> index c01f83b..838c3d8 100644
>>>>> --- a/drivers/lightnvm/core.c
>>>>> +++ b/drivers/lightnvm/core.c
>>>>> @@ -1195,6 +1195,21 @@ void nvm_unregister(struct nvm_dev *dev)
>>>>>    }
>>>>>    EXPORT_SYMBOL(nvm_unregister);
>>>>>
>>>>> +#define PREFIX_STR "/dev/"
>>>>> +static void nvm_normalize_path(char *path)
>>>>> +{
>>>>> +       path[DISK_NAME_LEN - 1] = '\0';
>>>>> +       if (!memcmp(PREFIX_STR, path,
>>>>> +                               sizeof(char) * strlen(PREFIX_STR))) {
>>>>> +               /*
> >>>>> +                * User provided the name in '/dev/nvme0n1' format,
>>>>> +                * so we need to skip '/dev/' for comparison
>>>>> +                */
> >>>>> +               memmove(path, path + sizeof(char) * strlen(PREFIX_STR),
> >>>>> +                       (DISK_NAME_LEN - strlen(PREFIX_STR)) * sizeof(char));
>>>>> +       }
>>>>> +}
>>>>> +
>>>>
>>>> I don't like this. Why add string parsing to the kernel? Can't this
>>>> feature be added to the nvme tool?
>>>
>>> Since target creation/removal in the kernel already operates on
>>> strings multiple times (strcmp calls for target types, the nvme
>>> device, target names), my idea was to keep this in the same layer too.
>>
>> Oh, pardon the terse and rather grumpy review. Let me elaborate:
>>
>> String parsing is best avoided when possible, and I don't think it's
>> worth increasing the kernel code size and changing the behavior of the
>> IOCTL when it is fully doable in userspace.
>>
>> Thanks,
>> Hans
>>
>>>
>>>>
>>>>>    static int __nvm_configure_create(struct nvm_ioctl_create *create)
>>>>>    {
>>>>>           struct nvm_dev *dev;
>>>>> @@ -1304,9 +1319,9 @@ static long nvm_ioctl_dev_create(struct file 
>>>>> *file, void __user *arg)
>>>>>                   return -EINVAL;
>>>>>           }
>>>>>
>>>>> -       create.dev[DISK_NAME_LEN - 1] = '\0';
>>>>> +       nvm_normalize_path(create.dev);
>>>>> +       nvm_normalize_path(create.tgtname);
>>>>>           create.tgttype[NVM_TTYPE_NAME_MAX - 1] = '\0';
>>>>> -       create.tgtname[DISK_NAME_LEN - 1] = '\0';
>>>>>
>>>>>           if (create.flags != 0) {
>>>>>                   __u32 flags = create.flags;
>>>>> @@ -1333,7 +1348,7 @@ static long nvm_ioctl_dev_remove(struct file 
>>>>> *file, void __user *arg)
> >>>>>           if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
>>>>>                   return -EFAULT;
>>>>>
>>>>> -       remove.tgtname[DISK_NAME_LEN - 1] = '\0';
>>>>> +       nvm_normalize_path(remove.tgtname);
>>>>>
>>>>>           if (remove.flags != 0) {
>>>>>                   pr_err("nvm: no flags supported\n");
>>>>> @@ -1373,8 +1388,6 @@ static long nvm_ioctl_dev_factory(struct file 
>>>>> *file, void __user *arg)
> >>>>>           if (copy_from_user(&fact, arg, sizeof(struct nvm_ioctl_dev_factory)))
>>>>>                   return -EFAULT;
>>>>>
>>>>> -       fact.dev[DISK_NAME_LEN - 1] = '\0';
>>>>> -
>>>>>           if (fact.flags & ~(NVM_FACTORY_NR_BITS - 1))
>>>>>                   return -EINVAL;
>>>>>
>>>>> -- 
>>>>> 2.9.5
>>>>>
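
Per the conclusion above (do the normalization in nvme-cli rather than
the kernel), a userspace sketch of the same prefix stripping might look
as follows. The function name normalize_dev_name and its buffer handling
are assumptions for illustration, not actual nvme-cli code.

```c
#include <stddef.h>
#include <string.h>

#define DEV_PREFIX "/dev/"

/* Accept both "/dev/nvme0n1" and "nvme0n1": NUL-terminate the buffer
 * defensively, then strip a leading "/dev/" in place if present.
 * memmove() is used because source and destination overlap. */
static void normalize_dev_name(char *name, size_t buflen)
{
	size_t plen = strlen(DEV_PREFIX);

	name[buflen - 1] = '\0';
	if (strncmp(name, DEV_PREFIX, plen) == 0)
		memmove(name, name + plen, strlen(name + plen) + 1);
}
```

Doing this in the tool keeps the ioctl ABI unchanged and avoids adding
string parsing to the kernel, which was the objection raised in this thread.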


end of thread, other threads:[~2019-03-25 11:40 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-14 16:04 [PATCH 00/18] lightnvm: next set of improvements for 5.2 Igor Konopko
2019-03-14 16:04 ` [PATCH 01/18] lightnvm: pblk: fix warning in pblk_l2p_init() Igor Konopko
2019-03-16 22:29   ` Javier González
2019-03-18 16:25   ` Matias Bjørling
2019-03-14 16:04 ` [PATCH 02/18] lightnvm: pblk: warn when there are opened chunks Igor Konopko
2019-03-16 22:36   ` Javier González
2019-03-17 19:39   ` Matias Bjørling
2019-03-14 16:04 ` [PATCH 03/18] lightnvm: pblk: simplify partial read path Igor Konopko
2019-03-14 21:35   ` Heiner Litz
2019-03-15  9:52     ` Igor Konopko
2019-03-16 22:28       ` Javier González
2019-03-18 12:44         ` Igor Konopko
2019-03-14 16:04 ` [PATCH 04/18] lightnvm: pblk: OOB recovery for closed chunks fix Igor Konopko
2019-03-16 22:43   ` Javier González
2019-03-17 19:24     ` Matias Bjørling
2019-03-18 12:50       ` Igor Konopko
2019-03-18 19:25         ` Javier González
2019-03-14 16:04 ` [PATCH 05/18] lightnvm: pblk: propagate errors when reading meta Igor Konopko
2019-03-16 22:48   ` Javier González
2019-03-18 11:54   ` Hans Holmberg
2019-03-14 16:04 ` [PATCH 06/18] lightnvm: pblk: recover only written metadata Igor Konopko
2019-03-16 23:46   ` Javier González
2019-03-18 12:54     ` Igor Konopko
2019-03-18 15:04       ` Igor Konopko
2019-03-14 16:04 ` [PATCH 07/18] lightnvm: pblk: wait for inflight IOs in recovery Igor Konopko
2019-03-17 19:33   ` Matias Bjørling
2019-03-18 12:58     ` Igor Konopko
2019-03-14 16:04 ` [PATCH 08/18] lightnvm: pblk: fix spin_unlock order Igor Konopko
2019-03-16 23:49   ` Javier González
2019-03-18 11:55   ` Hans Holmberg
2019-03-14 16:04 ` [PATCH 09/18] lightnvm: pblk: kick writer on write recovery path Igor Konopko
2019-03-16 23:54   ` Javier González
2019-03-18 11:58   ` Hans Holmberg
2019-03-14 16:04 ` [PATCH 10/18] lightnvm: pblk: ensure that emeta is written Igor Konopko
2019-03-17 19:44   ` Matias Bjørling
2019-03-18 13:02     ` Igor Konopko
2019-03-18 18:26       ` Javier González
2019-03-21 13:34         ` Igor Konopko
2019-03-18  7:46   ` Javier González
2019-03-14 16:04 ` [PATCH 11/18] lightnvm: pblk: fix update line wp in OOB recovery Igor Konopko
2019-03-18  6:56   ` Javier González
2019-03-18 13:06     ` Igor Konopko
2019-03-14 16:04 ` [PATCH 12/18] lightnvm: pblk: do not read OOB from emeta region Igor Konopko
2019-03-17 19:56   ` Matias Bjørling
2019-03-18 13:05     ` Igor Konopko
2019-03-14 16:04 ` [PATCH 13/18] lightnvm: pblk: store multiple copies of smeta Igor Konopko
2019-03-18  7:33   ` Javier González
2019-03-18 13:12     ` Igor Konopko
2019-03-14 16:04 ` [PATCH 14/18] lightnvm: pblk: GC error handling Igor Konopko
2019-03-18  7:39   ` Javier González
2019-03-18 12:14   ` Hans Holmberg
2019-03-18 13:22     ` Igor Konopko
2019-03-18 14:14       ` Hans Holmberg
2019-03-14 16:04 ` [PATCH 15/18] lightnvm: pblk: fix in case of lack of lines Igor Konopko
2019-03-18  7:42   ` Javier González
2019-03-18 13:28     ` Igor Konopko
2019-03-18 19:21       ` Javier González
2019-03-21 13:21         ` Igor Konopko
2019-03-22 12:17           ` Hans Holmberg
2019-03-14 16:04 ` [PATCH 16/18] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
2019-03-18  7:48   ` Javier González
2019-03-14 16:04 ` [PATCH 17/18] lightnvm: allow to use full device path Igor Konopko
2019-03-18  7:49   ` Javier González
2019-03-18 10:28   ` Hans Holmberg
2019-03-18 13:18     ` Igor Konopko
2019-03-18 14:41       ` Hans Holmberg
2019-03-21 13:18         ` Igor Konopko
2019-03-25 11:40           ` Matias Bjørling
2019-03-14 16:04 ` [PATCH 18/18] lightnvm: track inflight target creations Igor Konopko
