* [PATCH v4 0/7] lightnvm: next set of improvements for 5.2
@ 2019-04-16 10:16 Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 1/7] lightnvm: pblk: IO path reorganization Igor Konopko
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This is another set of fixes and improvements to both pblk and lightnvm
core. 

The first and second patches are the most crucial, since they change
the approach to the partial read path, so detailed review is especially
needed there.

The remaining patches are my other findings related to bugs or
potential improvements, mostly in corner cases.

Changes v3 -> v4:
-dropped patches which were already pulled into for-5.2/core branch
-major changes for patch #2 based on code review
-patch #6 modified to use krefs
-new patch #7 which extends the patch #6

Changes v2 -> v3:
-dropped some patches that were not needed
-dropped patches which were already pulled into for-5.2/core branch
-commit messages cleanup

Changes v1 -> v2:
-dropped some patches that were not needed
-review feedback incorporated for some of the patches
-partial read path changes patch split into two patches

Igor Konopko (7):
  lightnvm: pblk: IO path reorganization
  lightnvm: pblk: simplify partial read path
  lightnvm: pblk: recover only written metadata
  lightnvm: pblk: store multiple copies of smeta
  lightnvm: pblk: use nvm_rq_to_ppa_list()
  lightnvm: track inflight target creations
  lightnvm: do not remove instance under global lock

 drivers/lightnvm/core.c          |  75 +++++---
 drivers/lightnvm/pblk-cache.c    |   8 +-
 drivers/lightnvm/pblk-core.c     | 159 +++++++++++++----
 drivers/lightnvm/pblk-init.c     |  37 ++--
 drivers/lightnvm/pblk-rb.c       |  11 +-
 drivers/lightnvm/pblk-read.c     | 376 +++++++++++----------------------------
 drivers/lightnvm/pblk-recovery.c |  35 ++--
 drivers/lightnvm/pblk-rl.c       |   3 +-
 drivers/lightnvm/pblk.h          |  23 +--
 include/linux/lightnvm.h         |   1 +
 10 files changed, 333 insertions(+), 395 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/7] lightnvm: pblk: IO path reorganization
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 2/7] lightnvm: pblk: simplify partial read path Igor Konopko
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch prepares the read path for a new approach to partial read
handling, which is simpler than the previous one.

The most important change is to move the handling of completed and
failed bios from pblk_make_rq() to the particular read and write
functions. This is needed because, after the partial read path changes,
the completed/failed bio will sometimes be different from the original
one, so we can no longer do this in pblk_make_rq().

The other changes are small read path refactors meant to reduce the
size of the following patch with the partial read changes.

Generally the goal of this patch is not to change the functionality,
but just to prepare the code for the following changes.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
---
 drivers/lightnvm/pblk-cache.c |  8 +++--
 drivers/lightnvm/pblk-init.c  | 14 ++------
 drivers/lightnvm/pblk-read.c  | 83 ++++++++++++++++++++-----------------------
 drivers/lightnvm/pblk.h       |  4 +--
 4 files changed, 48 insertions(+), 61 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index c9fa26f..5c1034c 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -18,7 +18,8 @@
 
 #include "pblk.h"
 
-int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
+void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
+				unsigned long flags)
 {
 	struct request_queue *q = pblk->dev->q;
 	struct pblk_w_ctx w_ctx;
@@ -43,6 +44,7 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
 		goto retry;
 	case NVM_IO_ERR:
 		pblk_pipeline_stop(pblk);
+		bio_io_error(bio);
 		goto out;
 	}
 
@@ -79,7 +81,9 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
 out:
 	generic_end_io_acct(q, REQ_OP_WRITE, &pblk->disk->part0, start_time);
 	pblk_write_should_kick(pblk);
-	return ret;
+
+	if (ret == NVM_IO_DONE)
+		bio_endio(bio);
 }
 
 /*
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1e227a0..b351c7f 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -50,7 +50,6 @@ struct bio_set pblk_bio_set;
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
 	struct pblk *pblk = q->queuedata;
-	int ret;
 
 	if (bio_op(bio) == REQ_OP_DISCARD) {
 		pblk_discard(pblk, bio);
@@ -65,7 +64,7 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 	 */
 	if (bio_data_dir(bio) == READ) {
 		blk_queue_split(q, &bio);
-		ret = pblk_submit_read(pblk, bio);
+		pblk_submit_read(pblk, bio);
 	} else {
 		/* Prevent deadlock in the case of a modest LUN configuration
 		 * and large user I/Os. Unless stalled, the rate limiter
@@ -74,16 +73,7 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 		if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
 			blk_queue_split(q, &bio);
 
-		ret = pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
-	}
-
-	switch (ret) {
-	case NVM_IO_ERR:
-		bio_io_error(bio);
-		break;
-	case NVM_IO_DONE:
-		bio_endio(bio);
-		break;
+		pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
 	}
 
 	return BLK_QC_T_NONE;
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index f08f7d9..0953c34 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -179,7 +179,8 @@ static void pblk_end_user_read(struct bio *bio, int error)
 {
 	if (error && error != NVM_RSP_WARN_HIGHECC)
 		bio_io_error(bio);
-	bio_endio(bio);
+	else
+		bio_endio(bio);
 }
 
 static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
@@ -383,7 +384,6 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 
 	/* Free allocated pages in new bio */
 	pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
-	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
 }
 
@@ -428,7 +428,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	}
 }
 
-int pblk_submit_read(struct pblk *pblk, struct bio *bio)
+void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct request_queue *q = dev->q;
@@ -436,9 +436,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	unsigned int nr_secs = pblk_get_secs(bio);
 	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
+	struct bio *int_bio;
 	unsigned int bio_init_idx;
 	DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
-	int ret = NVM_IO_ERR;
 
 	generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
 			      &pblk->disk->part0);
@@ -449,74 +449,67 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 
 	rqd->opcode = NVM_OP_PREAD;
 	rqd->nr_ppas = nr_secs;
-	rqd->bio = NULL; /* cloned bio if needed */
 	rqd->private = pblk;
 	rqd->end_io = pblk_end_io_read;
 
 	r_ctx = nvm_rq_to_pdu(rqd);
 	r_ctx->start_time = jiffies;
 	r_ctx->lba = blba;
-	r_ctx->private = bio; /* original bio */
 
 	/* Save the index for this bio's start. This is needed in case
 	 * we need to fill a partial read.
 	 */
 	bio_init_idx = pblk_get_bi_idx(bio);
 
-	if (pblk_alloc_rqd_meta(pblk, rqd))
-		goto fail_rqd_free;
+	if (pblk_alloc_rqd_meta(pblk, rqd)) {
+		bio_io_error(bio);
+		pblk_free_rqd(pblk, rqd, PBLK_READ);
+		return;
+	}
+
+	/* Clone read bio to deal internally with:
+	 * -read errors when reading from drive
+	 * -bio_advance() calls during l2p lookup and cache reads
+	 */
+	int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
 
 	if (nr_secs > 1)
 		pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
 	else
 		pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
 
+	r_ctx->private = bio; /* original bio */
+	rqd->bio = int_bio; /* internal bio */
+
 	if (bitmap_full(read_bitmap, nr_secs)) {
+		pblk_end_user_read(bio, 0);
 		atomic_inc(&pblk->inflight_io);
 		__pblk_end_io_read(pblk, rqd, false);
-		return NVM_IO_DONE;
+		return;
 	}
 
-	/* All sectors are to be read from the device */
-	if (bitmap_empty(read_bitmap, rqd->nr_ppas)) {
-		struct bio *int_bio = NULL;
-
-		/* Clone read bio to deal with read errors internally */
-		int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
-		if (!int_bio) {
-			pblk_err(pblk, "could not clone read bio\n");
-			goto fail_end_io;
-		}
-
-		rqd->bio = int_bio;
-
-		if (pblk_submit_io(pblk, rqd)) {
+	if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
+		/* The read bio request could be partially filled by the write
+		 * buffer, but there are some holes that need to be read from
+		 * the drive.
+		 */
+		bio_put(int_bio);
+		rqd->bio = NULL;
+		if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
+					    nr_secs)) {
 			pblk_err(pblk, "read IO submission failed\n");
-			ret = NVM_IO_ERR;
-			goto fail_end_io;
+			bio_io_error(bio);
+			__pblk_end_io_read(pblk, rqd, false);
 		}
-
-		return NVM_IO_OK;
+		return;
 	}
 
-	/* The read bio request could be partially filled by the write buffer,
-	 * but there are some holes that need to be read from the drive.
-	 */
-	ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_secs);
-	if (ret)
-		goto fail_meta_free;
-
-	return NVM_IO_OK;
-
-fail_meta_free:
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
-fail_rqd_free:
-	pblk_free_rqd(pblk, rqd, PBLK_READ);
-	return ret;
-fail_end_io:
-	__pblk_end_io_read(pblk, rqd, false);
-	return ret;
+	/* All sectors are to be read from the device */
+	if (pblk_submit_io(pblk, rqd)) {
+		pblk_err(pblk, "read IO submission failed\n");
+		bio_io_error(bio);
+		__pblk_end_io_read(pblk, rqd, false);
+	}
 }
 
 static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e304754..17ced12 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -867,7 +867,7 @@ void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
 /*
  * pblk user I/O write path
  */
-int pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
+void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
 			unsigned long flags);
 int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 
@@ -893,7 +893,7 @@ void pblk_write_kick(struct pblk *pblk);
  * pblk read path
  */
 extern struct bio_set pblk_bio_set;
-int pblk_submit_read(struct pblk *pblk, struct bio *bio);
+void pblk_submit_read(struct pblk *pblk, struct bio *bio);
 int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 /*
  * pblk recovery
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/7] lightnvm: pblk: simplify partial read path
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 1/7] lightnvm: pblk: IO path reorganization Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-17 17:11   ` Heiner Litz
  2019-04-16 10:16 ` [PATCH v4 3/7] lightnvm: pblk: recover only written metadata Igor Konopko
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch changes the approach to handling partial read path.

In the old approach, merging of data from the round buffer and the
drive was done entirely by the driver. This had some disadvantages: the
code was complex and relied on bio internals, so it was hard to
maintain and strongly dependent on bio changes.

In the new approach, most of the handling is done by block layer
functions such as bio_split(), bio_chain() and generic_make_request(),
and is generally less complex and easier to maintain. Some more details
of the new approach follow.

When a read bio arrives, it is cloned for pblk internal purposes. All
the L2P mapping, which includes copying data from the round buffer to
the bio and thus the bio_advance() calls, is done on the cloned bio, so
the original bio is untouched. If we detect a partial read case, the
original bio is still untouched, so we can split it and continue to
process only its first part in the current context, while the rest is
submitted as a separate bio request to generic_make_request() for
further processing.
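
For reference, a minimal sketch of the split-and-chain pattern used
here (names as in the patch; an illustration, not a standalone
function):

    /* keep only the first nr_secs sectors in the current context */
    split_bio = bio_split(bio, nr_secs * NR_PHY_IN_LOG, GFP_KERNEL,
                          &pblk_bio_set);
    /* 'bio' (the remainder) does not complete before split_bio does */
    bio_chain(split_bio, bio);
    /* re-queue the remainder; it re-enters pblk_make_rq() later as a
     * new, smaller read request
     */
    generic_make_request(bio);
    /* continue processing only the first part */
    bio = split_bio;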

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c |  13 +-
 drivers/lightnvm/pblk-rb.c   |  11 +-
 drivers/lightnvm/pblk-read.c | 333 +++++++++++--------------------------------
 drivers/lightnvm/pblk.h      |  18 +--
 4 files changed, 100 insertions(+), 275 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 73be3a0..07270ba 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -2147,8 +2147,8 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
 	spin_unlock(&pblk->trans_lock);
 }
 
-void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
-			 sector_t blba, int nr_secs)
+int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
+			 sector_t blba, int nr_secs, bool *from_cache)
 {
 	int i;
 
@@ -2162,10 +2162,19 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
 		if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
 			struct pblk_line *line = pblk_ppa_to_line(pblk, ppa);
 
+			if (i > 0 && *from_cache)
+				break;
+			*from_cache = false;
+
 			kref_get(&line->ref);
+		} else {
+			if (i > 0 && !*from_cache)
+				break;
+			*from_cache = true;
 		}
 	}
 	spin_unlock(&pblk->trans_lock);
+	return i;
 }
 
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 3555014..5abb170 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -642,7 +642,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
  * be directed to disk.
  */
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
-			struct ppa_addr ppa, int bio_iter, bool advanced_bio)
+			struct ppa_addr ppa)
 {
 	struct pblk *pblk = container_of(rb, struct pblk, rwb);
 	struct pblk_rb_entry *entry;
@@ -673,15 +673,6 @@ int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
 		ret = 0;
 		goto out;
 	}
-
-	/* Only advance the bio if it hasn't been advanced already. If advanced,
-	 * this bio is at least a partial bio (i.e., it has partially been
-	 * filled with data from the cache). If part of the data resides on the
-	 * media, we will read later on
-	 */
-	if (unlikely(!advanced_bio))
-		bio_advance(bio, bio_iter * PBLK_EXPOSED_PAGE_SIZE);
-
 	data = bio_data(bio);
 	memcpy(data, entry->data, rb->seg_size);
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 0953c34..d98ea39 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -26,8 +26,7 @@
  * issued.
  */
 static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
-				sector_t lba, struct ppa_addr ppa,
-				int bio_iter, bool advanced_bio)
+				sector_t lba, struct ppa_addr ppa)
 {
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	/* Callers must ensure that the ppa points to a cache address */
@@ -35,73 +34,75 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
 	BUG_ON(!pblk_addr_in_cache(ppa));
 #endif
 
-	return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa,
-						bio_iter, advanced_bio);
+	return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa);
 }
 
-static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
+static int pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				 struct bio *bio, sector_t blba,
-				 unsigned long *read_bitmap)
+				 bool *from_cache)
 {
 	void *meta_list = rqd->meta_list;
-	struct ppa_addr ppas[NVM_MAX_VLBA];
-	int nr_secs = rqd->nr_ppas;
-	bool advanced_bio = false;
-	int i, j = 0;
+	int nr_secs, i;
 
-	pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
+retry:
+	nr_secs = pblk_lookup_l2p_seq(pblk, rqd->ppa_list, blba, rqd->nr_ppas,
+					from_cache);
+
+	if (!*from_cache)
+		goto end;
 
 	for (i = 0; i < nr_secs; i++) {
-		struct ppa_addr p = ppas[i];
 		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
 		sector_t lba = blba + i;
 
-retry:
-		if (pblk_ppa_empty(p)) {
+		if (pblk_ppa_empty(rqd->ppa_list[i])) {
 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-			WARN_ON(test_and_set_bit(i, read_bitmap));
 			meta->lba = addr_empty;
-
-			if (unlikely(!advanced_bio)) {
-				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
-				advanced_bio = true;
+		} else if (pblk_addr_in_cache(rqd->ppa_list[i])) {
+			/*
+			 * Try to read from write buffer. The address is later
+			 * checked on the write buffer to prevent retrieving
+			 * overwritten data.
+			 */
+			if (!pblk_read_from_cache(pblk, bio, lba,
+							rqd->ppa_list[i])) {
+				if (i == 0) {
+					/*
+					 * We didn't call with bio_advance()
+					 * yet, so we can just retry.
+					 */
+					goto retry;
+				} else {
+					/*
+					 * We already call bio_advance()
+					 * so we cannot retry and we need
+					 * to quit that function in order
+					 * to allow caller to handle the bio
+					 * splitting in the current sector
+					 * position.
+					 */
+					nr_secs = i;
+					goto end;
+				}
 			}
-
-			goto next;
-		}
-
-		/* Try to read from write buffer. The address is later checked
-		 * on the write buffer to prevent retrieving overwritten data.
-		 */
-		if (pblk_addr_in_cache(p)) {
-			if (!pblk_read_from_cache(pblk, bio, lba, p, i,
-								advanced_bio)) {
-				pblk_lookup_l2p_seq(pblk, &p, lba, 1);
-				goto retry;
-			}
-			WARN_ON(test_and_set_bit(i, read_bitmap));
 			meta->lba = cpu_to_le64(lba);
-			advanced_bio = true;
 #ifdef CONFIG_NVM_PBLK_DEBUG
 			atomic_long_inc(&pblk->cache_reads);
 #endif
-		} else {
-			/* Read from media non-cached sectors */
-			rqd->ppa_list[j++] = p;
 		}
-
-next:
-		if (advanced_bio)
-			bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
+		bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
 	}
 
+end:
 	if (pblk_io_aligned(pblk, nr_secs))
 		rqd->is_seq = 1;
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_add(nr_secs, &pblk->inflight_reads);
 #endif
+
+	return nr_secs;
 }
 
 
@@ -197,9 +198,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		pblk_log_read_err(pblk, rqd);
 
 	pblk_read_check_seq(pblk, rqd, r_ctx->lba);
-
-	if (int_bio)
-		bio_put(int_bio);
+	bio_put(int_bio);
 
 	if (put_line)
 		pblk_rq_to_line_put(pblk, rqd);
@@ -223,177 +222,13 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 	__pblk_end_io_read(pblk, rqd, true);
 }
 
-static void pblk_end_partial_read(struct nvm_rq *rqd)
-{
-	struct pblk *pblk = rqd->private;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx = r_ctx->private;
-	struct pblk_sec_meta *meta;
-	struct bio *new_bio = rqd->bio;
-	struct bio *bio = pr_ctx->orig_bio;
-	struct bio_vec src_bv, dst_bv;
-	void *meta_list = rqd->meta_list;
-	int bio_init_idx = pr_ctx->bio_init_idx;
-	unsigned long *read_bitmap = pr_ctx->bitmap;
-	int nr_secs = pr_ctx->orig_nr_secs;
-	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	void *src_p, *dst_p;
-	int hole, i;
-
-	if (unlikely(nr_holes == 1)) {
-		struct ppa_addr ppa;
-
-		ppa = rqd->ppa_addr;
-		rqd->ppa_list = pr_ctx->ppa_ptr;
-		rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
-		rqd->ppa_list[0] = ppa;
-	}
-
-	for (i = 0; i < nr_secs; i++) {
-		meta = pblk_get_meta(pblk, meta_list, i);
-		pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
-		meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
-	}
-
-	/* Fill the holes in the original bio */
-	i = 0;
-	hole = find_first_zero_bit(read_bitmap, nr_secs);
-	do {
-		struct pblk_line *line;
-
-		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
-		kref_put(&line->ref, pblk_line_put);
-
-		meta = pblk_get_meta(pblk, meta_list, hole);
-		meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
-
-		src_bv = new_bio->bi_io_vec[i++];
-		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
-
-		src_p = kmap_atomic(src_bv.bv_page);
-		dst_p = kmap_atomic(dst_bv.bv_page);
-
-		memcpy(dst_p + dst_bv.bv_offset,
-			src_p + src_bv.bv_offset,
-			PBLK_EXPOSED_PAGE_SIZE);
-
-		kunmap_atomic(src_p);
-		kunmap_atomic(dst_p);
-
-		mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
-
-		hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
-	} while (hole < nr_secs);
-
-	bio_put(new_bio);
-	kfree(pr_ctx);
-
-	/* restore original request */
-	rqd->bio = NULL;
-	rqd->nr_ppas = nr_secs;
-
-	pblk_end_user_read(bio, rqd->error);
-	__pblk_end_io_read(pblk, rqd, false);
-}
-
-static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
-			    unsigned int bio_init_idx,
-			    unsigned long *read_bitmap,
-			    int nr_holes)
-{
-	void *meta_list = rqd->meta_list;
-	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
-	struct pblk_pr_ctx *pr_ctx;
-	struct bio *new_bio, *bio = r_ctx->private;
-	int nr_secs = rqd->nr_ppas;
-	int i;
-
-	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
-
-	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-		goto fail_bio_put;
-
-	if (nr_holes != new_bio->bi_vcnt) {
-		WARN_ONCE(1, "pblk: malformed bio\n");
-		goto fail_free_pages;
-	}
-
-	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
-	if (!pr_ctx)
-		goto fail_free_pages;
-
-	for (i = 0; i < nr_secs; i++) {
-		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
-
-		pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
-	}
-
-	new_bio->bi_iter.bi_sector = 0; /* internal bio */
-	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
-
-	rqd->bio = new_bio;
-	rqd->nr_ppas = nr_holes;
-
-	pr_ctx->orig_bio = bio;
-	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
-	pr_ctx->bio_init_idx = bio_init_idx;
-	pr_ctx->orig_nr_secs = nr_secs;
-	r_ctx->private = pr_ctx;
-
-	if (unlikely(nr_holes == 1)) {
-		pr_ctx->ppa_ptr = rqd->ppa_list;
-		pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
-		rqd->ppa_addr = rqd->ppa_list[0];
-	}
-	return 0;
-
-fail_free_pages:
-	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
-fail_bio_put:
-	bio_put(new_bio);
-
-	return -ENOMEM;
-}
-
-static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
-				 unsigned int bio_init_idx,
-				 unsigned long *read_bitmap, int nr_secs)
-{
-	int nr_holes;
-	int ret;
-
-	nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-
-	if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
-				    nr_holes))
-		return NVM_IO_ERR;
-
-	rqd->end_io = pblk_end_partial_read;
-
-	ret = pblk_submit_io(pblk, rqd);
-	if (ret) {
-		bio_put(rqd->bio);
-		pblk_err(pblk, "partial read IO submission failed\n");
-		goto err;
-	}
-
-	return NVM_IO_OK;
-
-err:
-	pblk_err(pblk, "failed to perform partial read\n");
-
-	/* Free allocated pages in new bio */
-	pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
-	return NVM_IO_ERR;
-}
-
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
-			 sector_t lba, unsigned long *read_bitmap)
+			 sector_t lba, bool *from_cache)
 {
 	struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
 	struct ppa_addr ppa;
 
-	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
+	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_inc(&pblk->inflight_reads);
@@ -403,7 +238,6 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	if (pblk_ppa_empty(ppa)) {
 		__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-		WARN_ON(test_and_set_bit(0, read_bitmap));
 		meta->lba = addr_empty;
 		return;
 	}
@@ -412,12 +246,11 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 	 * write buffer to prevent retrieving overwritten data.
 	 */
 	if (pblk_addr_in_cache(ppa)) {
-		if (!pblk_read_from_cache(pblk, bio, lba, ppa, 0, 1)) {
-			pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
+		if (!pblk_read_from_cache(pblk, bio, lba, ppa)) {
+			pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
 			goto retry;
 		}
 
-		WARN_ON(test_and_set_bit(0, read_bitmap));
 		meta->lba = cpu_to_le64(lba);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
@@ -434,17 +267,14 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	struct request_queue *q = dev->q;
 	sector_t blba = pblk_get_lba(bio);
 	unsigned int nr_secs = pblk_get_secs(bio);
+	bool from_cache;
 	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
-	struct bio *int_bio;
-	unsigned int bio_init_idx;
-	DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
+	struct bio *int_bio, *split_bio;
 
 	generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
 			      &pblk->disk->part0);
 
-	bitmap_zero(read_bitmap, nr_secs);
-
 	rqd = pblk_alloc_rqd(pblk, PBLK_READ);
 
 	rqd->opcode = NVM_OP_PREAD;
@@ -456,11 +286,6 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	r_ctx->start_time = jiffies;
 	r_ctx->lba = blba;
 
-	/* Save the index for this bio's start. This is needed in case
-	 * we need to fill a partial read.
-	 */
-	bio_init_idx = pblk_get_bi_idx(bio);
-
 	if (pblk_alloc_rqd_meta(pblk, rqd)) {
 		bio_io_error(bio);
 		pblk_free_rqd(pblk, rqd, PBLK_READ);
@@ -469,46 +294,58 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 
 	/* Clone read bio to deal internally with:
 	 * -read errors when reading from drive
-	 * -bio_advance() calls during l2p lookup and cache reads
+	 * -bio_advance() calls during cache reads
 	 */
 	int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
 
 	if (nr_secs > 1)
-		pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
+		nr_secs = pblk_read_ppalist_rq(pblk, rqd, int_bio, blba,
+						&from_cache);
 	else
-		pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
+		pblk_read_rq(pblk, rqd, int_bio, blba, &from_cache);
 
+split_retry:
 	r_ctx->private = bio; /* original bio */
 	rqd->bio = int_bio; /* internal bio */
 
-	if (bitmap_full(read_bitmap, nr_secs)) {
+	if (from_cache && nr_secs == rqd->nr_ppas) {
+		/* All data was read from cache, we can complete the IO. */
 		pblk_end_user_read(bio, 0);
 		atomic_inc(&pblk->inflight_io);
 		__pblk_end_io_read(pblk, rqd, false);
-		return;
-	}
-
-	if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
+	} else if (nr_secs != rqd->nr_ppas) {
 		/* The read bio request could be partially filled by the write
 		 * buffer, but there are some holes that need to be read from
-		 * the drive.
+		 * the drive. In order to handle this, we will use block layer
+		 * mechanism to split this request in to smaller ones and make
+		 * a chain of it.
 		 */
-		bio_put(int_bio);
-		rqd->bio = NULL;
-		if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
-					    nr_secs)) {
-			pblk_err(pblk, "read IO submission failed\n");
-			bio_io_error(bio);
-			__pblk_end_io_read(pblk, rqd, false);
-		}
-		return;
-	}
+		split_bio = bio_split(bio, nr_secs * NR_PHY_IN_LOG, GFP_KERNEL,
+					&pblk_bio_set);
+		bio_chain(split_bio, bio);
+		generic_make_request(bio);
+
+		/* New bio contains first N sectors of the previous one, so
+		 * we can continue to use existing rqd, but we need to shrink
+		 * the number of PPAs in it. New bio is also guaranteed that
+		 * it contains only either data from cache or from drive, newer
+		 * mix of them.
+		 */
+		bio = split_bio;
+		rqd->nr_ppas = nr_secs;
+		if (rqd->nr_ppas == 1)
+			rqd->ppa_addr = rqd->ppa_list[0];
 
-	/* All sectors are to be read from the device */
-	if (pblk_submit_io(pblk, rqd)) {
-		pblk_err(pblk, "read IO submission failed\n");
-		bio_io_error(bio);
-		__pblk_end_io_read(pblk, rqd, false);
+		/* Recreate int_bio - existing might have some needed internal
+		 * fields modified already.
+		 */
+		bio_put(int_bio);
+		int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
+		goto split_retry;
+	} else if (pblk_submit_io(pblk, rqd)) {
+		/* Submitting IO to drive failed, let's report an error */
+		rqd->error = -ENODEV;
+		pblk_end_io_read(rqd);
 	}
 }
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 17ced12..a678553 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -121,18 +121,6 @@ struct pblk_g_ctx {
 	u64 lba;
 };
 
-/* partial read context */
-struct pblk_pr_ctx {
-	struct bio *orig_bio;
-	DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
-	unsigned int orig_nr_secs;
-	unsigned int bio_init_idx;
-	void *ppa_ptr;
-	dma_addr_t dma_ppa_list;
-	u64 lba_list_mem[NVM_MAX_VLBA];
-	u64 lba_list_media[NVM_MAX_VLBA];
-};
-
 /* Pad context */
 struct pblk_pad_rq {
 	struct pblk *pblk;
@@ -759,7 +747,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 				 unsigned int pos, unsigned int nr_entries,
 				 unsigned int count);
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
-			struct ppa_addr ppa, int bio_iter, bool advanced_bio);
+			struct ppa_addr ppa);
 unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
 
 unsigned int pblk_rb_sync_init(struct pblk_rb *rb, unsigned long *flags);
@@ -859,8 +847,8 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
 		       struct pblk_line *gc_line, u64 paddr);
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 			  u64 *lba_list, int nr_secs);
-void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
-			 sector_t blba, int nr_secs);
+int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
+			 sector_t blba, int nr_secs, bool *from_cache);
 void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd);
 void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/7] lightnvm: pblk: recover only written metadata
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 1/7] lightnvm: pblk: IO path reorganization Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 2/7] lightnvm: pblk: simplify partial read path Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 4/7] lightnvm: pblk: store multiple copies of smeta Igor Konopko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch ensures that smeta was fully written before we even try to
read it, based on the chunk table state and the write pointer.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-recovery.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 865fe31..a9085b0 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -655,10 +655,12 @@ static int pblk_line_was_written(struct pblk_line *line,
 	bppa = pblk->luns[smeta_blk].bppa;
 	chunk = &line->chks[pblk_ppa_to_pos(geo, bppa)];
 
-	if (chunk->state & NVM_CHK_ST_FREE)
-		return 0;
+	if (chunk->state & NVM_CHK_ST_CLOSED ||
+	    (chunk->state & NVM_CHK_ST_OPEN
+	     && chunk->wp >= lm->smeta_sec))
+		return 1;
 
-	return 1;
+	return 0;
 }
 
 static bool pblk_line_is_open(struct pblk *pblk, struct pblk_line *line)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/7] lightnvm: pblk: store multiple copies of smeta
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (2 preceding siblings ...)
  2019-04-16 10:16 ` [PATCH v4 3/7] lightnvm: pblk: recover only written metadata Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 5/7] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently there is only one copy of smeta stored per line in pblk. This
is risky, because in case of a read error on such a chunk, we lose all
the data from the whole line, which leads to silent data corruption.

This patch changes this behaviour and allows more than one copy of
smeta to be stored (the number is specified by a module parameter) in
order to provide higher reliability: mirrored copies of the smeta
struct are stored, with the possibility to fail over to another copy in
case of a read error. Such an approach ensures that the copies of this
critical structure are stored on different dies, and thus the predicted
UBER is multiple times higher.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
---
 drivers/lightnvm/pblk-core.c     | 124 ++++++++++++++++++++++++++++++++-------
 drivers/lightnvm/pblk-init.c     |  23 ++++++--
 drivers/lightnvm/pblk-recovery.c |  14 +++--
 drivers/lightnvm/pblk-rl.c       |   3 +-
 drivers/lightnvm/pblk.h          |   1 +
 5 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 07270ba..e6b9295 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -720,13 +720,14 @@ u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
 	return bit * geo->ws_opt;
 }
 
-int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
+static int pblk_line_smeta_read_copy(struct pblk *pblk,
+				     struct pblk_line *line, u64 paddr)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
 	struct nvm_rq rqd;
-	u64 paddr = pblk_line_smeta_start(pblk, line);
 	int i, ret;
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
@@ -749,8 +750,20 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	rqd.nr_ppas = lm->smeta_sec;
 	rqd.is_seq = 1;
 
-	for (i = 0; i < lm->smeta_sec; i++, paddr++)
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
+		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+		int pos = pblk_ppa_to_pos(geo, ppa);
+
+		while (test_bit(pos, line->blk_bitmap)) {
+			paddr += pblk->min_write_pgs;
+			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+			pos = pblk_ppa_to_pos(geo, ppa);
+		}
+
+		rqd.ppa_list[i] = ppa;
+		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
+				  cpu_to_le64(ADDR_EMPTY);
+	}
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
 	if (ret) {
@@ -771,16 +784,63 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
 	return ret;
 }
 
-static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
-				 u64 paddr)
+int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
+{
+	struct pblk_line_meta *lm = &pblk->lm;
+	int i, ret = 0;
+	u64 paddr = pblk_line_smeta_start(pblk, line);
+
+	for (i = 0; i < lm->smeta_copies; i++) {
+		ret = pblk_line_smeta_read_copy(pblk, line,
+						paddr + (i * lm->smeta_sec));
+		if (!ret) {
+			/*
+			 * Just one successfully read copy of smeta is
+			 * enough for us for recovery, don't need to
+			 * read another one.
+			 */
+			return ret;
+		}
+	}
+	return ret;
+}
+
+static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
 	struct nvm_rq rqd;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
-	int i, ret;
+	u64 paddr = 0;
+	int smeta_wr_len = lm->smeta_len;
+	int smeta_wr_sec = lm->smeta_sec;
+	int i, ret, rq_writes;
+
+	/*
+	 * Check if we can write all the smeta copies with
+	 * a single write command.
+	 * If yes -> copy smeta sector into multiple copies
+	 * in buffer to write.
+	 * If no -> issue writes one by one using the same
+	 * buffer space.
+	 * Only if all the copies are written correctly
+	 * we are treating this line as valid for proper
+	 * UBER reliability.
+	 */
+	if (lm->smeta_sec * lm->smeta_copies > pblk->max_write_pgs) {
+		rq_writes = lm->smeta_copies;
+	} else {
+		rq_writes = 1;
+		for (i = 1; i < lm->smeta_copies; i++) {
+			memcpy(line->smeta + i * lm->smeta_len,
+			       line->smeta, lm->smeta_len);
+		}
+		smeta_wr_len *= lm->smeta_copies;
+		smeta_wr_sec *= lm->smeta_copies;
+	}
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 
@@ -788,7 +848,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	if (ret)
 		return ret;
 
-	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
+next_rq:
+	bio = bio_map_kern(dev->q, line->smeta, smeta_wr_len, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto clear_rqd;
@@ -799,15 +860,23 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 
 	rqd.bio = bio;
 	rqd.opcode = NVM_OP_PWRITE;
-	rqd.nr_ppas = lm->smeta_sec;
+	rqd.nr_ppas = smeta_wr_sec;
 	rqd.is_seq = 1;
 
-	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
-							   rqd.meta_list, i);
+	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
+		void *meta_list = rqd.meta_list;
+		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+		int pos = pblk_ppa_to_pos(geo, ppa);
 
-		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-		meta->lba = lba_list[paddr] = addr_empty;
+		while (test_bit(pos, line->blk_bitmap)) {
+			paddr += pblk->min_write_pgs;
+			ppa = addr_to_gen_ppa(pblk, paddr, line->id);
+			pos = pblk_ppa_to_pos(geo, ppa);
+		}
+
+		rqd.ppa_list[i] = ppa;
+		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
+		lba_list[paddr] = addr_empty;
 	}
 
 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
@@ -822,8 +891,13 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	if (rqd.error) {
 		pblk_log_write_err(pblk, &rqd);
 		ret = -EIO;
+		goto clear_rqd;
 	}
 
+	rq_writes--;
+	if (rq_writes > 0)
+		goto next_rq;
+
 clear_rqd:
 	pblk_free_rqd_meta(pblk, &rqd);
 	return ret;
@@ -1020,7 +1094,7 @@ static void pblk_line_setup_metadata(struct pblk_line *line,
 	line->smeta = l_mg->sline_meta[meta_line];
 	line->emeta = l_mg->eline_meta[meta_line];
 
-	memset(line->smeta, 0, lm->smeta_len);
+	memset(line->smeta, 0, lm->smeta_len * lm->smeta_copies);
 	memset(line->emeta->buf, 0, lm->emeta_len[0]);
 
 	line->emeta->mem = 0;
@@ -1147,7 +1221,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	u64 off;
 	int bit = -1;
-	int emeta_secs;
+	int emeta_secs, smeta_secs;
 
 	line->sec_in_line = lm->sec_per_line;
 
@@ -1163,13 +1237,19 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 	}
 
 	/* Mark smeta metadata sectors as bad sectors */
-	bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
-	off = bit * geo->ws_opt;
-	bitmap_set(line->map_bitmap, off, lm->smeta_sec);
-	line->sec_in_line -= lm->smeta_sec;
-	line->cur_sec = off + lm->smeta_sec;
+	smeta_secs = lm->smeta_sec * lm->smeta_copies;
+	bit = -1;
+	while (smeta_secs) {
+		bit = find_next_zero_bit(line->blk_bitmap, lm->blk_per_line,
+					bit + 1);
+		off = bit * geo->ws_opt;
+		bitmap_set(line->map_bitmap, off, geo->ws_opt);
+		line->cur_sec = off + geo->ws_opt;
+		smeta_secs -= lm->smeta_sec;
+	}
+	line->sec_in_line -= (lm->smeta_sec * lm->smeta_copies);
 
-	if (init && pblk_line_smeta_write(pblk, line, off)) {
+	if (init && pblk_line_smeta_write(pblk, line)) {
 		pblk_debug(pblk, "line smeta I/O failed. Retry\n");
 		return 0;
 	}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b351c7f..4f6d214 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -27,6 +27,11 @@ static unsigned int write_buffer_size;
 module_param(write_buffer_size, uint, 0644);
 MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
 
+static unsigned int smeta_copies = 1;
+
+module_param(smeta_copies, int, 0644);
+MODULE_PARM_DESC(smeta_copies, "number of smeta copies");
+
 struct pblk_global_caches {
 	struct kmem_cache	*ws;
 	struct kmem_cache	*rec;
@@ -864,7 +869,8 @@ static int pblk_line_mg_init(struct pblk *pblk)
 	 * emeta depends on the number of LUNs allocated to the pblk instance
 	 */
 	for (i = 0; i < PBLK_DATA_LINES; i++) {
-		l_mg->sline_meta[i] = kmalloc(lm->smeta_len, GFP_KERNEL);
+		l_mg->sline_meta[i] = kmalloc(lm->smeta_len
+						* lm->smeta_copies, GFP_KERNEL);
 		if (!l_mg->sline_meta[i])
 			goto fail_free_smeta;
 	}
@@ -964,6 +970,12 @@ static int pblk_line_meta_init(struct pblk *pblk)
 	lm->mid_thrs = lm->sec_per_line / 2;
 	lm->high_thrs = lm->sec_per_line / 4;
 	lm->meta_distance = (geo->all_luns / 2) * pblk->min_write_pgs;
+	lm->smeta_copies = smeta_copies;
+
+	if (lm->smeta_copies < 1 || lm->smeta_copies > geo->all_luns) {
+		pblk_err(pblk, "unsupported smeta copies parameter\n");
+		return -EINVAL;
+	}
 
 	/* Calculate necessary pages for smeta. See comment over struct
 	 * line_smeta definition
@@ -995,10 +1007,11 @@ static int pblk_line_meta_init(struct pblk *pblk)
 
 	lm->emeta_bb = geo->all_luns > i ? geo->all_luns - i : 0;
 
-	lm->min_blk_line = 1;
-	if (geo->all_luns > 1)
-		lm->min_blk_line += DIV_ROUND_UP(lm->smeta_sec +
-					lm->emeta_sec[0], geo->clba);
+	lm->min_blk_line = lm->smeta_copies;
+	if (geo->all_luns > lm->smeta_copies) {
+		lm->min_blk_line += DIV_ROUND_UP((lm->smeta_sec
+			* lm->smeta_copies) + lm->emeta_sec[0], geo->clba);
+	}
 
 	if (lm->min_blk_line > lm->blk_per_line) {
 		pblk_err(pblk, "config. not supported. Min. LUN in line:%d\n",
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index a9085b0..9a992af 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -51,7 +51,8 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
 	if (!lba_list)
 		return 1;
 
-	data_start = pblk_line_smeta_start(pblk, line) + lm->smeta_sec;
+	data_start = pblk_line_smeta_start(pblk, line)
+					+ (lm->smeta_sec * lm->smeta_copies);
 	data_end = line->emeta_ssec;
 	nr_valid_lbas = le64_to_cpu(emeta_buf->nr_valid_lbas);
 
@@ -134,7 +135,8 @@ static u64 pblk_sec_in_open_line(struct pblk *pblk, struct pblk_line *line)
 	if (lm->blk_per_line - nr_bb != valid_chunks)
 		pblk_err(pblk, "recovery line %d is bad\n", line->id);
 
-	pblk_update_line_wp(pblk, line, written_secs - lm->smeta_sec);
+	pblk_update_line_wp(pblk, line, written_secs -
+					(lm->smeta_sec * lm->smeta_copies));
 
 	return written_secs;
 }
@@ -377,12 +379,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	void *data;
 	dma_addr_t dma_ppa_list, dma_meta_list;
 	__le64 *lba_list;
-	u64 paddr = pblk_line_smeta_start(pblk, line) + lm->smeta_sec;
+	u64 paddr = pblk_line_smeta_start(pblk, line) +
+					(lm->smeta_sec * lm->smeta_copies);
 	bool padded = false;
 	int rq_ppas, rq_len;
 	int i, j;
 	int ret;
-	u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;
+	u64 left_ppas = pblk_sec_in_open_line(pblk, line) -
+					(lm->smeta_sec * lm->smeta_copies);
 
 	if (pblk_line_wps_are_unbalanced(pblk, line))
 		pblk_warn(pblk, "recovering unbalanced line (%d)\n", line->id);
@@ -706,7 +710,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
 
 		line = &pblk->lines[i];
 
-		memset(smeta, 0, lm->smeta_len);
+		memset(smeta, 0, lm->smeta_len * lm->smeta_copies);
 		line->smeta = smeta;
 		line->lun_bitmap = ((void *)(smeta_buf)) +
 						sizeof(struct line_smeta);
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index a5f8bc2..c74ec73 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -218,7 +218,8 @@ void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
 	unsigned int rb_windows;
 
 	/* Consider sectors used for metadata */
-	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
+	sec_meta = ((lm->smeta_sec * lm->smeta_copies)
+			+ lm->emeta_sec[0]) * l_mg->nr_free_lines;
 	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
 
 	rl->high = pblk->op_blks - blk_meta - lm->blk_per_line;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a678553..183bc99 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -548,6 +548,7 @@ struct pblk_line_mgmt {
 struct pblk_line_meta {
 	unsigned int smeta_len;		/* Total length for smeta */
 	unsigned int smeta_sec;		/* Sectors needed for smeta */
+	unsigned int smeta_copies;	/* Number of smeta copies */
 
 	unsigned int emeta_len[4];	/* Lengths for emeta:
 					 *  [0]: Total
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/7] lightnvm: pblk: use nvm_rq_to_ppa_list()
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (3 preceding siblings ...)
  2019-04-16 10:16 ` [PATCH v4 4/7] lightnvm: pblk: store multiple copies of smeta Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-16 10:16 ` [PATCH v4 6/7] lightnvm: track inflight target creations Igor Konopko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

This patch replaces a few remaining usages of rqd->ppa_list[] with the
existing nvm_rq_to_ppa_list() helper. This is needed for theoretical
devices with ws_min/ws_opt equal to 1.
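
For reference, the helper is essentially the same check that the
open-coded call sites used (sketch of the definition in
include/linux/lightnvm.h):

    static inline struct ppa_addr *nvm_rq_to_ppa_list(struct nvm_rq *rqd)
    {
            return (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
    }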

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@javigon.com>
---
 drivers/lightnvm/pblk-core.c     | 26 ++++++++++++++------------
 drivers/lightnvm/pblk-recovery.c | 13 ++++++++-----
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e6b9295..9cf894a 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -562,11 +562,9 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
 
 int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int ret;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	pblk_down_chunk(pblk, ppa_list[0]);
 	ret = pblk_submit_io_sync(pblk, rqd);
 	pblk_up_chunk(pblk, ppa_list[0]);
@@ -727,6 +725,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	int i, ret;
 
@@ -749,6 +748,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = lm->smeta_sec;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line->id);
@@ -760,7 +760,7 @@ static int pblk_line_smeta_read_copy(struct pblk *pblk,
 			pos = pblk_ppa_to_pos(geo, ppa);
 		}
 
-		rqd.ppa_list[i] = ppa;
+		ppa_list[i] = ppa;
 		pblk_get_meta(pblk, rqd.meta_list, i)->lba =
 				  cpu_to_le64(ADDR_EMPTY);
 	}
@@ -811,6 +811,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
@@ -862,6 +863,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 	rqd.opcode = NVM_OP_PWRITE;
 	rqd.nr_ppas = smeta_wr_sec;
 	rqd.is_seq = 1;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; i++, paddr++) {
 		void *meta_list = rqd.meta_list;
@@ -874,7 +876,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line)
 			pos = pblk_ppa_to_pos(geo, ppa);
 		}
 
-		rqd.ppa_list[i] = ppa;
+		ppa_list[i] = ppa;
 		pblk_get_meta(pblk, meta_list, i)->lba = addr_empty;
 		lba_list[paddr] = addr_empty;
 	}
@@ -910,8 +912,9 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
-	void *ppa_list, *meta_list;
+	void *ppa_list_buf, *meta_list;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	struct nvm_rq rqd;
 	u64 paddr = line->emeta_ssec;
 	dma_addr_t dma_ppa_list, dma_meta_list;
@@ -927,7 +930,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	ppa_list_buf = meta_list + pblk_dma_meta_size(pblk);
 	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
@@ -948,11 +951,12 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 
 	rqd.bio = bio;
 	rqd.meta_list = meta_list;
-	rqd.ppa_list = ppa_list;
+	rqd.ppa_list = ppa_list_buf;
 	rqd.dma_meta_list = dma_meta_list;
 	rqd.dma_ppa_list = dma_ppa_list;
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.nr_ppas = rq_ppas;
+	ppa_list = nvm_rq_to_ppa_list(&rqd);
 
 	for (i = 0; i < rqd.nr_ppas; ) {
 		struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
@@ -980,7 +984,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < min; j++, i++, paddr++)
-			rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
+			ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
 	}
 
 	ret = pblk_submit_io_sync(pblk, &rqd);
@@ -1605,11 +1609,9 @@ void pblk_ppa_to_line_put(struct pblk *pblk, struct ppa_addr ppa)
 
 void pblk_rq_to_line_put(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	struct ppa_addr *ppa_list;
+	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	int i;
 
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-
 	for (i = 0; i < rqd->nr_ppas; i++)
 		pblk_ppa_to_line_put(pblk, ppa_list[i]);
 }
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 9a992af..2fca21e 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -181,6 +181,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	struct pblk_pad_rq *pad_rq;
 	struct nvm_rq *rqd;
 	struct bio *bio;
+	struct ppa_addr *ppa_list;
 	void *data;
 	__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
 	u64 w_ptr = line->cur_sec;
@@ -241,6 +242,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 	rqd->end_io = pblk_end_io_recov;
 	rqd->private = pad_rq;
 
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 	meta_list = rqd->meta_list;
 
 	for (i = 0; i < rqd->nr_ppas; ) {
@@ -268,17 +270,17 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
 			lba_list[w_ptr] = addr_empty;
 			meta = pblk_get_meta(pblk, meta_list, i);
 			meta->lba = addr_empty;
-			rqd->ppa_list[i] = dev_ppa;
+			ppa_list[i] = dev_ppa;
 		}
 	}
 
 	kref_get(&pad_rq->ref);
-	pblk_down_chunk(pblk, rqd->ppa_list[0]);
+	pblk_down_chunk(pblk, ppa_list[0]);
 
 	ret = pblk_submit_io(pblk, rqd);
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
-		pblk_up_chunk(pblk, rqd->ppa_list[0]);
+		pblk_up_chunk(pblk, ppa_list[0]);
 		kref_put(&pad_rq->ref, pblk_recov_complete);
 		pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
 		bio_put(bio);
@@ -424,6 +426,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	rqd->ppa_list = ppa_list;
 	rqd->dma_ppa_list = dma_ppa_list;
 	rqd->dma_meta_list = dma_meta_list;
+	ppa_list = nvm_rq_to_ppa_list(rqd);
 
 	if (pblk_io_aligned(pblk, rq_ppas))
 		rqd->is_seq = 1;
@@ -442,7 +445,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 		}
 
 		for (j = 0; j < pblk->min_write_pgs; j++, i++)
-			rqd->ppa_list[i] =
+			ppa_list[i] =
 				addr_to_gen_ppa(pblk, paddr + j, line->id);
 	}
 
@@ -490,7 +493,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 			continue;
 
 		line->nr_valid_lbas++;
-		pblk_update_map(pblk, lba, rqd->ppa_list[i]);
+		pblk_update_map(pblk, lba, ppa_list[i]);
 	}
 
 	left_ppas -= rq_ppas;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 6/7] lightnvm: track inflight target creations
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (4 preceding siblings ...)
  2019-04-16 10:16 ` [PATCH v4 5/7] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-23  7:12   ` Javier González
  2019-04-16 10:16 ` [PATCH v4 7/7] lightnvm: do not remove instance under global lock Igor Konopko
  2019-04-26 12:34 ` [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Matias Bjørling
  7 siblings, 1 reply; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

While the creation process is still in progress, the target is not yet
on the targets list. This leaves a window in which the whole lightnvm
subsystem can be removed by a concurrent nvm_unregister() call, which
finally leads to a kernel panic inside the target init function.

This patch changes the behaviour by adding a kref which tracks all the
users of the nvm_dev structure. When nvm_dev is allocated, the kref
value is set to 1. The value is then increased before every target
creation and decreased after target removal. The extra (initial)
reference is dropped when the nvm subsystem is unregistered.
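
A rough sketch of the resulting reference lifetime (illustration only;
the exact call sites are in the diff below):

    kref_init(&dev->ref);           /* nvm_alloc_dev(): refcount = 1  */
    kref_get(&dev->ref);            /* before each nvm_create_tgt()   */
    kref_put(&dev->ref, nvm_free);  /* after each target removal      */
    kref_put(&dev->ref, nvm_free);  /* nvm_unregister(): initial ref; */
                                    /* the last put calls nvm_free()  */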

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c  | 41 +++++++++++++++++++++++++++++++----------
 include/linux/lightnvm.h |  1 +
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index e2abe88..0e9f7996 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -45,6 +45,8 @@ struct nvm_dev_map {
 	int num_ch;
 };
 
+static void nvm_free(struct kref *ref);
+
 static struct nvm_target *nvm_find_target(struct nvm_dev *dev, const char *name)
 {
 	struct nvm_target *tgt;
@@ -501,6 +503,7 @@ static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
 	}
 	__nvm_remove_target(t, true);
 	mutex_unlock(&dev->mlock);
+	kref_put(&dev->ref, nvm_free);
 
 	return 0;
 }
@@ -1094,15 +1097,16 @@ static int nvm_core_init(struct nvm_dev *dev)
 	return ret;
 }
 
-static void nvm_free(struct nvm_dev *dev)
+static void nvm_free(struct kref *ref)
 {
-	if (!dev)
-		return;
+	struct nvm_dev *dev = container_of(ref, struct nvm_dev, ref);
 
 	if (dev->dma_pool)
 		dev->ops->destroy_dma_pool(dev->dma_pool);
 
-	nvm_unregister_map(dev);
+	if (dev->rmap)
+		nvm_unregister_map(dev);
+
 	kfree(dev->lun_map);
 	kfree(dev);
 }
@@ -1139,7 +1143,13 @@ static int nvm_init(struct nvm_dev *dev)
 
 struct nvm_dev *nvm_alloc_dev(int node)
 {
-	return kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
+	struct nvm_dev *dev;
+
+	dev = kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
+	if (dev)
+		kref_init(&dev->ref);
+
+	return dev;
 }
 EXPORT_SYMBOL(nvm_alloc_dev);
 
@@ -1147,12 +1157,16 @@ int nvm_register(struct nvm_dev *dev)
 {
 	int ret, exp_pool_size;
 
-	if (!dev->q || !dev->ops)
+	if (!dev->q || !dev->ops) {
+		kref_put(&dev->ref, nvm_free);
 		return -EINVAL;
+	}
 
 	ret = nvm_init(dev);
-	if (ret)
+	if (ret) {
+		kref_put(&dev->ref, nvm_free);
 		return ret;
+	}
 
 	exp_pool_size = max_t(int, PAGE_SIZE,
 			      (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
@@ -1162,7 +1176,7 @@ int nvm_register(struct nvm_dev *dev)
 						  exp_pool_size);
 	if (!dev->dma_pool) {
 		pr_err("nvm: could not create dma pool\n");
-		nvm_free(dev);
+		kref_put(&dev->ref, nvm_free);
 		return -ENOMEM;
 	}
 
@@ -1184,6 +1198,7 @@ void nvm_unregister(struct nvm_dev *dev)
 		if (t->dev->parent != dev)
 			continue;
 		__nvm_remove_target(t, false);
+		kref_put(&dev->ref, nvm_free);
 	}
 	mutex_unlock(&dev->mlock);
 
@@ -1191,13 +1206,14 @@ void nvm_unregister(struct nvm_dev *dev)
 	list_del(&dev->devices);
 	up_write(&nvm_lock);
 
-	nvm_free(dev);
+	kref_put(&dev->ref, nvm_free);
 }
 EXPORT_SYMBOL(nvm_unregister);
 
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
 	struct nvm_dev *dev;
+	int ret;
 
 	down_write(&nvm_lock);
 	dev = nvm_find_nvm_dev(create->dev);
@@ -1208,7 +1224,12 @@ static int __nvm_configure_create(struct nvm_ioctl_create *create)
 		return -EINVAL;
 	}
 
-	return nvm_create_tgt(dev, create);
+	kref_get(&dev->ref);
+	ret = nvm_create_tgt(dev, create);
+	if (ret)
+		kref_put(&dev->ref, nvm_free);
+
+	return ret;
 }
 
 static long nvm_ioctl_info(struct file *file, void __user *arg)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index d3b0270..4d0d565 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -428,6 +428,7 @@ struct nvm_dev {
 	char name[DISK_NAME_LEN];
 	void *private_data;
 
+	struct kref ref;
 	void *rmap;
 
 	struct mutex mlock;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 7/7] lightnvm: do not remove instance under global lock
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (5 preceding siblings ...)
  2019-04-16 10:16 ` [PATCH v4 6/7] lightnvm: track inflight target creations Igor Konopko
@ 2019-04-16 10:16 ` Igor Konopko
  2019-04-23  7:13   ` Javier González
  2019-04-26 12:34 ` [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Matias Bjørling
  7 siblings, 1 reply; 13+ messages in thread
From: Igor Konopko @ 2019-04-16 10:16 UTC (permalink / raw)
  To: mb, javier, hans.holmberg; +Cc: linux-block, igor.j.konopko

Currently all the target instances are removed under the global
nvm_lock. This was needed to ensure that the nvm_dev struct would not
be freed by a hot unplug event during target removal. However, the
current implementation has a drawback: the same lock is taken when a
new nvme subsystem is registered, so a long-running target removal on
drive A can delay the registration (and listing in the OS) of drive B,
since the registration has to wait for that lock.

Now that we have a kref which ensures that nvm_dev will not be freed
in the meantime, we can drop this lock for the time when nvm targets
are being removed.
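
Schematically, the removal path after this change holds nvm_lock only
while locating the target (a sketch based on the diff below; error
handling trimmed):

	down_read(&nvm_lock);
	list_for_each_entry(dev, &nvm_devices, devices) {
		mutex_lock(&dev->mlock);
		t = nvm_find_target(dev, remove->tgtname);
		mutex_unlock(&dev->mlock);
		if (t)
			break;
	}
	up_read(&nvm_lock);		/* global lock released here */

	if (!t)
		return 1;

	/* the long-running part runs without nvm_lock; the kref added
	 * in the previous patch keeps dev alive until we are done
	 */
	__nvm_remove_target(t, true);
	kref_put(&dev->ref, nvm_free);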

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 0e9f7996..0df7454 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -483,7 +483,6 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
 
 /**
  * nvm_remove_tgt - Removes a target from the media manager
- * @dev:	device
  * @remove:	ioctl structure with target name to remove.
  *
  * Returns:
@@ -491,18 +490,27 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
  * 1: on not found
  * <0: on error
  */
-static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
+static int nvm_remove_tgt(struct nvm_ioctl_remove *remove)
 {
 	struct nvm_target *t;
+	struct nvm_dev *dev;
 
-	mutex_lock(&dev->mlock);
-	t = nvm_find_target(dev, remove->tgtname);
-	if (!t) {
+	down_read(&nvm_lock);
+	list_for_each_entry(dev, &nvm_devices, devices) {
+		mutex_lock(&dev->mlock);
+		t = nvm_find_target(dev, remove->tgtname);
+		if (t) {
+			mutex_unlock(&dev->mlock);
+			break;
+		}
 		mutex_unlock(&dev->mlock);
-		return 1;
 	}
+	up_read(&nvm_lock);
+
+	if (!t)
+		return 1;
+
 	__nvm_remove_target(t, true);
-	mutex_unlock(&dev->mlock);
 	kref_put(&dev->ref, nvm_free);
 
 	return 0;
@@ -1348,8 +1356,6 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
 static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 {
 	struct nvm_ioctl_remove remove;
-	struct nvm_dev *dev;
-	int ret = 0;
 
 	if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
 		return -EFAULT;
@@ -1361,15 +1367,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
 		return -EINVAL;
 	}
 
-	down_read(&nvm_lock);
-	list_for_each_entry(dev, &nvm_devices, devices) {
-		ret = nvm_remove_tgt(dev, &remove);
-		if (!ret)
-			break;
-	}
-	up_read(&nvm_lock);
-
-	return ret;
+	return nvm_remove_tgt(&remove);
 }
 
 /* kept for compatibility reasons */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/7] lightnvm: pblk: simplify partial read path
  2019-04-16 10:16 ` [PATCH v4 2/7] lightnvm: pblk: simplify partial read path Igor Konopko
@ 2019-04-17 17:11   ` Heiner Litz
  2019-04-23 13:51     ` Igor Konopko
  0 siblings, 1 reply; 13+ messages in thread
From: Heiner Litz @ 2019-04-17 17:11 UTC (permalink / raw)
  To: Igor Konopko
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block

Hi Igor,
thank you for doing this. For the most part, this looks great. Some comments:

1. When performing cached reads, you can bio_advance all sectors at
once: try reading everything from the cache first and, only if that is
successful, bio_advance all the cached-read sectors. This means you can
always retry and do not have to goto end.
2. Should we set
 split->bi_rw |= REQ_NOMERGE
as in blk_queue_split?
3. Did you test rqds where sequences of cached and non-cached segments
alternate?

Heiner

On Tue, Apr 16, 2019 at 3:19 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> This patch changes the approach to handling partial read path.
>
> In old approach merging of data from round buffer and drive was fully
> made by drive. This had some disadvantages - code was complex and
> relies on bio internals, so it was hard to maintain and was strongly
> dependent on bio changes.
>
> In new approach most of the handling is done mostly by block layer
> functions such as bio_split(), bio_chain() and generic_make request()
> and generally is less complex and easier to maintain. Below some more
> details of the new approach.
>
> When read bio arrives, it is cloned for pblk internal purposes. All
> the L2P mapping, which includes copying data from round buffer to bio
> and thus bio_advance() calls is done on the cloned bio, so the original
> bio is untouched. If we found that we have partial read case, we
> still have original bio untouched, so we can split it and continue to
> process only first part of it in current context, when the rest will be
> called as separate bio request which is passed to generic_make_request()
> for further processing.
>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>  drivers/lightnvm/pblk-core.c |  13 +-
>  drivers/lightnvm/pblk-rb.c   |  11 +-
>  drivers/lightnvm/pblk-read.c | 333 +++++++++++--------------------------------
>  drivers/lightnvm/pblk.h      |  18 +--
>  4 files changed, 100 insertions(+), 275 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 73be3a0..07270ba 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -2147,8 +2147,8 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
>         spin_unlock(&pblk->trans_lock);
>  }
>
> -void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
> -                        sector_t blba, int nr_secs)
> +int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
> +                        sector_t blba, int nr_secs, bool *from_cache)
>  {
>         int i;
>
> @@ -2162,10 +2162,19 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>                 if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
>                         struct pblk_line *line = pblk_ppa_to_line(pblk, ppa);
>
> +                       if (i > 0 && *from_cache)
> +                               break;
> +                       *from_cache = false;
> +
>                         kref_get(&line->ref);
> +               } else {
> +                       if (i > 0 && !*from_cache)
> +                               break;
> +                       *from_cache = true;
>                 }
>         }
>         spin_unlock(&pblk->trans_lock);
> +       return i;
>  }
>
>  void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 3555014..5abb170 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -642,7 +642,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>   * be directed to disk.
>   */
>  int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
> -                       struct ppa_addr ppa, int bio_iter, bool advanced_bio)
> +                       struct ppa_addr ppa)
>  {
>         struct pblk *pblk = container_of(rb, struct pblk, rwb);
>         struct pblk_rb_entry *entry;
> @@ -673,15 +673,6 @@ int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
>                 ret = 0;
>                 goto out;
>         }
> -
> -       /* Only advance the bio if it hasn't been advanced already. If advanced,
> -        * this bio is at least a partial bio (i.e., it has partially been
> -        * filled with data from the cache). If part of the data resides on the
> -        * media, we will read later on
> -        */
> -       if (unlikely(!advanced_bio))
> -               bio_advance(bio, bio_iter * PBLK_EXPOSED_PAGE_SIZE);
> -
>         data = bio_data(bio);
>         memcpy(data, entry->data, rb->seg_size);
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 0953c34..d98ea39 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -26,8 +26,7 @@
>   * issued.
>   */
>  static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
> -                               sector_t lba, struct ppa_addr ppa,
> -                               int bio_iter, bool advanced_bio)
> +                               sector_t lba, struct ppa_addr ppa)
>  {
>  #ifdef CONFIG_NVM_PBLK_DEBUG
>         /* Callers must ensure that the ppa points to a cache address */
> @@ -35,73 +34,75 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
>         BUG_ON(!pblk_addr_in_cache(ppa));
>  #endif
>
> -       return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa,
> -                                               bio_iter, advanced_bio);
> +       return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa);
>  }
>
> -static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> +static int pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>                                  struct bio *bio, sector_t blba,
> -                                unsigned long *read_bitmap)
> +                                bool *from_cache)
>  {
>         void *meta_list = rqd->meta_list;
> -       struct ppa_addr ppas[NVM_MAX_VLBA];
> -       int nr_secs = rqd->nr_ppas;
> -       bool advanced_bio = false;
> -       int i, j = 0;
> +       int nr_secs, i;
>
> -       pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
> +retry:
> +       nr_secs = pblk_lookup_l2p_seq(pblk, rqd->ppa_list, blba, rqd->nr_ppas,
> +                                       from_cache);
> +
> +       if (!*from_cache)
> +               goto end;
>
>         for (i = 0; i < nr_secs; i++) {
> -               struct ppa_addr p = ppas[i];
>                 struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>                 sector_t lba = blba + i;
>
> -retry:
> -               if (pblk_ppa_empty(p)) {
> +               if (pblk_ppa_empty(rqd->ppa_list[i])) {
>                         __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>
> -                       WARN_ON(test_and_set_bit(i, read_bitmap));
>                         meta->lba = addr_empty;
> -
> -                       if (unlikely(!advanced_bio)) {
> -                               bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
> -                               advanced_bio = true;
> +               } else if (pblk_addr_in_cache(rqd->ppa_list[i])) {
> +                       /*
> +                        * Try to read from write buffer. The address is later
> +                        * checked on the write buffer to prevent retrieving
> +                        * overwritten data.
> +                        */
> +                       if (!pblk_read_from_cache(pblk, bio, lba,
> +                                                       rqd->ppa_list[i])) {
> +                               if (i == 0) {
> +                                       /*
> +                                        * We didn't call with bio_advance()
> +                                        * yet, so we can just retry.
> +                                        */
> +                                       goto retry;
> +                               } else {
> +                                       /*
> +                                        * We already call bio_advance()
> +                                        * so we cannot retry and we need
> +                                        * to quit that function in order
> +                                        * to allow caller to handle the bio
> +                                        * splitting in the current sector
> +                                        * position.
> +                                        */
> +                                       nr_secs = i;
> +                                       goto end;
> +                               }
>                         }
> -
> -                       goto next;
> -               }
> -
> -               /* Try to read from write buffer. The address is later checked
> -                * on the write buffer to prevent retrieving overwritten data.
> -                */
> -               if (pblk_addr_in_cache(p)) {
> -                       if (!pblk_read_from_cache(pblk, bio, lba, p, i,
> -                                                               advanced_bio)) {
> -                               pblk_lookup_l2p_seq(pblk, &p, lba, 1);
> -                               goto retry;
> -                       }
> -                       WARN_ON(test_and_set_bit(i, read_bitmap));
>                         meta->lba = cpu_to_le64(lba);
> -                       advanced_bio = true;
>  #ifdef CONFIG_NVM_PBLK_DEBUG
>                         atomic_long_inc(&pblk->cache_reads);
>  #endif
> -               } else {
> -                       /* Read from media non-cached sectors */
> -                       rqd->ppa_list[j++] = p;
>                 }
> -
> -next:
> -               if (advanced_bio)
> -                       bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
> +               bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
>         }
>
> +end:
>         if (pblk_io_aligned(pblk, nr_secs))
>                 rqd->is_seq = 1;
>
>  #ifdef CONFIG_NVM_PBLK_DEBUG
>         atomic_long_add(nr_secs, &pblk->inflight_reads);
>  #endif
> +
> +       return nr_secs;
>  }
>
>
> @@ -197,9 +198,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
>                 pblk_log_read_err(pblk, rqd);
>
>         pblk_read_check_seq(pblk, rqd, r_ctx->lba);
> -
> -       if (int_bio)
> -               bio_put(int_bio);
> +       bio_put(int_bio);
>
>         if (put_line)
>                 pblk_rq_to_line_put(pblk, rqd);
> @@ -223,177 +222,13 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>         __pblk_end_io_read(pblk, rqd, true);
>  }
>
> -static void pblk_end_partial_read(struct nvm_rq *rqd)
> -{
> -       struct pblk *pblk = rqd->private;
> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
> -       struct pblk_sec_meta *meta;
> -       struct bio *new_bio = rqd->bio;
> -       struct bio *bio = pr_ctx->orig_bio;
> -       struct bio_vec src_bv, dst_bv;
> -       void *meta_list = rqd->meta_list;
> -       int bio_init_idx = pr_ctx->bio_init_idx;
> -       unsigned long *read_bitmap = pr_ctx->bitmap;
> -       int nr_secs = pr_ctx->orig_nr_secs;
> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
> -       void *src_p, *dst_p;
> -       int hole, i;
> -
> -       if (unlikely(nr_holes == 1)) {
> -               struct ppa_addr ppa;
> -
> -               ppa = rqd->ppa_addr;
> -               rqd->ppa_list = pr_ctx->ppa_ptr;
> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
> -               rqd->ppa_list[0] = ppa;
> -       }
> -
> -       for (i = 0; i < nr_secs; i++) {
> -               meta = pblk_get_meta(pblk, meta_list, i);
> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
> -       }
> -
> -       /* Fill the holes in the original bio */
> -       i = 0;
> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
> -       do {
> -               struct pblk_line *line;
> -
> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
> -               kref_put(&line->ref, pblk_line_put);
> -
> -               meta = pblk_get_meta(pblk, meta_list, hole);
> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
> -
> -               src_bv = new_bio->bi_io_vec[i++];
> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
> -
> -               src_p = kmap_atomic(src_bv.bv_page);
> -               dst_p = kmap_atomic(dst_bv.bv_page);
> -
> -               memcpy(dst_p + dst_bv.bv_offset,
> -                       src_p + src_bv.bv_offset,
> -                       PBLK_EXPOSED_PAGE_SIZE);
> -
> -               kunmap_atomic(src_p);
> -               kunmap_atomic(dst_p);
> -
> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
> -
> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
> -       } while (hole < nr_secs);
> -
> -       bio_put(new_bio);
> -       kfree(pr_ctx);
> -
> -       /* restore original request */
> -       rqd->bio = NULL;
> -       rqd->nr_ppas = nr_secs;
> -
> -       pblk_end_user_read(bio, rqd->error);
> -       __pblk_end_io_read(pblk, rqd, false);
> -}
> -
> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> -                           unsigned int bio_init_idx,
> -                           unsigned long *read_bitmap,
> -                           int nr_holes)
> -{
> -       void *meta_list = rqd->meta_list;
> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> -       struct pblk_pr_ctx *pr_ctx;
> -       struct bio *new_bio, *bio = r_ctx->private;
> -       int nr_secs = rqd->nr_ppas;
> -       int i;
> -
> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
> -
> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
> -               goto fail_bio_put;
> -
> -       if (nr_holes != new_bio->bi_vcnt) {
> -               WARN_ONCE(1, "pblk: malformed bio\n");
> -               goto fail_free_pages;
> -       }
> -
> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
> -       if (!pr_ctx)
> -               goto fail_free_pages;
> -
> -       for (i = 0; i < nr_secs; i++) {
> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
> -
> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
> -       }
> -
> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> -
> -       rqd->bio = new_bio;
> -       rqd->nr_ppas = nr_holes;
> -
> -       pr_ctx->orig_bio = bio;
> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
> -       pr_ctx->bio_init_idx = bio_init_idx;
> -       pr_ctx->orig_nr_secs = nr_secs;
> -       r_ctx->private = pr_ctx;
> -
> -       if (unlikely(nr_holes == 1)) {
> -               pr_ctx->ppa_ptr = rqd->ppa_list;
> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
> -               rqd->ppa_addr = rqd->ppa_list[0];
> -       }
> -       return 0;
> -
> -fail_free_pages:
> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
> -fail_bio_put:
> -       bio_put(new_bio);
> -
> -       return -ENOMEM;
> -}
> -
> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
> -                                unsigned int bio_init_idx,
> -                                unsigned long *read_bitmap, int nr_secs)
> -{
> -       int nr_holes;
> -       int ret;
> -
> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
> -
> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
> -                                   nr_holes))
> -               return NVM_IO_ERR;
> -
> -       rqd->end_io = pblk_end_partial_read;
> -
> -       ret = pblk_submit_io(pblk, rqd);
> -       if (ret) {
> -               bio_put(rqd->bio);
> -               pblk_err(pblk, "partial read IO submission failed\n");
> -               goto err;
> -       }
> -
> -       return NVM_IO_OK;
> -
> -err:
> -       pblk_err(pblk, "failed to perform partial read\n");
> -
> -       /* Free allocated pages in new bio */
> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
> -       return NVM_IO_ERR;
> -}
> -
>  static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> -                        sector_t lba, unsigned long *read_bitmap)
> +                        sector_t lba, bool *from_cache)
>  {
>         struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
>         struct ppa_addr ppa;
>
> -       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
> +       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
>
>  #ifdef CONFIG_NVM_PBLK_DEBUG
>         atomic_long_inc(&pblk->inflight_reads);
> @@ -403,7 +238,6 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>         if (pblk_ppa_empty(ppa)) {
>                 __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>
> -               WARN_ON(test_and_set_bit(0, read_bitmap));
>                 meta->lba = addr_empty;
>                 return;
>         }
> @@ -412,12 +246,11 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>          * write buffer to prevent retrieving overwritten data.
>          */
>         if (pblk_addr_in_cache(ppa)) {
> -               if (!pblk_read_from_cache(pblk, bio, lba, ppa, 0, 1)) {
> -                       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
> +               if (!pblk_read_from_cache(pblk, bio, lba, ppa)) {
> +                       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
>                         goto retry;
>                 }
>
> -               WARN_ON(test_and_set_bit(0, read_bitmap));
>                 meta->lba = cpu_to_le64(lba);
>
>  #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -434,17 +267,14 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>         struct request_queue *q = dev->q;
>         sector_t blba = pblk_get_lba(bio);
>         unsigned int nr_secs = pblk_get_secs(bio);
> +       bool from_cache;
>         struct pblk_g_ctx *r_ctx;
>         struct nvm_rq *rqd;
> -       struct bio *int_bio;
> -       unsigned int bio_init_idx;
> -       DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
> +       struct bio *int_bio, *split_bio;
>
>         generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
>                               &pblk->disk->part0);
>
> -       bitmap_zero(read_bitmap, nr_secs);
> -
>         rqd = pblk_alloc_rqd(pblk, PBLK_READ);
>
>         rqd->opcode = NVM_OP_PREAD;
> @@ -456,11 +286,6 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>         r_ctx->start_time = jiffies;
>         r_ctx->lba = blba;
>
> -       /* Save the index for this bio's start. This is needed in case
> -        * we need to fill a partial read.
> -        */
> -       bio_init_idx = pblk_get_bi_idx(bio);
> -
>         if (pblk_alloc_rqd_meta(pblk, rqd)) {
>                 bio_io_error(bio);
>                 pblk_free_rqd(pblk, rqd, PBLK_READ);
> @@ -469,46 +294,58 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>
>         /* Clone read bio to deal internally with:
>          * -read errors when reading from drive
> -        * -bio_advance() calls during l2p lookup and cache reads
> +        * -bio_advance() calls during cache reads
>          */
>         int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>
>         if (nr_secs > 1)
> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
> +               nr_secs = pblk_read_ppalist_rq(pblk, rqd, int_bio, blba,
> +                                               &from_cache);
>         else
> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
> +               pblk_read_rq(pblk, rqd, int_bio, blba, &from_cache);
>
> +split_retry:
>         r_ctx->private = bio; /* original bio */
>         rqd->bio = int_bio; /* internal bio */
>
> -       if (bitmap_full(read_bitmap, nr_secs)) {
> +       if (from_cache && nr_secs == rqd->nr_ppas) {
> +               /* All data was read from cache, we can complete the IO. */
>                 pblk_end_user_read(bio, 0);
>                 atomic_inc(&pblk->inflight_io);
>                 __pblk_end_io_read(pblk, rqd, false);
> -               return;
> -       }
> -
> -       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
> +       } else if (nr_secs != rqd->nr_ppas) {
>                 /* The read bio request could be partially filled by the write
>                  * buffer, but there are some holes that need to be read from
> -                * the drive.
> +                * the drive. In order to handle this, we will use block layer
> +                * mechanism to split this request in to smaller ones and make
> +                * a chain of it.
>                  */
> -               bio_put(int_bio);
> -               rqd->bio = NULL;
> -               if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
> -                                           nr_secs)) {
> -                       pblk_err(pblk, "read IO submission failed\n");
> -                       bio_io_error(bio);
> -                       __pblk_end_io_read(pblk, rqd, false);
> -               }
> -               return;
> -       }
> +               split_bio = bio_split(bio, nr_secs * NR_PHY_IN_LOG, GFP_KERNEL,
> +                                       &pblk_bio_set);
> +               bio_chain(split_bio, bio);
> +               generic_make_request(bio);
> +
> +               /* New bio contains first N sectors of the previous one, so
> +                * we can continue to use existing rqd, but we need to shrink
> +                * the number of PPAs in it. New bio is also guaranteed that
> +                * it contains only either data from cache or from drive, newer
> +                * mix of them.
> +                */
> +               bio = split_bio;
> +               rqd->nr_ppas = nr_secs;
> +               if (rqd->nr_ppas == 1)
> +                       rqd->ppa_addr = rqd->ppa_list[0];
>
> -       /* All sectors are to be read from the device */
> -       if (pblk_submit_io(pblk, rqd)) {
> -               pblk_err(pblk, "read IO submission failed\n");
> -               bio_io_error(bio);
> -               __pblk_end_io_read(pblk, rqd, false);
> +               /* Recreate int_bio - existing might have some needed internal
> +                * fields modified already.
> +                */
> +               bio_put(int_bio);
> +               int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
> +               goto split_retry;
> +       } else if (pblk_submit_io(pblk, rqd)) {
> +               /* Submitting IO to drive failed, let's report an error */
> +               rqd->error = -ENODEV;
> +               pblk_end_io_read(rqd);
>         }
>  }
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 17ced12..a678553 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -121,18 +121,6 @@ struct pblk_g_ctx {
>         u64 lba;
>  };
>
> -/* partial read context */
> -struct pblk_pr_ctx {
> -       struct bio *orig_bio;
> -       DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
> -       unsigned int orig_nr_secs;
> -       unsigned int bio_init_idx;
> -       void *ppa_ptr;
> -       dma_addr_t dma_ppa_list;
> -       u64 lba_list_mem[NVM_MAX_VLBA];
> -       u64 lba_list_media[NVM_MAX_VLBA];
> -};
> -
>  /* Pad context */
>  struct pblk_pad_rq {
>         struct pblk *pblk;
> @@ -759,7 +747,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>                                  unsigned int pos, unsigned int nr_entries,
>                                  unsigned int count);
>  int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
> -                       struct ppa_addr ppa, int bio_iter, bool advanced_bio);
> +                       struct ppa_addr ppa);
>  unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
>
>  unsigned int pblk_rb_sync_init(struct pblk_rb *rb, unsigned long *flags);
> @@ -859,8 +847,8 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
>                        struct pblk_line *gc_line, u64 paddr);
>  void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>                           u64 *lba_list, int nr_secs);
> -void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
> -                        sector_t blba, int nr_secs);
> +int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
> +                        sector_t blba, int nr_secs, bool *from_cache);
>  void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd);
>  void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 6/7] lightnvm: track inflight target creations
  2019-04-16 10:16 ` [PATCH v4 6/7] lightnvm: track inflight target creations Igor Konopko
@ 2019-04-23  7:12   ` Javier González
  0 siblings, 0 replies; 13+ messages in thread
From: Javier González @ 2019-04-23  7:12 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 4604 bytes --]

> On 16 Apr 2019, at 12.16, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> When creation process is still in progress, target is not yet on
> targets list. This causes a chance for removing whole lightnvm
> subsystem by calling nvm_unregister() in the meantime and finally by
> causing kernel panic inside target init function.
> 
> This patch changes the behaviour by adding kref variable which tracks
> all the users of nvm_dev structure. When nvm_dev is allocated, kref
> value is set to 1. Then before every target creation the value is
> increased and decreased after target removal. The extra reference
> is decreased when nvm subsystem is unregistered.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/core.c  | 41 +++++++++++++++++++++++++++++++----------
> include/linux/lightnvm.h |  1 +
> 2 files changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index e2abe88..0e9f7996 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -45,6 +45,8 @@ struct nvm_dev_map {
> 	int num_ch;
> };
> 
> +static void nvm_free(struct kref *ref);
> +
> static struct nvm_target *nvm_find_target(struct nvm_dev *dev, const char *name)
> {
> 	struct nvm_target *tgt;
> @@ -501,6 +503,7 @@ static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
> 	}
> 	__nvm_remove_target(t, true);
> 	mutex_unlock(&dev->mlock);
> +	kref_put(&dev->ref, nvm_free);
> 
> 	return 0;
> }
> @@ -1094,15 +1097,16 @@ static int nvm_core_init(struct nvm_dev *dev)
> 	return ret;
> }
> 
> -static void nvm_free(struct nvm_dev *dev)
> +static void nvm_free(struct kref *ref)
> {
> -	if (!dev)
> -		return;
> +	struct nvm_dev *dev = container_of(ref, struct nvm_dev, ref);
> 
> 	if (dev->dma_pool)
> 		dev->ops->destroy_dma_pool(dev->dma_pool);
> 
> -	nvm_unregister_map(dev);
> +	if (dev->rmap)
> +		nvm_unregister_map(dev);
> +
> 	kfree(dev->lun_map);
> 	kfree(dev);
> }
> @@ -1139,7 +1143,13 @@ static int nvm_init(struct nvm_dev *dev)
> 
> struct nvm_dev *nvm_alloc_dev(int node)
> {
> -	return kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
> +	struct nvm_dev *dev;
> +
> +	dev = kzalloc_node(sizeof(struct nvm_dev), GFP_KERNEL, node);
> +	if (dev)
> +		kref_init(&dev->ref);
> +
> +	return dev;
> }
> EXPORT_SYMBOL(nvm_alloc_dev);
> 
> @@ -1147,12 +1157,16 @@ int nvm_register(struct nvm_dev *dev)
> {
> 	int ret, exp_pool_size;
> 
> -	if (!dev->q || !dev->ops)
> +	if (!dev->q || !dev->ops) {
> +		kref_put(&dev->ref, nvm_free);
> 		return -EINVAL;
> +	}
> 
> 	ret = nvm_init(dev);
> -	if (ret)
> +	if (ret) {
> +		kref_put(&dev->ref, nvm_free);
> 		return ret;
> +	}
> 
> 	exp_pool_size = max_t(int, PAGE_SIZE,
> 			      (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
> @@ -1162,7 +1176,7 @@ int nvm_register(struct nvm_dev *dev)
> 						  exp_pool_size);
> 	if (!dev->dma_pool) {
> 		pr_err("nvm: could not create dma pool\n");
> -		nvm_free(dev);
> +		kref_put(&dev->ref, nvm_free);
> 		return -ENOMEM;
> 	}
> 
> @@ -1184,6 +1198,7 @@ void nvm_unregister(struct nvm_dev *dev)
> 		if (t->dev->parent != dev)
> 			continue;
> 		__nvm_remove_target(t, false);
> +		kref_put(&dev->ref, nvm_free);
> 	}
> 	mutex_unlock(&dev->mlock);
> 
> @@ -1191,13 +1206,14 @@ void nvm_unregister(struct nvm_dev *dev)
> 	list_del(&dev->devices);
> 	up_write(&nvm_lock);
> 
> -	nvm_free(dev);
> +	kref_put(&dev->ref, nvm_free);
> }
> EXPORT_SYMBOL(nvm_unregister);
> 
> static int __nvm_configure_create(struct nvm_ioctl_create *create)
> {
> 	struct nvm_dev *dev;
> +	int ret;
> 
> 	down_write(&nvm_lock);
> 	dev = nvm_find_nvm_dev(create->dev);
> @@ -1208,7 +1224,12 @@ static int __nvm_configure_create(struct nvm_ioctl_create *create)
> 		return -EINVAL;
> 	}
> 
> -	return nvm_create_tgt(dev, create);
> +	kref_get(&dev->ref);
> +	ret = nvm_create_tgt(dev, create);
> +	if (ret)
> +		kref_put(&dev->ref, nvm_free);
> +
> +	return ret;
> }
> 
> static long nvm_ioctl_info(struct file *file, void __user *arg)
> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> index d3b0270..4d0d565 100644
> --- a/include/linux/lightnvm.h
> +++ b/include/linux/lightnvm.h
> @@ -428,6 +428,7 @@ struct nvm_dev {
> 	char name[DISK_NAME_LEN];
> 	void *private_data;
> 
> +	struct kref ref;
> 	void *rmap;
> 
> 	struct mutex mlock;
> --
> 2.9.5

Much better with the kref()


Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 7/7] lightnvm: do not remove instance under global lock
  2019-04-16 10:16 ` [PATCH v4 7/7] lightnvm: do not remove instance under global lock Igor Konopko
@ 2019-04-23  7:13   ` Javier González
  0 siblings, 0 replies; 13+ messages in thread
From: Javier González @ 2019-04-23  7:13 UTC (permalink / raw)
  To: Konopko, Igor J; +Cc: Matias Bjørling, Hans Holmberg, linux-block

[-- Attachment #1: Type: text/plain, Size: 3165 bytes --]

> On 16 Apr 2019, at 12.16, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently all the target instances are removed under global nvm_lock.
> This was needed to ensure that nvm_dev struct will not be freed by
> hot unplug event during target removal. However, current implementation
> has some drawbacks, since the same lock is used when new nvme subsystem
> is registered, so we can have a situation, that due to long process of
> target removal on drive A, registration (and listing in OS) of the
> drive B will take a lot of time, since it will wait for that lock.
> 
> Now when we have kref which ensures that nvm_dev will not be freed in
> the meantime, we can easily get rid of this lock for a time when we are
> removing nvm targets.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/core.c | 34 ++++++++++++++++------------------
> 1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index 0e9f7996..0df7454 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -483,7 +483,6 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
> 
> /**
>  * nvm_remove_tgt - Removes a target from the media manager
> - * @dev:	device
>  * @remove:	ioctl structure with target name to remove.
>  *
>  * Returns:
> @@ -491,18 +490,27 @@ static void __nvm_remove_target(struct nvm_target *t, bool graceful)
>  * 1: on not found
>  * <0: on error
>  */
> -static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
> +static int nvm_remove_tgt(struct nvm_ioctl_remove *remove)
> {
> 	struct nvm_target *t;
> +	struct nvm_dev *dev;
> 
> -	mutex_lock(&dev->mlock);
> -	t = nvm_find_target(dev, remove->tgtname);
> -	if (!t) {
> +	down_read(&nvm_lock);
> +	list_for_each_entry(dev, &nvm_devices, devices) {
> +		mutex_lock(&dev->mlock);
> +		t = nvm_find_target(dev, remove->tgtname);
> +		if (t) {
> +			mutex_unlock(&dev->mlock);
> +			break;
> +		}
> 		mutex_unlock(&dev->mlock);
> -		return 1;
> 	}
> +	up_read(&nvm_lock);
> +
> +	if (!t)
> +		return 1;
> +
> 	__nvm_remove_target(t, true);
> -	mutex_unlock(&dev->mlock);
> 	kref_put(&dev->ref, nvm_free);
> 
> 	return 0;
> @@ -1348,8 +1356,6 @@ static long nvm_ioctl_dev_create(struct file *file, void __user *arg)
> static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
> {
> 	struct nvm_ioctl_remove remove;
> -	struct nvm_dev *dev;
> -	int ret = 0;
> 
> 	if (copy_from_user(&remove, arg, sizeof(struct nvm_ioctl_remove)))
> 		return -EFAULT;
> @@ -1361,15 +1367,7 @@ static long nvm_ioctl_dev_remove(struct file *file, void __user *arg)
> 		return -EINVAL;
> 	}
> 
> -	down_read(&nvm_lock);
> -	list_for_each_entry(dev, &nvm_devices, devices) {
> -		ret = nvm_remove_tgt(dev, &remove);
> -		if (!ret)
> -			break;
> -	}
> -	up_read(&nvm_lock);
> -
> -	return ret;
> +	return nvm_remove_tgt(&remove);
> }
> 
> /* kept for compatibility reasons */
> --
> 2.9.5

Looks good to me


Reviewed-by: Javier González <javier@javigon.com>


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/7] lightnvm: pblk: simplify partial read path
  2019-04-17 17:11   ` Heiner Litz
@ 2019-04-23 13:51     ` Igor Konopko
  0 siblings, 0 replies; 13+ messages in thread
From: Igor Konopko @ 2019-04-23 13:51 UTC (permalink / raw)
  To: Heiner Litz
  Cc: Matias Bjørling, Javier González, Hans Holmberg, linux-block



On 17.04.2019 19:11, Heiner Litz wrote:
> Hi Igor,
> thank you for doing this. For the most part, this looks great. Some comments:
> 
> 1. When performing cached reads, you can bio_advance all sectors at
> once. So you try reading all from the cache and only if successful,
> you bio_advance all the cached-read sectors. This means you can always
> retry and do not have to goto end.

I'm not sure how you would like to do this instead. The goal of using 
bio_advance() here was to allow pblk_rb_copy_to_bio() to call bio_data() 
in order to get the proper buffer to copy the cached data into. If you 
have some other idea how to access a particular part of the bio data 
buffer without using bio_advance(), let me know.
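
Just to illustrate the constraint (simplified, not the exact pblk code;
rb_entry_data() is only a stand-in for the ring buffer lookup):
bio_data() returns the buffer at the bio's current iterator position,
so the copy loop depends on advancing the cloned bio one sector at a
time:

	for (i = 0; i < nr_secs; i++) {
		void *dst = bio_data(bio);	/* buffer at current position */

		memcpy(dst, rb_entry_data(i), PBLK_EXPOSED_PAGE_SIZE);
		bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
	}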

> 2. should we set
>   split->bi_rw |= REQ_NOMERGE
> as in blk_queue_split?

So I checked the other users of bio_split() (like md) and they did not 
set this flag, so it looks like it is not needed in this case.

> 3. Did you test rqds where sequences of cached and non-cached segments
> alternate?

Yes, I did some testing that included such scenarios (with debug prints 
not included in this patch).

> 
> Heiner
> 
> On Tue, Apr 16, 2019 at 3:19 AM Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> This patch changes the approach to handling partial read path.
>>
>> In old approach merging of data from round buffer and drive was fully
>> made by drive. This had some disadvantages - code was complex and
>> relies on bio internals, so it was hard to maintain and was strongly
>> dependent on bio changes.
>>
>> In new approach most of the handling is done mostly by block layer
>> functions such as bio_split(), bio_chain() and generic_make request()
>> and generally is less complex and easier to maintain. Below some more
>> details of the new approach.
>>
>> When read bio arrives, it is cloned for pblk internal purposes. All
>> the L2P mapping, which includes copying data from round buffer to bio
>> and thus bio_advance() calls is done on the cloned bio, so the original
>> bio is untouched. If we found that we have partial read case, we
>> still have original bio untouched, so we can split it and continue to
>> process only first part of it in current context, when the rest will be
>> called as separate bio request which is passed to generic_make_request()
>> for further processing.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>   drivers/lightnvm/pblk-core.c |  13 +-
>>   drivers/lightnvm/pblk-rb.c   |  11 +-
>>   drivers/lightnvm/pblk-read.c | 333 +++++++++++--------------------------------
>>   drivers/lightnvm/pblk.h      |  18 +--
>>   4 files changed, 100 insertions(+), 275 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 73be3a0..07270ba 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -2147,8 +2147,8 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
>>          spin_unlock(&pblk->trans_lock);
>>   }
>>
>> -void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>> -                        sector_t blba, int nr_secs)
>> +int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>> +                        sector_t blba, int nr_secs, bool *from_cache)
>>   {
>>          int i;
>>
>> @@ -2162,10 +2162,19 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>>                  if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
>>                          struct pblk_line *line = pblk_ppa_to_line(pblk, ppa);
>>
>> +                       if (i > 0 && *from_cache)
>> +                               break;
>> +                       *from_cache = false;
>> +
>>                          kref_get(&line->ref);
>> +               } else {
>> +                       if (i > 0 && !*from_cache)
>> +                               break;
>> +                       *from_cache = true;
>>                  }
>>          }
>>          spin_unlock(&pblk->trans_lock);
>> +       return i;
>>   }
>>
>>   void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>> index 3555014..5abb170 100644
>> --- a/drivers/lightnvm/pblk-rb.c
>> +++ b/drivers/lightnvm/pblk-rb.c
>> @@ -642,7 +642,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>    * be directed to disk.
>>    */
>>   int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
>> -                       struct ppa_addr ppa, int bio_iter, bool advanced_bio)
>> +                       struct ppa_addr ppa)
>>   {
>>          struct pblk *pblk = container_of(rb, struct pblk, rwb);
>>          struct pblk_rb_entry *entry;
>> @@ -673,15 +673,6 @@ int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
>>                  ret = 0;
>>                  goto out;
>>          }
>> -
>> -       /* Only advance the bio if it hasn't been advanced already. If advanced,
>> -        * this bio is at least a partial bio (i.e., it has partially been
>> -        * filled with data from the cache). If part of the data resides on the
>> -        * media, we will read later on
>> -        */
>> -       if (unlikely(!advanced_bio))
>> -               bio_advance(bio, bio_iter * PBLK_EXPOSED_PAGE_SIZE);
>> -
>>          data = bio_data(bio);
>>          memcpy(data, entry->data, rb->seg_size);
>>
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 0953c34..d98ea39 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -26,8 +26,7 @@
>>    * issued.
>>    */
>>   static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
>> -                               sector_t lba, struct ppa_addr ppa,
>> -                               int bio_iter, bool advanced_bio)
>> +                               sector_t lba, struct ppa_addr ppa)
>>   {
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>>          /* Callers must ensure that the ppa points to a cache address */
>> @@ -35,73 +34,75 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
>>          BUG_ON(!pblk_addr_in_cache(ppa));
>>   #endif
>>
>> -       return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa,
>> -                                               bio_iter, advanced_bio);
>> +       return pblk_rb_copy_to_bio(&pblk->rwb, bio, lba, ppa);
>>   }
>>
>> -static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> +static int pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>                                   struct bio *bio, sector_t blba,
>> -                                unsigned long *read_bitmap)
>> +                                bool *from_cache)
>>   {
>>          void *meta_list = rqd->meta_list;
>> -       struct ppa_addr ppas[NVM_MAX_VLBA];
>> -       int nr_secs = rqd->nr_ppas;
>> -       bool advanced_bio = false;
>> -       int i, j = 0;
>> +       int nr_secs, i;
>>
>> -       pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>> +retry:
>> +       nr_secs = pblk_lookup_l2p_seq(pblk, rqd->ppa_list, blba, rqd->nr_ppas,
>> +                                       from_cache);
>> +
>> +       if (!*from_cache)
>> +               goto end;
>>
>>          for (i = 0; i < nr_secs; i++) {
>> -               struct ppa_addr p = ppas[i];
>>                  struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>>                  sector_t lba = blba + i;
>>
>> -retry:
>> -               if (pblk_ppa_empty(p)) {
>> +               if (pblk_ppa_empty(rqd->ppa_list[i])) {
>>                          __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>>
>> -                       WARN_ON(test_and_set_bit(i, read_bitmap));
>>                          meta->lba = addr_empty;
>> -
>> -                       if (unlikely(!advanced_bio)) {
>> -                               bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>> -                               advanced_bio = true;
>> +               } else if (pblk_addr_in_cache(rqd->ppa_list[i])) {
>> +                       /*
>> +                        * Try to read from write buffer. The address is later
>> +                        * checked on the write buffer to prevent retrieving
>> +                        * overwritten data.
>> +                        */
>> +                       if (!pblk_read_from_cache(pblk, bio, lba,
>> +                                                       rqd->ppa_list[i])) {
>> +                               if (i == 0) {
>> +                                       /*
>> +                                        * We didn't call with bio_advance()
>> +                                        * yet, so we can just retry.
>> +                                        */
>> +                                       goto retry;
>> +                               } else {
>> +                                       /*
>> +                                        * We already call bio_advance()
>> +                                        * so we cannot retry and we need
>> +                                        * to quit that function in order
>> +                                        * to allow caller to handle the bio
>> +                                        * splitting in the current sector
>> +                                        * position.
>> +                                        */
>> +                                       nr_secs = i;
>> +                                       goto end;
>> +                               }
>>                          }
>> -
>> -                       goto next;
>> -               }
>> -
>> -               /* Try to read from write buffer. The address is later checked
>> -                * on the write buffer to prevent retrieving overwritten data.
>> -                */
>> -               if (pblk_addr_in_cache(p)) {
>> -                       if (!pblk_read_from_cache(pblk, bio, lba, p, i,
>> -                                                               advanced_bio)) {
>> -                               pblk_lookup_l2p_seq(pblk, &p, lba, 1);
>> -                               goto retry;
>> -                       }
>> -                       WARN_ON(test_and_set_bit(i, read_bitmap));
>>                          meta->lba = cpu_to_le64(lba);
>> -                       advanced_bio = true;
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>>                          atomic_long_inc(&pblk->cache_reads);
>>   #endif
>> -               } else {
>> -                       /* Read from media non-cached sectors */
>> -                       rqd->ppa_list[j++] = p;
>>                  }
>> -
>> -next:
>> -               if (advanced_bio)
>> -                       bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
>> +               bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
>>          }
>>
>> +end:
>>          if (pblk_io_aligned(pblk, nr_secs))
>>                  rqd->is_seq = 1;
>>
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>>          atomic_long_add(nr_secs, &pblk->inflight_reads);
>>   #endif
>> +
>> +       return nr_secs;
>>   }
>>
>>
>> @@ -197,9 +198,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
>>                  pblk_log_read_err(pblk, rqd);
>>
>>          pblk_read_check_seq(pblk, rqd, r_ctx->lba);
>> -
>> -       if (int_bio)
>> -               bio_put(int_bio);
>> +       bio_put(int_bio);
>>
>>          if (put_line)
>>                  pblk_rq_to_line_put(pblk, rqd);
>> @@ -223,177 +222,13 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
>>          __pblk_end_io_read(pblk, rqd, true);
>>   }
>>
>> -static void pblk_end_partial_read(struct nvm_rq *rqd)
>> -{
>> -       struct pblk *pblk = rqd->private;
>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> -       struct pblk_pr_ctx *pr_ctx = r_ctx->private;
>> -       struct pblk_sec_meta *meta;
>> -       struct bio *new_bio = rqd->bio;
>> -       struct bio *bio = pr_ctx->orig_bio;
>> -       struct bio_vec src_bv, dst_bv;
>> -       void *meta_list = rqd->meta_list;
>> -       int bio_init_idx = pr_ctx->bio_init_idx;
>> -       unsigned long *read_bitmap = pr_ctx->bitmap;
>> -       int nr_secs = pr_ctx->orig_nr_secs;
>> -       int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>> -       void *src_p, *dst_p;
>> -       int hole, i;
>> -
>> -       if (unlikely(nr_holes == 1)) {
>> -               struct ppa_addr ppa;
>> -
>> -               ppa = rqd->ppa_addr;
>> -               rqd->ppa_list = pr_ctx->ppa_ptr;
>> -               rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
>> -               rqd->ppa_list[0] = ppa;
>> -       }
>> -
>> -       for (i = 0; i < nr_secs; i++) {
>> -               meta = pblk_get_meta(pblk, meta_list, i);
>> -               pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
>> -       }
>> -
>> -       /* Fill the holes in the original bio */
>> -       i = 0;
>> -       hole = find_first_zero_bit(read_bitmap, nr_secs);
>> -       do {
>> -               struct pblk_line *line;
>> -
>> -               line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>> -               kref_put(&line->ref, pblk_line_put);
>> -
>> -               meta = pblk_get_meta(pblk, meta_list, hole);
>> -               meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
>> -
>> -               src_bv = new_bio->bi_io_vec[i++];
>> -               dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>> -
>> -               src_p = kmap_atomic(src_bv.bv_page);
>> -               dst_p = kmap_atomic(dst_bv.bv_page);
>> -
>> -               memcpy(dst_p + dst_bv.bv_offset,
>> -                       src_p + src_bv.bv_offset,
>> -                       PBLK_EXPOSED_PAGE_SIZE);
>> -
>> -               kunmap_atomic(src_p);
>> -               kunmap_atomic(dst_p);
>> -
>> -               mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
>> -
>> -               hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
>> -       } while (hole < nr_secs);
>> -
>> -       bio_put(new_bio);
>> -       kfree(pr_ctx);
>> -
>> -       /* restore original request */
>> -       rqd->bio = NULL;
>> -       rqd->nr_ppas = nr_secs;
>> -
>> -       pblk_end_user_read(bio, rqd->error);
>> -       __pblk_end_io_read(pblk, rqd, false);
>> -}
>> -
>> -static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> -                           unsigned int bio_init_idx,
>> -                           unsigned long *read_bitmap,
>> -                           int nr_holes)
>> -{
>> -       void *meta_list = rqd->meta_list;
>> -       struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> -       struct pblk_pr_ctx *pr_ctx;
>> -       struct bio *new_bio, *bio = r_ctx->private;
>> -       int nr_secs = rqd->nr_ppas;
>> -       int i;
>> -
>> -       new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>> -
>> -       if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
>> -               goto fail_bio_put;
>> -
>> -       if (nr_holes != new_bio->bi_vcnt) {
>> -               WARN_ONCE(1, "pblk: malformed bio\n");
>> -               goto fail_free_pages;
>> -       }
>> -
>> -       pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
>> -       if (!pr_ctx)
>> -               goto fail_free_pages;
>> -
>> -       for (i = 0; i < nr_secs; i++) {
>> -               struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
>> -
>> -               pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
>> -       }
>> -
>> -       new_bio->bi_iter.bi_sector = 0; /* internal bio */
>> -       bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>> -
>> -       rqd->bio = new_bio;
>> -       rqd->nr_ppas = nr_holes;
>> -
>> -       pr_ctx->orig_bio = bio;
>> -       bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
>> -       pr_ctx->bio_init_idx = bio_init_idx;
>> -       pr_ctx->orig_nr_secs = nr_secs;
>> -       r_ctx->private = pr_ctx;
>> -
>> -       if (unlikely(nr_holes == 1)) {
>> -               pr_ctx->ppa_ptr = rqd->ppa_list;
>> -               pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
>> -               rqd->ppa_addr = rqd->ppa_list[0];
>> -       }
>> -       return 0;
>> -
>> -fail_free_pages:
>> -       pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
>> -fail_bio_put:
>> -       bio_put(new_bio);
>> -
>> -       return -ENOMEM;
>> -}
>> -
>> -static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>> -                                unsigned int bio_init_idx,
>> -                                unsigned long *read_bitmap, int nr_secs)
>> -{
>> -       int nr_holes;
>> -       int ret;
>> -
>> -       nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>> -
>> -       if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
>> -                                   nr_holes))
>> -               return NVM_IO_ERR;
>> -
>> -       rqd->end_io = pblk_end_partial_read;
>> -
>> -       ret = pblk_submit_io(pblk, rqd);
>> -       if (ret) {
>> -               bio_put(rqd->bio);
>> -               pblk_err(pblk, "partial read IO submission failed\n");
>> -               goto err;
>> -       }
>> -
>> -       return NVM_IO_OK;
>> -
>> -err:
>> -       pblk_err(pblk, "failed to perform partial read\n");
>> -
>> -       /* Free allocated pages in new bio */
>> -       pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
>> -       return NVM_IO_ERR;
>> -}
>> -
>>   static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> -                        sector_t lba, unsigned long *read_bitmap)
>> +                        sector_t lba, bool *from_cache)
>>   {
>>          struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
>>          struct ppa_addr ppa;
>>
>> -       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>> +       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
>>
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>>          atomic_long_inc(&pblk->inflight_reads);
>> @@ -403,7 +238,6 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>          if (pblk_ppa_empty(ppa)) {
>>                  __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>>
>> -               WARN_ON(test_and_set_bit(0, read_bitmap));
>>                  meta->lba = addr_empty;
>>                  return;
>>          }
>> @@ -412,12 +246,11 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>           * write buffer to prevent retrieving overwritten data.
>>           */
>>          if (pblk_addr_in_cache(ppa)) {
>> -               if (!pblk_read_from_cache(pblk, bio, lba, ppa, 0, 1)) {
>> -                       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>> +               if (!pblk_read_from_cache(pblk, bio, lba, ppa)) {
>> +                       pblk_lookup_l2p_seq(pblk, &ppa, lba, 1, from_cache);
>>                          goto retry;
>>                  }
>>
>> -               WARN_ON(test_and_set_bit(0, read_bitmap));
>>                  meta->lba = cpu_to_le64(lba);
>>
>>   #ifdef CONFIG_NVM_PBLK_DEBUG
>> @@ -434,17 +267,14 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>          struct request_queue *q = dev->q;
>>          sector_t blba = pblk_get_lba(bio);
>>          unsigned int nr_secs = pblk_get_secs(bio);
>> +       bool from_cache;
>>          struct pblk_g_ctx *r_ctx;
>>          struct nvm_rq *rqd;
>> -       struct bio *int_bio;
>> -       unsigned int bio_init_idx;
>> -       DECLARE_BITMAP(read_bitmap, NVM_MAX_VLBA);
>> +       struct bio *int_bio, *split_bio;
>>
>>          generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
>>                                &pblk->disk->part0);
>>
>> -       bitmap_zero(read_bitmap, nr_secs);
>> -
>>          rqd = pblk_alloc_rqd(pblk, PBLK_READ);
>>
>>          rqd->opcode = NVM_OP_PREAD;
>> @@ -456,11 +286,6 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>          r_ctx->start_time = jiffies;
>>          r_ctx->lba = blba;
>>
>> -       /* Save the index for this bio's start. This is needed in case
>> -        * we need to fill a partial read.
>> -        */
>> -       bio_init_idx = pblk_get_bi_idx(bio);
>> -
>>          if (pblk_alloc_rqd_meta(pblk, rqd)) {
>>                  bio_io_error(bio);
>>                  pblk_free_rqd(pblk, rqd, PBLK_READ);
>> @@ -469,46 +294,58 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
>>
>>          /* Clone read bio to deal internally with:
>>           * -read errors when reading from drive
>> -        * -bio_advance() calls during l2p lookup and cache reads
>> +        * -bio_advance() calls during cache reads
>>           */
>>          int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>>
>>          if (nr_secs > 1)
>> -               pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
>> +               nr_secs = pblk_read_ppalist_rq(pblk, rqd, int_bio, blba,
>> +                                               &from_cache);
>>          else
>> -               pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
>> +               pblk_read_rq(pblk, rqd, int_bio, blba, &from_cache);
>>
>> +split_retry:
>>          r_ctx->private = bio; /* original bio */
>>          rqd->bio = int_bio; /* internal bio */
>>
>> -       if (bitmap_full(read_bitmap, nr_secs)) {
>> +       if (from_cache && nr_secs == rqd->nr_ppas) {
>> +               /* All data was read from cache, we can complete the IO. */
>>                  pblk_end_user_read(bio, 0);
>>                  atomic_inc(&pblk->inflight_io);
>>                  __pblk_end_io_read(pblk, rqd, false);
>> -               return;
>> -       }
>> -
>> -       if (!bitmap_empty(read_bitmap, rqd->nr_ppas)) {
>> +       } else if (nr_secs != rqd->nr_ppas) {
>>                  /* The read bio request could be partially filled by the write
>>                   * buffer, but there are some holes that need to be read from
>> -                * the drive.
>> +                * the drive. In order to handle this, we will use the block
>> +                * layer mechanism to split this request into smaller ones and
>> +                * make a chain of them.
>>                   */
>> -               bio_put(int_bio);
>> -               rqd->bio = NULL;
>> -               if (pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
>> -                                           nr_secs)) {
>> -                       pblk_err(pblk, "read IO submission failed\n");
>> -                       bio_io_error(bio);
>> -                       __pblk_end_io_read(pblk, rqd, false);
>> -               }
>> -               return;
>> -       }
>> +               split_bio = bio_split(bio, nr_secs * NR_PHY_IN_LOG, GFP_KERNEL,
>> +                                       &pblk_bio_set);
>> +               bio_chain(split_bio, bio);
>> +               generic_make_request(bio);
>> +
>> +               /* New bio contains first N sectors of the previous one, so
>> +                * we can continue to use existing rqd, but we need to shrink
>> +                * the number of PPAs in it. The new bio is also guaranteed
>> +                * to contain only data from cache or only data from the
>> +                * drive, never a mix of the two.
>> +                */
>> +               bio = split_bio;
>> +               rqd->nr_ppas = nr_secs;
>> +               if (rqd->nr_ppas == 1)
>> +                       rqd->ppa_addr = rqd->ppa_list[0];
>>
>> -       /* All sectors are to be read from the device */
>> -       if (pblk_submit_io(pblk, rqd)) {
>> -               pblk_err(pblk, "read IO submission failed\n");
>> -               bio_io_error(bio);
>> -               __pblk_end_io_read(pblk, rqd, false);
>> +               /* Recreate int_bio - existing might have some needed internal
>> +                * fields modified already.
>> +                */
>> +               bio_put(int_bio);
>> +               int_bio = bio_clone_fast(bio, GFP_KERNEL, &pblk_bio_set);
>> +               goto split_retry;
>> +       } else if (pblk_submit_io(pblk, rqd)) {
>> +               /* Submitting IO to drive failed, let's report an error */
>> +               rqd->error = -ENODEV;
>> +               pblk_end_io_read(rqd);
>>          }
>>   }
>>
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 17ced12..a678553 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -121,18 +121,6 @@ struct pblk_g_ctx {
>>          u64 lba;
>>   };
>>
>> -/* partial read context */
>> -struct pblk_pr_ctx {
>> -       struct bio *orig_bio;
>> -       DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
>> -       unsigned int orig_nr_secs;
>> -       unsigned int bio_init_idx;
>> -       void *ppa_ptr;
>> -       dma_addr_t dma_ppa_list;
>> -       u64 lba_list_mem[NVM_MAX_VLBA];
>> -       u64 lba_list_media[NVM_MAX_VLBA];
>> -};
>> -
>>   /* Pad context */
>>   struct pblk_pad_rq {
>>          struct pblk *pblk;
>> @@ -759,7 +747,7 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>                                   unsigned int pos, unsigned int nr_entries,
>>                                   unsigned int count);
>>   int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
>> -                       struct ppa_addr ppa, int bio_iter, bool advanced_bio);
>> +                       struct ppa_addr ppa);
>>   unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
>>
>>   unsigned int pblk_rb_sync_init(struct pblk_rb *rb, unsigned long *flags);
>> @@ -859,8 +847,8 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
>>                         struct pblk_line *gc_line, u64 paddr);
>>   void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>                            u64 *lba_list, int nr_secs);
>> -void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>> -                        sector_t blba, int nr_secs);
>> +int pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>> +                        sector_t blba, int nr_secs, bool *from_cache);
>>   void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd);
>>   void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>
>> --
>> 2.9.5
>>
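
For context, the split_retry loop above leans entirely on the generic block
layer splitting and chaining helpers rather than on pblk-private partial-read
state. A minimal sketch of that split-and-requeue pattern, written against a
kernel of the same era (generic_make_request() was later replaced by
submit_bio_noacct()), could look as follows; the helper name
split_and_requeue() and its bio_set argument are illustrative only, not part
of the patch:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Hand back the first @sectors of @bio for the caller to service now and
 * push the remainder to the block layer.  The remainder (the parent) does
 * not complete until the split-off child has completed as well, so the
 * original submitter still sees a single completion.
 */
static struct bio *split_and_requeue(struct bio *bio, unsigned int sectors,
                                     struct bio_set *bs)
{
        struct bio *first;

        if (sectors >= bio_sectors(bio))
                return bio;             /* nothing to split off */

        /* @first gets the leading @sectors; @bio keeps the rest. */
        first = bio_split(bio, sectors, GFP_KERNEL, bs);

        /* Defer completion of @bio until @first completes. */
        bio_chain(first, bio);

        /* Re-queue the remainder; it re-enters the driver later. */
        generic_make_request(bio);

        return first;
}

pblk applies the same idea iteratively: each pass through split_retry peels
off the leading run of sectors that are either all in the write buffer or all
on media, services that run with the already prepared rqd, and resubmits the
tail, so no per-request hole bitmap or pblk_pr_ctx is needed any more.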

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/7] lightnvm: next set of improvements for 5.2
  2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
                   ` (6 preceding siblings ...)
  2019-04-16 10:16 ` [PATCH v4 7/7] lightnvm: do not remove instance under global lock Igor Konopko
@ 2019-04-26 12:34 ` Matias Bjørling
  7 siblings, 0 replies; 13+ messages in thread
From: Matias Bjørling @ 2019-04-26 12:34 UTC (permalink / raw)
  To: Igor Konopko, javier, hans.holmberg; +Cc: linux-block

On 4/16/19 12:16 PM, Igor Konopko wrote:
> This is another set of fixes and improvements to both pblk and lightnvm
> core.
> 
> First & second patches are the most crucial, since they change
> the approach to the partial read path, so detailed review is needed
> especially here.
> 
> Other patches are my other findings related to some bugs or potential
> improvements, mostly related to some corner cases.
> 
> Changes v3 -> v4:
> -dropped patches which were already pulled into for-5.2/core branch
> -major changes for patch #2 based on code review
> -patch #6 modified to use krefs
> -new patch #7 which extends the patch #6
> 
> Changes v2 -> v3:
> -dropped some not needed patches
> -dropped patches which were already pulled into for-5.2/core branch
> -commit messages cleanup
> 
> Changes v1 -> v2:
> -dropped some not needed patches
> -review feedback incorporated for some of the patches
> -partial read path changes patch split into two patches
> 
> Igor Konopko (7):
>    lightnvm: pblk: IO path reorganization
>    lightnvm: pblk: simplify partial read path
>    lightnvm: pblk: recover only written metadata
>    lightnvm: pblk: store multiple copies of smeta
>    lightnvm: pblk: use nvm_rq_to_ppa_list()
>    lightnvm: track inflight target creations
>    lightnvm: do not remove instance under global lock
> 
>   drivers/lightnvm/core.c          |  75 +++++---
>   drivers/lightnvm/pblk-cache.c    |   8 +-
>   drivers/lightnvm/pblk-core.c     | 159 +++++++++++++----
>   drivers/lightnvm/pblk-init.c     |  37 ++--
>   drivers/lightnvm/pblk-rb.c       |  11 +-
>   drivers/lightnvm/pblk-read.c     | 376 +++++++++++----------------------------
>   drivers/lightnvm/pblk-recovery.c |  35 ++--
>   drivers/lightnvm/pblk-rl.c       |   3 +-
>   drivers/lightnvm/pblk.h          |  23 +--
>   include/linux/lightnvm.h         |   1 +
>   10 files changed, 333 insertions(+), 395 deletions(-)
> 

Thanks Igor!

Picked 1, 3, 6 and 7. Could you please rebase 2 and 5? I'm still 
noodling over 4.

-Matias


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-04-26 12:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-16 10:16 [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Igor Konopko
2019-04-16 10:16 ` [PATCH v4 1/7] lightnvm: pblk: IO path reorganization Igor Konopko
2019-04-16 10:16 ` [PATCH v4 2/7] lightnvm: pblk: simplify partial read path Igor Konopko
2019-04-17 17:11   ` Heiner Litz
2019-04-23 13:51     ` Igor Konopko
2019-04-16 10:16 ` [PATCH v4 3/7] lightnvm: pblk: recover only written metadata Igor Konopko
2019-04-16 10:16 ` [PATCH v4 4/7] lightnvm: pblk: store multiple copies of smeta Igor Konopko
2019-04-16 10:16 ` [PATCH v4 5/7] lightnvm: pblk: use nvm_rq_to_ppa_list() Igor Konopko
2019-04-16 10:16 ` [PATCH v4 6/7] lightnvm: track inflight target creations Igor Konopko
2019-04-23  7:12   ` Javier González
2019-04-16 10:16 ` [PATCH v4 7/7] lightnvm: do not remove instance under global lock Igor Konopko
2019-04-23  7:13   ` Javier González
2019-04-26 12:34 ` [PATCH v4 0/7] lightnvm: next set of improvements for 5.2 Matias Bjørling
