* [PATCH 00/18] lightnvm: pblk patches for 4.14
@ 2017-09-06 10:50 Javier González
  2017-09-06 10:50 ` [PATCH 01/18] lightnvm: pblk: improve naming for internal req Javier González
                   ` (18 more replies)
  0 siblings, 19 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe; +Cc: linux-block, linux-kernel, Javier González

Hi Jens,

Here is the pblk patchset for this window.

The most notable patches are on the read path:
  - ("lightnvm: pblk: check lba sanity on read path") uses the lba
    stored on the out-of-bound area to verify that the read ppa
    corresponds to the lba pointed to by the read bio.
  - ("lightnvm: pblk: guarantee line integrity on reads") guarantees
    that if a line is being GC'ed and a read comes in the middle, that
    line is not being moved into the free list until the read completes.
    Otherwise, the line could be reclaimed to be erased and re-written,
    thus causing data corruption.
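
  The mechanism builds on the per-line reference count that pblk
  already uses. A simplified sketch (kref helpers as in the existing
  code; the intermediate line states handled by the patch are
  omitted):

	/* read path: pin the line backing the ppa */
	struct pblk_line *line;

	line = &pblk->lines[pblk_tgt_ppa_to_line(ppa)];
	kref_get(&line->ref);

	/* ... on read completion ... */
	kref_put(&line->ref, pblk_line_put);

	/* pblk_line_put() is what moves a fully GC'ed line to the
	 * free list, so the move cannot happen while a reader still
	 * holds a reference.
	 */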

The rest of the patches are basically bug fixes and refactoring to
improve code readability.

The patches apply on top of your for-4.14/block branch, and can be found at:
  https://github.com/OpenChannelSSD/linux/tree/pblk.for-4.14

Thanks,
Javier

Javier González (18):
  lightnvm: pblk: improve naming for internal req.
  lightnvm: pblk: refactor read lba sanity check
  lightnvm: pblk: normalize ppa namings
  lightnvm: pblk: check for failed mempool alloc.
  lightnvm: pblk: initialize debug stat counter
  lightnvm: pblk: use right flag for GC allocation
  lightnvm: pblk: use constant for GC parameter
  lightnvm: pblk: check lba sanity on read path
  lightnvm: pblk: simplify data validity check on GC
  lightnvm: pblk: use bio_copy_kern when possible
  lightnvm: pblk: refactor read path on GC
  lightnvm: pblk: free padded entries in write buffer
  lightnvm: pblk: fix write I/O sync stat
  lightnvm: pblk: simplify path on REQ_PREFLUSH
  lightnvm: pblk: avoid deadlock on low LUN config
  lightnvm: pblk: enable 1 LUN configuration
  lightnvm: pblk: guarantee line integrity on reads
  lightnvm: pblk: remove unnecessary check

 drivers/lightnvm/pblk-cache.c    |  24 ++--
 drivers/lightnvm/pblk-core.c     | 187 ++++++++++++++++--------------
 drivers/lightnvm/pblk-gc.c       | 133 ++++++++++------------
 drivers/lightnvm/pblk-init.c     |  25 ++++-
 drivers/lightnvm/pblk-map.c      |  21 ++--
 drivers/lightnvm/pblk-rb.c       |  14 +--
 drivers/lightnvm/pblk-read.c     | 237 ++++++++++++++++++++++++++-------------
 drivers/lightnvm/pblk-recovery.c |   9 +-
 drivers/lightnvm/pblk-rl.c       |   6 +
 drivers/lightnvm/pblk-write.c    |  37 +++---
 drivers/lightnvm/pblk.h          |  40 +++----
 11 files changed, 423 insertions(+), 310 deletions(-)

-- 
2.7.4


* [PATCH 01/18] lightnvm: pblk: improve naming for internal req.
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 13:46   ` Christoph Hellwig
  2017-09-06 10:50 ` [PATCH 02/18] lightnvm: pblk: refactor read lba sanity check Javier González
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Each request type sent to the LightNVM subsystem requires different
metadata. Until now, we have tailored this metadata based on the
write, read and erase commands. However, pblk uses different metadata
for internal writes that do not hit the write buffer. Instead of
abusing the read metadata for these requests, create a new request
type - internal write. This improves code readability.
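
For reference, the constants after this patch (from the pblk.h hunk
below) and their intended use; until now these internal writes were
allocated as READ in some paths and as WRITE in others:

	/* READ = 0, WRITE (user) = 1 */
	#define ERASE 2
	#define WRITE_INT 3	/* Internal write. Not through write buffer */

	/* e.g. padding a line on recovery, or emeta writes: */
	rqd = pblk_alloc_rqd(pblk, WRITE_INT);
	...
	pblk_free_rqd(pblk, rqd, WRITE_INT);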

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-recovery.c | 6 +++---
 drivers/lightnvm/pblk-write.c    | 6 +++---
 drivers/lightnvm/pblk.h          | 5 ++++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index cb556e06673e..279638309d9a 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -344,7 +344,7 @@ static void pblk_end_io_recov(struct nvm_rq *rqd)
 
 	bio_put(rqd->bio);
 	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
-	pblk_free_rqd(pblk, rqd, WRITE);
+	pblk_free_rqd(pblk, rqd, WRITE_INT);
 
 	atomic_dec(&pblk->inflight_io);
 	kref_put(&pad_rq->ref, pblk_recov_complete);
@@ -404,7 +404,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
 	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
 
-	rqd = pblk_alloc_rqd(pblk, WRITE);
+	rqd = pblk_alloc_rqd(pblk, WRITE_INT);
 	if (IS_ERR(rqd)) {
 		ret = PTR_ERR(rqd);
 		goto fail_free_meta;
@@ -491,7 +491,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 fail_free_bio:
 	bio_put(bio);
 fail_free_rqd:
-	pblk_free_rqd(pblk, rqd, WRITE);
+	pblk_free_rqd(pblk, rqd, WRITE_INT);
 fail_free_meta:
 	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
 fail_free_pad:
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 3ad9e56d2473..920b93fb15d5 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -199,7 +199,7 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
 
 	bio_put(rqd->bio);
 	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
-	pblk_free_rqd(pblk, rqd, READ);
+	pblk_free_rqd(pblk, rqd, WRITE_INT);
 
 	atomic_dec(&pblk->inflight_io);
 }
@@ -370,7 +370,7 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 	int i, j;
 	int ret;
 
-	rqd = pblk_alloc_rqd(pblk, READ);
+	rqd = pblk_alloc_rqd(pblk, WRITE_INT);
 	if (IS_ERR(rqd)) {
 		pr_err("pblk: cannot allocate write req.\n");
 		return PTR_ERR(rqd);
@@ -434,7 +434,7 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 	if (likely(l_mg->emeta_alloc_type == PBLK_VMALLOC_META))
 		bio_put(bio);
 fail_free_rqd:
-	pblk_free_rqd(pblk, rqd, READ);
+	pblk_free_rqd(pblk, rqd, WRITE_INT);
 	return ret;
 }
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 67e623bd5c2d..08a463fe855f 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -59,7 +59,10 @@
 		for ((i) = 0, rlun = &(pblk)->luns[0]; \
 			(i) < (pblk)->nr_luns; (i)++, rlun = &(pblk)->luns[(i)])
 
-#define ERASE 2 /* READ = 0, WRITE = 1 */
+/* READ = 0, WRITE (user) = 1 */
+#define ERASE 2
+#define WRITE_INT 3 /* Internal write. Not through write buffer */
+
 
 enum {
 	/* IO Types */
-- 
2.7.4


* [PATCH 02/18] lightnvm: pblk: refactor read lba sanity check
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
  2017-09-06 10:50 ` [PATCH 01/18] lightnvm: pblk: improve naming for internal req Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 10:50 ` [PATCH 03/18] lightnvm: pblk: normalize ppa namings Javier González
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Refactor lba sanity check on read path to avoid code duplication.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-read.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index d682e89e6493..db9a3b575f62 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -39,21 +39,14 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
 }
 
 static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
-				 unsigned long *read_bitmap)
+				 sector_t blba, unsigned long *read_bitmap)
 {
 	struct bio *bio = rqd->bio;
 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
-	sector_t blba = pblk_get_lba(bio);
 	int nr_secs = rqd->nr_ppas;
 	bool advanced_bio = false;
 	int i, j = 0;
 
-	/* logic error: lba out-of-bounds. Ignore read request */
-	if (blba + nr_secs >= pblk->rl.nr_secs) {
-		WARN(1, "pblk: read lbas out of bounds\n");
-		return;
-	}
-
 	pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
 
 	for (i = 0; i < nr_secs; i++) {
@@ -263,17 +256,10 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 }
 
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
-			 unsigned long *read_bitmap)
+			 sector_t lba, unsigned long *read_bitmap)
 {
 	struct bio *bio = rqd->bio;
 	struct ppa_addr ppa;
-	sector_t lba = pblk_get_lba(bio);
-
-	/* logic error: lba out-of-bounds. Ignore read request */
-	if (lba >= pblk->rl.nr_secs) {
-		WARN(1, "pblk: read lba out of bounds\n");
-		return;
-	}
 
 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
 
@@ -309,14 +295,19 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
 int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
+	sector_t blba = pblk_get_lba(bio);
 	unsigned int nr_secs = pblk_get_secs(bio);
 	struct nvm_rq *rqd;
 	unsigned long read_bitmap; /* Max 64 ppas per request */
 	unsigned int bio_init_idx;
 	int ret = NVM_IO_ERR;
 
-	if (nr_secs > PBLK_MAX_REQ_ADDRS)
+	/* logic error: lba out-of-bounds. Ignore read request */
+	if (blba >= pblk->rl.nr_secs || nr_secs > PBLK_MAX_REQ_ADDRS) {
+		WARN(1, "pblk: read lba out of bounds (lba:%llu, nr:%d)\n",
+					(unsigned long long)blba, nr_secs);
 		return NVM_IO_ERR;
+	}
 
 	bitmap_zero(&read_bitmap, nr_secs);
 
@@ -348,9 +339,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 		rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
 		rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
 
-		pblk_read_ppalist_rq(pblk, rqd, &read_bitmap);
+		pblk_read_ppalist_rq(pblk, rqd, blba, &read_bitmap);
 	} else {
-		pblk_read_rq(pblk, rqd, &read_bitmap);
+		pblk_read_rq(pblk, rqd, blba, &read_bitmap);
 	}
 
 	bio_get(bio);
-- 
2.7.4


* [PATCH 03/18] lightnvm: pblk: normalize ppa namings
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
  2017-09-06 10:50 ` [PATCH 01/18] lightnvm: pblk: improve naming for internal req Javier González
  2017-09-06 10:50 ` [PATCH 02/18] lightnvm: pblk: refactor read lba sanity check Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 10:50 ` [PATCH 04/18] lightnvm: pblk: check for failed mempool alloc Javier González
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Normalize the way we name ppa variables to improve code readability.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c | 48 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 81501644fb15..e3997abdf740 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1764,7 +1764,7 @@ void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 
 void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 {
-	struct ppa_addr l2p_ppa;
+	struct ppa_addr ppa_l2p;
 
 	/* logic error: lba out-of-bounds. Ignore update */
 	if (!(lba < pblk->rl.nr_secs)) {
@@ -1773,10 +1773,10 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 	}
 
 	spin_lock(&pblk->trans_lock);
-	l2p_ppa = pblk_trans_map_get(pblk, lba);
+	ppa_l2p = pblk_trans_map_get(pblk, lba);
 
-	if (!pblk_addr_in_cache(l2p_ppa) && !pblk_ppa_empty(l2p_ppa))
-		pblk_map_invalidate(pblk, l2p_ppa);
+	if (!pblk_addr_in_cache(ppa_l2p) && !pblk_ppa_empty(ppa_l2p))
+		pblk_map_invalidate(pblk, ppa_l2p);
 
 	pblk_trans_map_set(pblk, lba, ppa);
 	spin_unlock(&pblk->trans_lock);
@@ -1793,16 +1793,16 @@ void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 	pblk_update_map(pblk, lba, ppa);
 }
 
-int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
+int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
 		       struct pblk_line *gc_line)
 {
-	struct ppa_addr l2p_ppa;
+	struct ppa_addr ppa_l2p;
 	int ret = 1;
 
 #ifdef CONFIG_NVM_DEBUG
 	/* Callers must ensure that the ppa points to a cache address */
-	BUG_ON(!pblk_addr_in_cache(ppa));
-	BUG_ON(pblk_rb_pos_oob(&pblk->rwb, pblk_addr_to_cacheline(ppa)));
+	BUG_ON(!pblk_addr_in_cache(ppa_new));
+	BUG_ON(pblk_rb_pos_oob(&pblk->rwb, pblk_addr_to_cacheline(ppa_new)));
 #endif
 
 	/* logic error: lba out-of-bounds. Ignore update */
@@ -1812,36 +1812,38 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
 	}
 
 	spin_lock(&pblk->trans_lock);
-	l2p_ppa = pblk_trans_map_get(pblk, lba);
+	ppa_l2p = pblk_trans_map_get(pblk, lba);
 
 	/* Prevent updated entries to be overwritten by GC */
-	if (pblk_addr_in_cache(l2p_ppa) || pblk_ppa_empty(l2p_ppa) ||
-				pblk_tgt_ppa_to_line(l2p_ppa) != gc_line->id) {
+	if (pblk_addr_in_cache(ppa_l2p) || pblk_ppa_empty(ppa_l2p) ||
+				pblk_tgt_ppa_to_line(ppa_l2p) != gc_line->id) {
+
 		ret = 0;
 		goto out;
 	}
 
-	pblk_trans_map_set(pblk, lba, ppa);
+	pblk_trans_map_set(pblk, lba, ppa_new);
 out:
 	spin_unlock(&pblk->trans_lock);
 	return ret;
 }
 
-void pblk_update_map_dev(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
-			 struct ppa_addr entry_line)
+void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
+			 struct ppa_addr ppa_mapped, struct ppa_addr ppa_cache)
 {
-	struct ppa_addr l2p_line;
+	struct ppa_addr ppa_l2p;
 
 #ifdef CONFIG_NVM_DEBUG
 	/* Callers must ensure that the ppa points to a device address */
-	BUG_ON(pblk_addr_in_cache(ppa));
+	BUG_ON(pblk_addr_in_cache(ppa_mapped));
 #endif
 	/* Invalidate and discard padded entries */
 	if (lba == ADDR_EMPTY) {
 #ifdef CONFIG_NVM_DEBUG
 		atomic_long_inc(&pblk->padded_wb);
 #endif
-		pblk_map_invalidate(pblk, ppa);
+		if (!pblk_ppa_empty(ppa_mapped))
+			pblk_map_invalidate(pblk, ppa_mapped);
 		return;
 	}
 
@@ -1852,22 +1854,22 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
 	}
 
 	spin_lock(&pblk->trans_lock);
-	l2p_line = pblk_trans_map_get(pblk, lba);
+	ppa_l2p = pblk_trans_map_get(pblk, lba);
 
 	/* Do not update L2P if the cacheline has been updated. In this case,
 	 * the mapped ppa must be invalidated
 	 */
-	if (l2p_line.ppa != entry_line.ppa) {
-		if (!pblk_ppa_empty(ppa))
-			pblk_map_invalidate(pblk, ppa);
+	if (!pblk_ppa_comp(ppa_l2p, ppa_cache)) {
+		if (!pblk_ppa_empty(ppa_mapped))
+			pblk_map_invalidate(pblk, ppa_mapped);
 		goto out;
 	}
 
 #ifdef CONFIG_NVM_DEBUG
-	WARN_ON(!pblk_addr_in_cache(l2p_line) && !pblk_ppa_empty(l2p_line));
+	WARN_ON(!pblk_addr_in_cache(ppa_l2p) && !pblk_ppa_empty(ppa_l2p));
 #endif
 
-	pblk_trans_map_set(pblk, lba, ppa);
+	pblk_trans_map_set(pblk, lba, ppa_mapped);
 out:
 	spin_unlock(&pblk->trans_lock);
 }
-- 
2.7.4


* [PATCH 04/18] lightnvm: pblk: check for failed mempool alloc.
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (2 preceding siblings ...)
  2017-09-06 10:50 ` [PATCH 03/18] lightnvm: pblk: normalize ppa namings Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 10:50 ` [PATCH 05/18] lightnvm: pblk: initialize debug stat counter Javier González
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Check for failed mempool allocations and act accordingly.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e3997abdf740..11a684162a5d 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -165,6 +165,8 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int rw)
 	}
 
 	rqd = mempool_alloc(pool, GFP_KERNEL);
+	if (!rqd)
+		return NULL;
 	memset(rqd, 0, rq_size);
 
 	return rqd;
@@ -1478,6 +1480,8 @@ int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr ppa)
 	int err;
 
 	rqd = mempool_alloc(pblk->g_rq_pool, GFP_KERNEL);
+	if (!rqd)
+		return -ENOMEM;
 	memset(rqd, 0, pblk_g_rq_size);
 
 	pblk_setup_e_rq(pblk, rqd, ppa);
-- 
2.7.4


* [PATCH 05/18] lightnvm: pblk: initialize debug stat counter
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (3 preceding siblings ...)
  2017-09-06 10:50 ` [PATCH 04/18] lightnvm: pblk: check for failed mempool alloc Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 10:50 ` [PATCH 06/18] lightnvm: pblk: use right flag for GC allocation Javier González
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Initialize the stat counter for garbage collected reads.

Fixes: a4bd217b43268 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1b0f61233c21..8459434edd89 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -944,6 +944,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 	atomic_long_set(&pblk->recov_writes, 0);
 	atomic_long_set(&pblk->recov_writes, 0);
 	atomic_long_set(&pblk->recov_gc_writes, 0);
+	atomic_long_set(&pblk->recov_gc_reads, 0);
 #endif
 
 	atomic_long_set(&pblk->read_failed, 0);
-- 
2.7.4


* [PATCH 06/18] lightnvm: pblk: use right flag for GC allocation
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (4 preceding siblings ...)
  2017-09-06 10:50 ` [PATCH 05/18] lightnvm: pblk: initialize debug stat counter Javier González
@ 2017-09-06 10:50 ` Javier González
  2017-09-06 10:51 ` [PATCH 07/18] lightnvm: pblk: use constant for GC parameter Javier González
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:50 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

The data buffer for the GC path is allocated with vmalloc. When this
path was introduced, the flag signaling kmalloc'ed memory was wrongly
used. Use the right flag when creating a bio from this buffer.
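
The flag matters because pblk_bio_map_addr() builds the bio
differently for the two allocation types (abridged sketch of the
existing helper): a physically contiguous buffer can be mapped
directly, while a vmalloc'ed buffer must be translated page by page:

	if (alloc_type == PBLK_KMALLOC_META)
		return bio_map_kern(dev->q, kaddr, len, gfp_mask);

	/* PBLK_VMALLOC_META: virt_to_page() would be wrong here */
	for (i = 0; i < nr_secs; i++) {
		page = vmalloc_to_page(kaddr);
		bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0);
		kaddr += PAGE_SIZE;
	}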

Fixes: de54e703a422 ("lightnvm: pblk: use vmalloc for GC data buffer")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-read.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index db9a3b575f62..9bbdca8a65ab 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -490,7 +490,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 
 	data_len = (*secs_to_gc) * geo->sec_size;
 	bio = pblk_bio_map_addr(pblk, data, *secs_to_gc, data_len,
-						PBLK_KMALLOC_META, GFP_KERNEL);
+						PBLK_VMALLOC_META, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio));
 		goto err_free_dma;
@@ -510,7 +510,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 	if (ret) {
 		bio_endio(bio);
 		pr_err("pblk: GC read request failed\n");
-		goto err_free_dma;
+		goto err_free_bio;
 	}
 
 	if (!wait_for_completion_io_timeout(&wait,
@@ -532,10 +532,13 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 	atomic_long_sub(*secs_to_gc, &pblk->inflight_reads);
 #endif
 
+	bio_put(bio);
 out:
 	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
 	return NVM_IO_OK;
 
+err_free_bio:
+	bio_put(bio);
 err_free_dma:
 	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
 	return NVM_IO_ERR;
-- 
2.7.4


* [PATCH 07/18] lightnvm: pblk: use constant for GC parameter
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (5 preceding siblings ...)
  2017-09-06 10:50 ` [PATCH 06/18] lightnvm: pblk: use right flag for GC allocation Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 08/18] lightnvm: pblk: check lba sanity on read path Javier González
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Use a constant instead of hard-coded magic numbers for the tuning
parameters in the garbage collector algorithms.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-gc.c | 4 ++--
 drivers/lightnvm/pblk.h    | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 6090d28f7995..92694e35afde 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -93,7 +93,7 @@ static int pblk_gc_move_valid_secs(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 retry:
 	spin_lock(&gc->w_lock);
-	if (gc->w_entries >= PBLK_GC_W_QD) {
+	if (gc->w_entries >= PBLK_GC_RQ_QD) {
 		spin_unlock(&gc->w_lock);
 		pblk_gc_writer_kick(&pblk->gc);
 		usleep_range(128, 256);
@@ -602,7 +602,7 @@ int pblk_gc_init(struct pblk *pblk)
 	spin_lock_init(&gc->w_lock);
 	spin_lock_init(&gc->r_lock);
 
-	sema_init(&gc->gc_sem, 128);
+	sema_init(&gc->gc_sem, PBLK_GC_RQ_QD);
 
 	INIT_LIST_HEAD(&gc->w_list);
 	INIT_LIST_HEAD(&gc->r_list);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 08a463fe855f..b7f5fa8b49d0 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -818,7 +818,7 @@ int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
  * pblk gc
  */
 #define PBLK_GC_MAX_READERS 8	/* Max number of outstanding GC reader jobs */
-#define PBLK_GC_W_QD 128	/* Queue depth for inflight GC write I/Os */
+#define PBLK_GC_RQ_QD 128	/* Queue depth for inflight GC requests */
 #define PBLK_GC_L_QD 4		/* Queue depth for inflight GC lines */
 #define PBLK_GC_RSV_LINE 1	/* Reserved lines for GC */
 
-- 
2.7.4


* [PATCH 08/18] lightnvm: pblk: check lba sanity on read path
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (6 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 07/18] lightnvm: pblk: use constant for GC parameter Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 09/18] lightnvm: pblk: simplify data validity check on GC Javier González
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

As part of pblk's recovery scheme, we store the lba mapped to each
physical sector on the device's out-of-band (OOB) area. On the read
path, we can use this information to validate that the data being
delivered to the upper layers corresponds to the lba being requested.
The cost of this check is an extra copy to the DMA region on the
device and an extra comparison in the host, given that (i) the OOB
area is read together with the data from the media, and (ii) the DMA
region allocated for the ppa list can be reused for the metadata
stored on the OOB area.
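
On the partial-read path, the scratch lba lists live in the same DMA
allocation as the ppa list; a sketch of the layout implied by the
offsets used in the patch:

	/*
	 * meta_list                                  (dma_meta_size)
	 * ppa_list       = meta_list + meta_size     (dma_ppa_size)
	 * lba_list_mem   = ppa_list + 1 * ppa_size   (scratch)
	 * lba_list_media = ppa_list + 2 * ppa_size   (scratch)
	 */
	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);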

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-read.c | 48 ++++++++++++++++++++++++++++++++++++++++++--
 drivers/lightnvm/pblk.h      |  2 ++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9bbdca8a65ab..ae440a43e09c 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -41,6 +41,7 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
 static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				 sector_t blba, unsigned long *read_bitmap)
 {
+	struct pblk_sec_meta *meta_list = rqd->meta_list;
 	struct bio *bio = rqd->bio;
 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
 	int nr_secs = rqd->nr_ppas;
@@ -56,6 +57,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 retry:
 		if (pblk_ppa_empty(p)) {
 			WARN_ON(test_and_set_bit(i, read_bitmap));
+			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
 
 			if (unlikely(!advanced_bio)) {
 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
@@ -75,6 +77,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				goto retry;
 			}
 			WARN_ON(test_and_set_bit(i, read_bitmap));
+			meta_list[i].lba = cpu_to_le64(lba);
 			advanced_bio = true;
 #ifdef CONFIG_NVM_DEBUG
 			atomic_long_inc(&pblk->cache_reads);
@@ -110,6 +113,23 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return NVM_IO_OK;
 }
 
+static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
+			   sector_t blba)
+{
+	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	int nr_lbas = rqd->nr_ppas;
+	int i;
+
+	for (i = 0; i < nr_lbas; i++) {
+		u64 lba = le64_to_cpu(meta_list[i].lba);
+
+		if (lba == ADDR_EMPTY)
+			continue;
+
+		WARN(lba != blba + i, "pblk: corrupted read LBA\n");
+	}
+}
+
 static void pblk_end_io_read(struct nvm_rq *rqd)
 {
 	struct pblk *pblk = rqd->private;
@@ -124,6 +144,7 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 		WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
 #endif
 
+	pblk_read_check(pblk, rqd, r_ctx->lba);
 	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 
 	bio_put(bio);
@@ -151,15 +172,21 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 				      unsigned long *read_bitmap)
 {
 	struct bio *new_bio, *bio = rqd->bio;
+	struct pblk_sec_meta *meta_list = rqd->meta_list;
 	struct bio_vec src_bv, dst_bv;
 	void *ppa_ptr = NULL;
 	void *src_p, *dst_p;
 	dma_addr_t dma_ppa_list = 0;
+	__le64 *lba_list_mem, *lba_list_media;
 	int nr_secs = rqd->nr_ppas;
 	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
 	int i, ret, hole;
 	DECLARE_COMPLETION_ONSTACK(wait);
 
+	/* Re-use allocated memory for intermediate lbas */
+	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
+
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 	if (!new_bio) {
 		pr_err("pblk: could not alloc read bio\n");
@@ -174,6 +201,9 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 		goto err;
 	}
 
+	for (i = 0; i < nr_secs; i++)
+		lba_list_mem[i] = meta_list[i].lba;
+
 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
 	new_bio->bi_private = &wait;
@@ -214,10 +244,17 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 		rqd->dma_ppa_list = dma_ppa_list;
 	}
 
+	for (i = 0; i < nr_secs; i++) {
+		lba_list_media[i] = meta_list[i].lba;
+		meta_list[i].lba = lba_list_mem[i];
+	}
+
 	/* Fill the holes in the original bio */
 	i = 0;
 	hole = find_first_zero_bit(read_bitmap, nr_secs);
 	do {
+		meta_list[hole].lba = lba_list_media[i];
+
 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
 
@@ -258,6 +295,7 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
 			 sector_t lba, unsigned long *read_bitmap)
 {
+	struct pblk_sec_meta *meta_list = rqd->meta_list;
 	struct bio *bio = rqd->bio;
 	struct ppa_addr ppa;
 
@@ -270,6 +308,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
 retry:
 	if (pblk_ppa_empty(ppa)) {
 		WARN_ON(test_and_set_bit(0, read_bitmap));
+		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
 		return;
 	}
 
@@ -281,6 +320,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
 			pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
 			goto retry;
 		}
+
+		meta_list[0].lba = cpu_to_le64(lba);
+
 		WARN_ON(test_and_set_bit(0, read_bitmap));
 #ifdef CONFIG_NVM_DEBUG
 			atomic_long_inc(&pblk->cache_reads);
@@ -297,9 +339,10 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	struct nvm_tgt_dev *dev = pblk->dev;
 	sector_t blba = pblk_get_lba(bio);
 	unsigned int nr_secs = pblk_get_secs(bio);
+	struct pblk_g_ctx *r_ctx;
 	struct nvm_rq *rqd;
-	unsigned long read_bitmap; /* Max 64 ppas per request */
 	unsigned int bio_init_idx;
+	unsigned long read_bitmap; /* Max 64 ppas per request */
 	int ret = NVM_IO_ERR;
 
 	/* logic error: lba out-of-bounds. Ignore read request */
@@ -316,6 +359,8 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 		pr_err_ratelimited("pblk: not able to alloc rqd");
 		return NVM_IO_ERR;
 	}
+	r_ctx = nvm_rq_to_pdu(rqd);
+	r_ctx->lba = blba;
 
 	rqd->opcode = NVM_OP_PREAD;
 	rqd->bio = bio;
@@ -355,7 +400,6 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	/* All sectors are to be read from the device */
 	if (bitmap_empty(&read_bitmap, rqd->nr_ppas)) {
 		struct bio *int_bio = NULL;
-		struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 
 		/* Clone read bio to deal with read errors internally */
 		int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index b7f5fa8b49d0..f43d672585b4 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -98,6 +98,7 @@ enum {
 };
 
 #define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
+#define pblk_dma_ppa_size (sizeof(u64) * PBLK_MAX_REQ_ADDRS)
 
 /* write buffer completion context */
 struct pblk_c_ctx {
@@ -112,6 +113,7 @@ struct pblk_c_ctx {
 /* generic context */
 struct pblk_g_ctx {
 	void *private;
+	u64 lba;
 };
 
 /* Pad context */
-- 
2.7.4


* [PATCH 09/18] lightnvm: pblk: simplify data validity check on GC
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (7 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 08/18] lightnvm: pblk: check lba sanity on read path Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible Javier González
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

When a line is selected for recycling by the garbage collector (GC),
the line state changes and the invalid bitmap is frozen, preventing
invalidations from happening. Throughout the GC, the L2P map is
checked to verify that the data being recycled has not been updated.
The last check is done before the new mapping is stored in the L2P
table. Though this algorithm works, it requires a number of corner
cases to be checked each time the L2P table is updated. This
complicates readability and is error prone should the recycling
algorithm ever change.

Instead, this patch keeps the invalid bitmap accessible even while the
line is being recycled. When recycled data is remapped, it is enough
to check the invalid bitmap for the line before updating the L2P
table.
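
The check reduces to a single invariant, distilled here from the
pblk_update_map_gc() hunk below (locking elided): either the L2P
entry still points to the sector being recycled, or that sector must
already be marked invalid on the line:

	ppa_gc = addr_to_gen_ppa(pblk, paddr_gc, gc_line->id);

	if (!pblk_ppa_comp(ppa_l2p, ppa_gc)) {
		/* entry was updated while GC was in flight */
		WARN(!test_bit(paddr_gc, gc_line->invalid_bitmap),
						"pblk: corrupted GC update");
		return 0;	/* skip the remap */
	}

	pblk_trans_map_set(pblk, lba, ppa_new);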

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-cache.c | 20 +++++------
 drivers/lightnvm/pblk-core.c  | 29 +++++++---------
 drivers/lightnvm/pblk-gc.c    | 59 ++++++++++++++++++--------------
 drivers/lightnvm/pblk-rb.c    |  6 ++--
 drivers/lightnvm/pblk-read.c  | 79 +++++++++++++++++++++++--------------------
 drivers/lightnvm/pblk.h       | 23 ++++---------
 6 files changed, 109 insertions(+), 107 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index 024a8fc93069..1d6b8e3585f1 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -73,12 +73,11 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
  * On GC the incoming lbas are not necessarily sequential. Also, some of the
  * lbas might not be valid entries, which are marked as empty by the GC thread
  */
-int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list,
-			   unsigned int nr_entries, unsigned int nr_rec_entries,
-			   struct pblk_line *gc_line, unsigned long flags)
+int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 {
 	struct pblk_w_ctx w_ctx;
 	unsigned int bpos, pos;
+	void *data = gc_rq->data;
 	int i, valid_entries;
 
 	/* Update the write buffer head (mem) with the entries that we can
@@ -86,28 +85,29 @@ int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list,
 	 * rollback from here on.
 	 */
 retry:
-	if (!pblk_rb_may_write_gc(&pblk->rwb, nr_rec_entries, &bpos)) {
+	if (!pblk_rb_may_write_gc(&pblk->rwb, gc_rq->secs_to_gc, &bpos)) {
 		io_schedule();
 		goto retry;
 	}
 
-	w_ctx.flags = flags;
+	w_ctx.flags = PBLK_IOTYPE_GC;
 	pblk_ppa_set_empty(&w_ctx.ppa);
 
-	for (i = 0, valid_entries = 0; i < nr_entries; i++) {
-		if (lba_list[i] == ADDR_EMPTY)
+	for (i = 0, valid_entries = 0; i < gc_rq->nr_secs; i++) {
+		if (gc_rq->lba_list[i] == ADDR_EMPTY)
 			continue;
 
-		w_ctx.lba = lba_list[i];
+		w_ctx.lba = gc_rq->lba_list[i];
 
 		pos = pblk_rb_wrap_pos(&pblk->rwb, bpos + valid_entries);
-		pblk_rb_write_entry_gc(&pblk->rwb, data, w_ctx, gc_line, pos);
+		pblk_rb_write_entry_gc(&pblk->rwb, data, w_ctx, gc_rq->line,
+						gc_rq->paddr_list[i], pos);
 
 		data += PBLK_EXPOSED_PAGE_SIZE;
 		valid_entries++;
 	}
 
-	WARN_ONCE(nr_rec_entries != valid_entries,
+	WARN_ONCE(gc_rq->secs_to_gc != valid_entries,
 					"pblk: inconsistent GC write\n");
 
 #ifdef CONFIG_NVM_DEBUG
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 11a684162a5d..f6b9afbe8589 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -77,11 +77,7 @@ void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
 	 * that newer updates are not overwritten.
 	 */
 	spin_lock(&line->lock);
-	if (line->state == PBLK_LINESTATE_GC ||
-					line->state == PBLK_LINESTATE_FREE) {
-		spin_unlock(&line->lock);
-		return;
-	}
+	WARN_ON(line->state == PBLK_LINESTATE_FREE);
 
 	if (test_and_set_bit(paddr, line->invalid_bitmap)) {
 		WARN_ONCE(1, "pblk: double invalidate\n");
@@ -98,8 +94,7 @@ void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
 		spin_lock(&l_mg->gc_lock);
 		spin_lock(&line->lock);
 		/* Prevent moving a line that has just been chosen for GC */
-		if (line->state == PBLK_LINESTATE_GC ||
-					line->state == PBLK_LINESTATE_FREE) {
+		if (line->state == PBLK_LINESTATE_GC) {
 			spin_unlock(&line->lock);
 			spin_unlock(&l_mg->gc_lock);
 			return;
@@ -1788,6 +1783,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 
 void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 {
+
 #ifdef CONFIG_NVM_DEBUG
 	/* Callers must ensure that the ppa points to a cache address */
 	BUG_ON(!pblk_addr_in_cache(ppa));
@@ -1798,9 +1794,9 @@ void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
 }
 
 int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
-		       struct pblk_line *gc_line)
+		       struct pblk_line *gc_line, u64 paddr_gc)
 {
-	struct ppa_addr ppa_l2p;
+	struct ppa_addr ppa_l2p, ppa_gc;
 	int ret = 1;
 
 #ifdef CONFIG_NVM_DEBUG
@@ -1817,10 +1813,13 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
 
 	spin_lock(&pblk->trans_lock);
 	ppa_l2p = pblk_trans_map_get(pblk, lba);
+	ppa_gc = addr_to_gen_ppa(pblk, paddr_gc, gc_line->id);
 
-	/* Prevent updated entries to be overwritten by GC */
-	if (pblk_addr_in_cache(ppa_l2p) || pblk_ppa_empty(ppa_l2p) ||
-				pblk_tgt_ppa_to_line(ppa_l2p) != gc_line->id) {
+	if (!pblk_ppa_comp(ppa_l2p, ppa_gc)) {
+		spin_lock(&gc_line->lock);
+		WARN(!test_bit(paddr_gc, gc_line->invalid_bitmap),
+						"pblk: corrupted GC update");
+		spin_unlock(&gc_line->lock);
 
 		ret = 0;
 		goto out;
@@ -1892,15 +1891,13 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 			  u64 *lba_list, int nr_secs)
 {
-	sector_t lba;
+	u64 lba;
 	int i;
 
 	spin_lock(&pblk->trans_lock);
 	for (i = 0; i < nr_secs; i++) {
 		lba = lba_list[i];
-		if (lba == ADDR_EMPTY) {
-			ppas[i].ppa = ADDR_EMPTY;
-		} else {
+		if (lba != ADDR_EMPTY) {
 			/* logic error: lba out-of-bounds. Ignore update */
 			if (!(lba < pblk->rl.nr_secs)) {
 				WARN(1, "pblk: corrupted L2P map request\n");
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 92694e35afde..27ee1f9b19c6 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -20,7 +20,8 @@
 
 static void pblk_gc_free_gc_rq(struct pblk_gc_rq *gc_rq)
 {
-	vfree(gc_rq->data);
+	if (gc_rq->data)
+		vfree(gc_rq->data);
 	kfree(gc_rq);
 }
 
@@ -41,10 +42,7 @@ static int pblk_gc_write(struct pblk *pblk)
 	spin_unlock(&gc->w_lock);
 
 	list_for_each_entry_safe(gc_rq, tgc_rq, &w_list, list) {
-		pblk_write_gc_to_cache(pblk, gc_rq->data, gc_rq->lba_list,
-				gc_rq->nr_secs, gc_rq->secs_to_gc,
-				gc_rq->line, PBLK_IOTYPE_GC);
-
+		pblk_write_gc_to_cache(pblk, gc_rq);
 		list_del(&gc_rq->list);
 		kref_put(&gc_rq->line->ref, pblk_line_put);
 		pblk_gc_free_gc_rq(gc_rq);
@@ -69,27 +67,23 @@ static int pblk_gc_move_valid_secs(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line *line = gc_rq->line;
 	void *data;
-	unsigned int secs_to_gc;
 	int ret = 0;
 
 	data = vmalloc(gc_rq->nr_secs * geo->sec_size);
 	if (!data) {
 		ret = -ENOMEM;
-		goto out;
+		goto fail;
 	}
 
-	/* Read from GC victim block */
-	if (pblk_submit_read_gc(pblk, gc_rq->lba_list, data, gc_rq->nr_secs,
-							&secs_to_gc, line)) {
-		ret = -EFAULT;
-		goto free_data;
-	}
-
-	if (!secs_to_gc)
-		goto free_rq;
-
 	gc_rq->data = data;
-	gc_rq->secs_to_gc = secs_to_gc;
+
+	/* Read from GC victim block */
+	ret = pblk_submit_read_gc(pblk, gc_rq);
+	if (ret)
+		goto fail;
+
+	if (!gc_rq->secs_to_gc)
+		goto fail;
 
 retry:
 	spin_lock(&gc->w_lock);
@@ -107,11 +101,8 @@ static int pblk_gc_move_valid_secs(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 	return 0;
 
-free_rq:
-	kfree(gc_rq);
-free_data:
-	vfree(data);
-out:
+fail:
+	pblk_gc_free_gc_rq(gc_rq);
 	kref_put(&line->ref, pblk_line_put);
 	return ret;
 }
@@ -167,14 +158,21 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	struct pblk_line_ws *line_rq_ws;
 	struct pblk_gc_rq *gc_rq;
 	__le64 *lba_list;
+	unsigned long *invalid_bitmap;
 	int sec_left, nr_secs, bit;
 	int ret;
 
+	invalid_bitmap = mempool_alloc(pblk->line_meta_pool, GFP_KERNEL);
+	if (!invalid_bitmap) {
+		pr_err("pblk: could not allocate GC invalid bitmap\n");
+		goto fail_free_ws;
+	}
+
 	emeta_buf = pblk_malloc(lm->emeta_len[0], l_mg->emeta_alloc_type,
 								GFP_KERNEL);
 	if (!emeta_buf) {
 		pr_err("pblk: cannot use GC emeta\n");
-		return;
+		goto fail_free_bitmap;
 	}
 
 	ret = pblk_line_read_emeta(pblk, line, emeta_buf);
@@ -193,7 +191,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 		goto fail_free_emeta;
 	}
 
+	spin_lock(&line->lock);
+	bitmap_copy(invalid_bitmap, line->invalid_bitmap, lm->sec_per_line);
 	sec_left = pblk_line_vsc(line);
+	spin_unlock(&line->lock);
+
 	if (sec_left < 0) {
 		pr_err("pblk: corrupted GC line (%d)\n", line->id);
 		goto fail_free_emeta;
@@ -207,11 +209,12 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 	nr_secs = 0;
 	do {
-		bit = find_next_zero_bit(line->invalid_bitmap, lm->sec_per_line,
+		bit = find_next_zero_bit(invalid_bitmap, lm->sec_per_line,
 								bit + 1);
 		if (bit > line->emeta_ssec)
 			break;
 
+		gc_rq->paddr_list[nr_secs] = bit;
 		gc_rq->lba_list[nr_secs++] = le64_to_cpu(lba_list[bit]);
 	} while (nr_secs < pblk->max_write_pgs);
 
@@ -243,6 +246,7 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 out:
 	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+	mempool_free(invalid_bitmap, pblk->line_meta_pool);
 	mempool_free(line_ws, pblk->line_ws_pool);
 
 	kref_put(&line->ref, pblk_line_put);
@@ -256,8 +260,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
 	pblk_put_line_back(pblk, line);
 	kref_put(&line->ref, pblk_line_put);
-	mempool_free(line_ws, pblk->line_ws_pool);
 	atomic_dec(&gc->inflight_gc);
+fail_free_bitmap:
+	mempool_free(invalid_bitmap, pblk->line_meta_pool);
+fail_free_ws:
+	mempool_free(line_ws, pblk->line_ws_pool);
 
 	pr_err("pblk: Failed to GC line %d\n", line->id);
 }
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 9bc32578a766..74c768ce09ef 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -325,8 +325,8 @@ void pblk_rb_write_entry_user(struct pblk_rb *rb, void *data,
 }
 
 void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
-			    struct pblk_w_ctx w_ctx, struct pblk_line *gc_line,
-			    unsigned int ring_pos)
+			    struct pblk_w_ctx w_ctx, struct pblk_line *line,
+			    u64 paddr, unsigned int ring_pos)
 {
 	struct pblk *pblk = container_of(rb, struct pblk, rwb);
 	struct pblk_rb_entry *entry;
@@ -341,7 +341,7 @@ void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
 
 	__pblk_rb_write_entry(rb, data, w_ctx, entry);
 
-	if (!pblk_update_map_gc(pblk, w_ctx.lba, entry->cacheline, gc_line))
+	if (!pblk_update_map_gc(pblk, w_ctx.lba, entry->cacheline, line, paddr))
 		entry->w_ctx.lba = ADDR_EMPTY;
 
 	flags = w_ctx.flags | PBLK_WRITTEN_DATA;
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index ae440a43e09c..f32091503547 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -440,34 +440,40 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 
 static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 			      struct pblk_line *line, u64 *lba_list,
-			      unsigned int nr_secs)
+			      u64 *paddr_list_gc, unsigned int nr_secs)
 {
-	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
+	struct ppa_addr ppa_list_l2p[PBLK_MAX_REQ_ADDRS];
+	struct ppa_addr ppa_gc;
 	int valid_secs = 0;
 	int i;
 
-	pblk_lookup_l2p_rand(pblk, ppas, lba_list, nr_secs);
+	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
 
 	for (i = 0; i < nr_secs; i++) {
-		if (pblk_addr_in_cache(ppas[i]) || ppas[i].g.blk != line->id ||
-						pblk_ppa_empty(ppas[i])) {
-			lba_list[i] = ADDR_EMPTY;
+		if (lba_list[i] == ADDR_EMPTY)
+			continue;
+
+		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
+		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
+			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
 			continue;
 		}
 
-		rqd->ppa_list[valid_secs++] = ppas[i];
+		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
 	}
 
 #ifdef CONFIG_NVM_DEBUG
 	atomic_long_add(valid_secs, &pblk->inflight_reads);
 #endif
+
 	return valid_secs;
 }
 
 static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
-		      struct pblk_line *line, sector_t lba)
+		      struct pblk_line *line, sector_t lba,
+		      u64 paddr_gc)
 {
-	struct ppa_addr ppa;
+	struct ppa_addr ppa_l2p, ppa_gc;
 	int valid_secs = 0;
 
 	if (lba == ADDR_EMPTY)
@@ -480,15 +486,14 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 
 	spin_lock(&pblk->trans_lock);
-	ppa = pblk_trans_map_get(pblk, lba);
+	ppa_l2p = pblk_trans_map_get(pblk, lba);
 	spin_unlock(&pblk->trans_lock);
 
-	/* Ignore updated values until the moment */
-	if (pblk_addr_in_cache(ppa) || ppa.g.blk != line->id ||
-							pblk_ppa_empty(ppa))
+	ppa_gc = addr_to_gen_ppa(pblk, paddr_gc, line->id);
+	if (!pblk_ppa_comp(ppa_l2p, ppa_gc))
 		goto out;
 
-	rqd->ppa_addr = ppa;
+	rqd->ppa_addr = ppa_l2p;
 	valid_secs = 1;
 
 #ifdef CONFIG_NVM_DEBUG
@@ -499,15 +504,14 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 	return valid_secs;
 }
 
-int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
-			unsigned int nr_secs, unsigned int *secs_to_gc,
-			struct pblk_line *line)
+int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct bio *bio;
 	struct nvm_rq rqd;
-	int ret, data_len;
+	int data_len;
+	int ret = NVM_IO_OK;
 	DECLARE_COMPLETION_ONSTACK(wait);
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
@@ -515,25 +519,29 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
 							&rqd.dma_meta_list);
 	if (!rqd.meta_list)
-		return NVM_IO_ERR;
+		return -ENOMEM;
 
-	if (nr_secs > 1) {
+	if (gc_rq->nr_secs > 1) {
 		rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
 		rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
 
-		*secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, line, lba_list,
-								nr_secs);
-		if (*secs_to_gc == 1)
+		gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
+							gc_rq->lba_list,
+							gc_rq->paddr_list,
+							gc_rq->nr_secs);
+		if (gc_rq->secs_to_gc == 1)
 			rqd.ppa_addr = rqd.ppa_list[0];
 	} else {
-		*secs_to_gc = read_rq_gc(pblk, &rqd, line, lba_list[0]);
+		gc_rq->secs_to_gc = read_rq_gc(pblk, &rqd, gc_rq->line,
+							gc_rq->lba_list[0],
+							gc_rq->paddr_list[0]);
 	}
 
-	if (!(*secs_to_gc))
+	if (!(gc_rq->secs_to_gc))
 		goto out;
 
-	data_len = (*secs_to_gc) * geo->sec_size;
-	bio = pblk_bio_map_addr(pblk, data, *secs_to_gc, data_len,
+	data_len = (gc_rq->secs_to_gc) * geo->sec_size;
+	bio = pblk_bio_map_addr(pblk, gc_rq->data, gc_rq->secs_to_gc, data_len,
 						PBLK_VMALLOC_META, GFP_KERNEL);
 	if (IS_ERR(bio)) {
 		pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio));
@@ -546,13 +554,12 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 	rqd.opcode = NVM_OP_PREAD;
 	rqd.end_io = pblk_end_io_sync;
 	rqd.private = &wait;
-	rqd.nr_ppas = *secs_to_gc;
+	rqd.nr_ppas = gc_rq->secs_to_gc;
 	rqd.flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
 	rqd.bio = bio;
 
-	ret = pblk_submit_read_io(pblk, &rqd);
-	if (ret) {
-		bio_endio(bio);
+	if (pblk_submit_read_io(pblk, &rqd)) {
+		ret = -EIO;
 		pr_err("pblk: GC read request failed\n");
 		goto err_free_bio;
 	}
@@ -571,19 +578,19 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
 	}
 
 #ifdef CONFIG_NVM_DEBUG
-	atomic_long_add(*secs_to_gc, &pblk->sync_reads);
-	atomic_long_add(*secs_to_gc, &pblk->recov_gc_reads);
-	atomic_long_sub(*secs_to_gc, &pblk->inflight_reads);
+	atomic_long_add(gc_rq->secs_to_gc, &pblk->sync_reads);
+	atomic_long_add(gc_rq->secs_to_gc, &pblk->recov_gc_reads);
+	atomic_long_sub(gc_rq->secs_to_gc, &pblk->inflight_reads);
 #endif
 
 	bio_put(bio);
 out:
 	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
-	return NVM_IO_OK;
+	return ret;
 
 err_free_bio:
 	bio_put(bio);
 err_free_dma:
 	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
-	return NVM_IO_ERR;
+	return ret;
 }
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index f43d672585b4..3771f876ce80 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -212,6 +212,7 @@ struct pblk_lun {
 struct pblk_gc_rq {
 	struct pblk_line *line;
 	void *data;
+	u64 paddr_list[PBLK_MAX_REQ_ADDRS];
 	u64 lba_list[PBLK_MAX_REQ_ADDRS];
 	int nr_secs;
 	int secs_to_gc;
@@ -662,8 +663,8 @@ int pblk_rb_may_write_gc(struct pblk_rb *rb, unsigned int nr_entries,
 void pblk_rb_write_entry_user(struct pblk_rb *rb, void *data,
 			      struct pblk_w_ctx w_ctx, unsigned int pos);
 void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
-			    struct pblk_w_ctx w_ctx, struct pblk_line *gc_line,
-			    unsigned int pos);
+			    struct pblk_w_ctx w_ctx, struct pblk_line *line,
+			    u64 paddr, unsigned int pos);
 struct pblk_w_ctx *pblk_rb_w_ctx(struct pblk_rb *rb, unsigned int pos);
 void pblk_rb_flush(struct pblk_rb *rb);
 
@@ -765,7 +766,7 @@ void pblk_update_map_cache(struct pblk *pblk, sector_t lba,
 void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
 			 struct ppa_addr ppa, struct ppa_addr entry_line);
 int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
-		       struct pblk_line *gc_line);
+		       struct pblk_line *gc_line, u64 paddr);
 void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 			  u64 *lba_list, int nr_secs);
 void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
@@ -776,9 +777,7 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
  */
 int pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
 			unsigned long flags);
-int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list,
-			   unsigned int nr_entries, unsigned int nr_rec_entries,
-			   struct pblk_line *gc_line, unsigned long flags);
+int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 
 /*
  * pblk map
@@ -802,9 +801,7 @@ void pblk_write_should_kick(struct pblk *pblk);
  */
 extern struct bio_set *pblk_bio_set;
 int pblk_submit_read(struct pblk *pblk, struct bio *bio);
-int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
-			unsigned int nr_secs, unsigned int *secs_to_gc,
-			struct pblk_line *line);
+int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 /*
  * pblk recovery
  */
@@ -897,13 +894,7 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
 
 static inline int pblk_line_vsc(struct pblk_line *line)
 {
-	int vsc;
-
-	spin_lock(&line->lock);
-	vsc = le32_to_cpu(*line->vsc);
-	spin_unlock(&line->lock);
-
-	return vsc;
+	return le32_to_cpu(*line->vsc);
 }
 
 #define NVM_MEM_PAGE_WRITE (8)
-- 
2.7.4


* [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (8 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 09/18] lightnvm: pblk: simplify data validity check on GC Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 13:47   ` Christoph Hellwig
  2017-09-06 10:51 ` [PATCH 11/18] lightnvm: pblk: refactor read path on GC Javier González
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

In pblk, buffers forming bios can be allocated on physically
contiguous or virtually contiguous memory. For physically contiguous
memory, we already use the bio_map_kern helper function; for virtually
contiguous memory, however, we form the bio manually. This makes the
code more complex, especially on the completion path, where mapped
pages need to be freed.

Instead, use bio_copy_kern, which does the same and at the same time
simplifies the completion path.
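
With the change, the mapping helper collapses to two block-layer
calls (sketch of pblk_bio_map_addr() after the patch); the trade-off
for the simpler completion path is that bio_copy_kern bounces the
data through a copy:

	if (alloc_type == PBLK_KMALLOC_META)
		return bio_map_kern(dev->q, data, len, gfp_mask);

	/* vmalloc'ed buffer: let the block layer copy to/from it;
	 * 'reading' gives bio_copy_kern the data direction.
	 */
	return bio_copy_kern(dev->q, data, len, GFP_KERNEL, reading);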

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c     | 39 ++++-----------------------------------
 drivers/lightnvm/pblk-read.c     |  3 +--
 drivers/lightnvm/pblk-recovery.c |  3 +--
 drivers/lightnvm/pblk-write.c    |  7 +------
 drivers/lightnvm/pblk.h          |  2 +-
 5 files changed, 8 insertions(+), 46 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index f6b9afbe8589..e69e8829b093 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -423,42 +423,14 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
 
 struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 			      unsigned int nr_secs, unsigned int len,
-			      int alloc_type, gfp_t gfp_mask)
+			      int alloc_type, gfp_t gfp_mask, int reading)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
-	void *kaddr = data;
-	struct page *page;
-	struct bio *bio;
-	int i, ret;
 
 	if (alloc_type == PBLK_KMALLOC_META)
-		return bio_map_kern(dev->q, kaddr, len, gfp_mask);
+		return bio_map_kern(dev->q, data, len, gfp_mask);
 
-	bio = bio_kmalloc(gfp_mask, nr_secs);
-	if (!bio)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < nr_secs; i++) {
-		page = vmalloc_to_page(kaddr);
-		if (!page) {
-			pr_err("pblk: could not map vmalloc bio\n");
-			bio_put(bio);
-			bio = ERR_PTR(-ENOMEM);
-			goto out;
-		}
-
-		ret = bio_add_pc_page(dev->q, bio, page, PAGE_SIZE, 0);
-		if (ret != PAGE_SIZE) {
-			pr_err("pblk: could not add page to bio\n");
-			bio_put(bio);
-			bio = ERR_PTR(-ENOMEM);
-			goto out;
-		}
-
-		kaddr += PAGE_SIZE;
-	}
-out:
-	return bio;
+	return bio_copy_kern(dev->q, data, len, GFP_KERNEL, reading);
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
@@ -588,7 +560,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 	rq_len = rq_ppas * geo->sec_size;
 
 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
-					l_mg->emeta_alloc_type, GFP_KERNEL);
+			l_mg->emeta_alloc_type, GFP_KERNEL, dir == READ);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto free_rqd_dma;
@@ -673,9 +645,6 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 	atomic_dec(&pblk->inflight_io);
 	reinit_completion(&wait);
 
-	if (likely(pblk->l_mg.emeta_alloc_type == PBLK_VMALLOC_META))
-		bio_put(bio);
-
 	if (rqd.error) {
 		if (dir == WRITE)
 			pblk_log_write_err(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index f32091503547..1be972521dcd 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -542,7 +542,7 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 	data_len = (gc_rq->secs_to_gc) * geo->sec_size;
 	bio = pblk_bio_map_addr(pblk, gc_rq->data, gc_rq->secs_to_gc, data_len,
-						PBLK_VMALLOC_META, GFP_KERNEL);
+					PBLK_VMALLOC_META, GFP_KERNEL, 1);
 	if (IS_ERR(bio)) {
 		pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio));
 		goto err_free_dma;
@@ -583,7 +583,6 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 	atomic_long_sub(gc_rq->secs_to_gc, &pblk->inflight_reads);
 #endif
 
-	bio_put(bio);
 out:
 	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
 	return ret;
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 279638309d9a..b033d4f2b446 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -342,7 +342,6 @@ static void pblk_end_io_recov(struct nvm_rq *rqd)
 
 	pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
 
-	bio_put(rqd->bio);
 	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 	pblk_free_rqd(pblk, rqd, WRITE_INT);
 
@@ -411,7 +410,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 	}
 
 	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
-						PBLK_VMALLOC_META, GFP_KERNEL);
+					PBLK_VMALLOC_META, GFP_KERNEL, 0);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto fail_free_rqd;
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 920b93fb15d5..e5d77af1aad1 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -187,17 +187,12 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
 		pblk_log_write_err(pblk, rqd);
 		pr_err("pblk: metadata I/O failed. Line %d\n", line->id);
 	}
-#ifdef CONFIG_NVM_DEBUG
-	else
-		WARN_ONCE(rqd->bio->bi_status, "pblk: corrupted write error\n");
-#endif
 
 	sync = atomic_add_return(rqd->nr_ppas, &emeta->sync);
 	if (sync == emeta->nr_entries)
 		pblk_line_run_ws(pblk, line, NULL, pblk_line_close_ws,
 								pblk->close_wq);
 
-	bio_put(rqd->bio);
 	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 	pblk_free_rqd(pblk, rqd, WRITE_INT);
 
@@ -382,7 +377,7 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 	data = ((void *)emeta->buf) + emeta->mem;
 
 	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
-					l_mg->emeta_alloc_type, GFP_KERNEL);
+				l_mg->emeta_alloc_type, GFP_KERNEL, 0);
 	if (IS_ERR(bio)) {
 		ret = PTR_ERR(bio);
 		goto fail_free_rqd;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 3771f876ce80..2304a4891f20 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -712,7 +712,7 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
 int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line);
 struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 			      unsigned int nr_secs, unsigned int len,
-			      int alloc_type, gfp_t gfp_mask);
+			      int alloc_type, gfp_t gfp_mask, int reading);
 struct pblk_line *pblk_line_get(struct pblk *pblk);
 struct pblk_line *pblk_line_get_first_data(struct pblk *pblk);
 void pblk_line_replace_data(struct pblk *pblk);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/18] lightnvm: pblk: refactor read path on GC
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (9 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 12/18] lightnvm: pblk: free padded entries in write buffer Javier González
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Simplify the part of the garbage collector where data is read from the
line being recycled and moved into an internal queue before being copied
to the memory buffer. This allows us to get rid of a dedicated function,
which introduced an unnecessary dependency in the code.
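
The work item now owns the whole life cycle of the request: it
allocates the buffer, issues the read, hands the request off to the GC
writer queue, and on any failure frees the gc_rq and drops the line
reference before releasing its own mempool entry (see the diff below).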

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-gc.c | 94 +++++++++++++++++++---------------------------
 1 file changed, 39 insertions(+), 55 deletions(-)

diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 27ee1f9b19c6..efa86295fbd1 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -56,57 +56,6 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
 	wake_up_process(gc->gc_writer_ts);
 }
 
-/*
- * Responsible for managing all memory related to a gc request. Also in case of
- * failure
- */
-static int pblk_gc_move_valid_secs(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
-{
-	struct nvm_tgt_dev *dev = pblk->dev;
-	struct nvm_geo *geo = &dev->geo;
-	struct pblk_gc *gc = &pblk->gc;
-	struct pblk_line *line = gc_rq->line;
-	void *data;
-	int ret = 0;
-
-	data = vmalloc(gc_rq->nr_secs * geo->sec_size);
-	if (!data) {
-		ret = -ENOMEM;
-		goto fail;
-	}
-
-	gc_rq->data = data;
-
-	/* Read from GC victim block */
-	ret = pblk_submit_read_gc(pblk, gc_rq);
-	if (ret)
-		goto fail;
-
-	if (!gc_rq->secs_to_gc)
-		goto fail;
-
-retry:
-	spin_lock(&gc->w_lock);
-	if (gc->w_entries >= PBLK_GC_RQ_QD) {
-		spin_unlock(&gc->w_lock);
-		pblk_gc_writer_kick(&pblk->gc);
-		usleep_range(128, 256);
-		goto retry;
-	}
-	gc->w_entries++;
-	list_add_tail(&gc_rq->list, &gc->w_list);
-	spin_unlock(&gc->w_lock);
-
-	pblk_gc_writer_kick(&pblk->gc);
-
-	return 0;
-
-fail:
-	pblk_gc_free_gc_rq(gc_rq);
-	kref_put(&line->ref, pblk_line_put);
-	return ret;
-}
-
 static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
@@ -130,18 +79,53 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	struct pblk_line_ws *line_rq_ws = container_of(work,
 						struct pblk_line_ws, ws);
 	struct pblk *pblk = line_rq_ws->pblk;
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
 	struct pblk_gc *gc = &pblk->gc;
 	struct pblk_line *line = line_rq_ws->line;
 	struct pblk_gc_rq *gc_rq = line_rq_ws->priv;
+	int ret;
 
 	up(&gc->gc_sem);
 
-	if (pblk_gc_move_valid_secs(pblk, gc_rq)) {
-		pr_err("pblk: could not GC all sectors: line:%d (%d/%d)\n",
-						line->id, *line->vsc,
-						gc_rq->nr_secs);
+	gc_rq->data = vmalloc(gc_rq->nr_secs * geo->sec_size);
+	if (!gc_rq->data) {
+		pr_err("pblk: could not GC line:%d (%d/%d)\n",
+					line->id, *line->vsc, gc_rq->nr_secs);
+		goto out;
 	}
 
+	/* Read from GC victim block */
+	ret = pblk_submit_read_gc(pblk, gc_rq);
+	if (ret) {
+		pr_err("pblk: failed GC read in line:%d (err:%d)\n",
+								line->id, ret);
+		goto out;
+	}
+
+	if (!gc_rq->secs_to_gc)
+		goto out;
+
+retry:
+	spin_lock(&gc->w_lock);
+	if (gc->w_entries >= PBLK_GC_RQ_QD) {
+		spin_unlock(&gc->w_lock);
+		pblk_gc_writer_kick(&pblk->gc);
+		usleep_range(128, 256);
+		goto retry;
+	}
+	gc->w_entries++;
+	list_add_tail(&gc_rq->list, &gc->w_list);
+	spin_unlock(&gc->w_lock);
+
+	pblk_gc_writer_kick(&pblk->gc);
+
+	mempool_free(line_rq_ws, pblk->line_ws_pool);
+	return;
+
+out:
+	pblk_gc_free_gc_rq(gc_rq);
+	kref_put(&line->ref, pblk_line_put);
 	mempool_free(line_rq_ws, pblk->line_ws_pool);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/18] lightnvm: pblk: free padded entries in write buffer
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (10 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 11/18] lightnvm: pblk: refactor read path on GC Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 13/18] lightnvm: pblk: fix write I/O sync stat Javier González
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

When a REQ_PREFLUSH reaches pblk, the bio cannot be directly completed.
Instead, data on the write buffer is flushed and the bio is completed on
the completion path. This might require some sectors to be padded in
order to guarantee a successful write.

This patch fixes a memory leak of the padded pages. A consequence of
this bad free was that internal bios containing no data (only a flush)
were not being completed.
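
For example (illustrative numbers): a write request with
c_ctx->nr_valid = 6 buffer entries padded with c_ctx->nr_padded = 2
extra pages carries a bio of 8 pages. The padded pages sit at bio
vector offsets [nr_valid, nr_valid + nr_padded), which is why the frees
below pass c_ctx->nr_valid as the offset instead of rqd->nr_ppas.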

Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c  |  1 -
 drivers/lightnvm/pblk-write.c | 14 +++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e69e8829b093..1f746e31e379 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -187,7 +187,6 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
 
 	WARN_ON(off + nr_pages != bio->bi_vcnt);
 
-	bio_advance(bio, off * PBLK_EXPOSED_PAGE_SIZE);
 	for (i = off; i < nr_pages + off; i++) {
 		bv = bio->bi_io_vec[i];
 		mempool_free(bv.bv_page, pblk->page_pool);
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index e5d77af1aad1..33dd90251a60 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -25,13 +25,20 @@ static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	unsigned long ret;
 	int i;
 
-	for (i = 0; i < c_ctx->nr_valid; i++) {
+	i = 0;
+	do {
 		struct pblk_w_ctx *w_ctx;
 
 		w_ctx = pblk_rb_w_ctx(&pblk->rwb, c_ctx->sentry + i);
 		while ((original_bio = bio_list_pop(&w_ctx->bios)))
 			bio_endio(original_bio);
-	}
+
+		i++;
+	} while (i < c_ctx->nr_valid);
+
+	if (c_ctx->nr_padded)
+		pblk_bio_free_pages(pblk, rqd->bio, c_ctx->nr_valid,
+							c_ctx->nr_padded);
 
 #ifdef CONFIG_NVM_DEBUG
 	atomic_long_add(c_ctx->nr_valid, &pblk->sync_writes);
@@ -516,7 +523,8 @@ static void pblk_free_write_rqd(struct pblk *pblk, struct nvm_rq *rqd)
 	struct bio *bio = rqd->bio;
 
 	if (c_ctx->nr_padded)
-		pblk_bio_free_pages(pblk, bio, rqd->nr_ppas, c_ctx->nr_padded);
+		pblk_bio_free_pages(pblk, bio, c_ctx->nr_valid,
+							c_ctx->nr_padded);
 }
 
 static int pblk_submit_write(struct pblk *pblk)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/18] lightnvm: pblk: fix write I/O sync stat
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (11 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 12/18] lightnvm: pblk: free padded entries in write buffer Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 14/18] lightnvm: pblk: simplify path on REQ_PREFLUSH Javier González
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Fix stat counter to collect the right number of I/Os being synced on the
completion path.
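
A small worked example (numbers are illustrative): for a request with
c_ctx->nr_valid = 5 user entries padded with c_ctx->nr_padded = 3
sectors, rqd->nr_ppas = 8 physical sectors complete on the media;
counting only nr_valid would under-report sync_writes by 3 on every
padded request.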

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 33dd90251a60..fb4d9c3983db 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -41,7 +41,7 @@ static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
 							c_ctx->nr_padded);
 
 #ifdef CONFIG_NVM_DEBUG
-	atomic_long_add(c_ctx->nr_valid, &pblk->sync_writes);
+	atomic_long_add(rqd->nr_ppas, &pblk->sync_writes);
 #endif
 
 	ret = pblk_rb_sync_advance(&pblk->rwb, c_ctx->nr_valid);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 14/18] lightnvm: pblk: simplify path on REQ_PREFLUSH
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (12 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 13/18] lightnvm: pblk: fix write I/O sync stat Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 15/18] lightnvm: pblk: avoid deadlock on low LUN config Javier González
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

On REQ_PREFLUSH, directly tag the I/O context flags to signal a flush in
the write-to-cache path, instead of finding the correct entry context
and imposing a memory barrier. This simplifies the code and might
prevent race conditions when adding functionality to the write path.
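
Concretely, the flush is now recorded once, at the point the bio enters
the write buffer, rather than patched into the last entry's context when
the sync point is set; the remaining smp_store_release() on
rb->sync_point is enough to publish the sync point to the writer (see
the pblk-rb.c hunk below).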

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-cache.c | 4 +++-
 drivers/lightnvm/pblk-rb.c    | 8 +-------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index 1d6b8e3585f1..0d227ef7d1b9 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -43,8 +43,10 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
 	if (unlikely(!bio_has_data(bio)))
 		goto out;
 
+	pblk_ppa_set_empty(&w_ctx.ppa);
 	w_ctx.flags = flags;
-	pblk_ppa_set_empty(&w_ctx.ppa);
+	if (bio->bi_opf & REQ_PREFLUSH)
+		w_ctx.flags |= PBLK_FLUSH_ENTRY;
 
 	for (i = 0; i < nr_entries; i++) {
 		void *data = bio_data(bio);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 74c768ce09ef..05e6b2e9221d 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -355,7 +355,6 @@ static int pblk_rb_sync_point_set(struct pblk_rb *rb, struct bio *bio,
 {
 	struct pblk_rb_entry *entry;
 	unsigned int subm, sync_point;
-	int flags;
 
 	subm = READ_ONCE(rb->subm);
 
@@ -369,12 +368,6 @@ static int pblk_rb_sync_point_set(struct pblk_rb *rb, struct bio *bio,
 	sync_point = (pos == 0) ? (rb->nr_entries - 1) : (pos - 1);
 	entry = &rb->entries[sync_point];
 
-	flags = READ_ONCE(entry->w_ctx.flags);
-	flags |= PBLK_FLUSH_ENTRY;
-
-	/* Release flags on context. Protect from writes */
-	smp_store_release(&entry->w_ctx.flags, flags);
-
 	/* Protect syncs */
 	smp_store_release(&rb->sync_point, sync_point);
 
@@ -454,6 +447,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb, unsigned int nr_entries,
 
 	/* Protect from read count */
 	smp_store_release(&rb->mem, mem);
+
 	return 1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 15/18] lightnvm: pblk: avoid deadlock on low LUN config
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (13 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 14/18] lightnvm: pblk: simplify path on REQ_PREFLUSH Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 16/18] lightnvm: pblk: enable 1 LUN configuration Javier González
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

On low LUN configurations, make sure not to send bios that are bigger
than the buffer size.
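
As an illustrative example: with a write buffer budget of 64 entries (a
low LUN count), rb_max_io is set to budget >> 1 = 32 entries, so a
48-sector bio now triggers blk_queue_split() instead of being let
through whole, where it could stall against a rate limiter that never
exposes 48 free entries at once.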

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk-rl.c   | 6 ++++++
 drivers/lightnvm/pblk.h      | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8459434edd89..12c05aebac16 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -46,7 +46,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
 	 * user I/Os. Unless stalled, the rate limiter leaves at least 256KB
 	 * available for user I/O.
 	 */
-	if (unlikely(pblk_get_secs(bio) >= pblk_rl_sysfs_rate_show(&pblk->rl)))
+	if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
 		blk_queue_split(q, &bio);
 
 	return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 2e6a5361baf0..596bdec433c3 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -178,6 +178,11 @@ int pblk_rl_sysfs_rate_show(struct pblk_rl *rl)
 	return rl->rb_user_max;
 }
 
+int pblk_rl_max_io(struct pblk_rl *rl)
+{
+	return rl->rb_max_io;
+}
+
 static void pblk_rl_u_timer(unsigned long data)
 {
 	struct pblk_rl *rl = (struct pblk_rl *)data;
@@ -214,6 +219,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	/* To start with, all buffer is available to user I/O writers */
 	rl->rb_budget = budget;
 	rl->rb_user_max = budget;
+	rl->rb_max_io = budget >> 1;
 	rl->rb_gc_max = 0;
 	rl->rb_state = PBLK_RL_HIGH;
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 2304a4891f20..2ae9d94e9852 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -273,6 +273,7 @@ struct pblk_rl {
 	int rb_gc_max;		/* Max buffer entries available for GC I/O */
 	int rb_gc_rsv;		/* Reserved buffer entries for GC I/O */
 	int rb_state;		/* Rate-limiter current state */
+	int rb_max_io;		/* Maximum size for an I/O given the config */
 
 	atomic_t rb_user_cnt;	/* User I/O buffer counter */
 	atomic_t rb_gc_cnt;	/* GC I/O buffer counter */
@@ -846,6 +847,7 @@ int pblk_rl_gc_may_insert(struct pblk_rl *rl, int nr_entries);
 void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries);
 void pblk_rl_out(struct pblk_rl *rl, int nr_user, int nr_gc);
 int pblk_rl_sysfs_rate_show(struct pblk_rl *rl);
+int pblk_rl_max_io(struct pblk_rl *rl);
 void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line);
 void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line);
 void pblk_rl_set_space_limit(struct pblk_rl *rl, int entries_left);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 16/18] lightnvm: pblk: enable 1 LUN configuration
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (14 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 15/18] lightnvm: pblk: avoid deadlock on low LUN config Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 17/18] lightnvm: pblk: guarantee line integrity on reads Javier González
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

Metadata I/Os are scheduled to minimize their impact on user data I/Os.
When there are enough LUNs instantiated (i.e., enough bandwidth), it is
easy to interleave metadata and data one after the other so that
metadata I/Os are the ones being blocked and not vice versa.

We do this by calculating the distance between the I/Os in terms of the
LUNs that are not in use, and selecting a free LUN that satisfies the
simple heuristic that metadata is scheduled behind. The per-LUN
semaphores guarantee consistency. This works fine on >1 LUN
configurations. However, when a single LUN is instantiated, this design
leads to a deadlock, where metadata waits to be scheduled on a free LUN.

This patch implements the 1 LUN case by simply scheduling the metadata
I/O after the data I/O. In the process, we refactor the way a line is
replaced to ensure that metadata writes are submitted after data writes
in order to guarantee block sequentiality. Note that, since there is
only one LUN, both I/Os will block each other by design. However, such a
configuration only pursues tight read latencies, not write bandwidth.
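
In the degenerate case this reduces to: with geo->nr_luns == 1,
pblk_valid_meta_ppa() no longer treats a collision between the metadata
PPA and the in-flight data PPA as a reason to back off, and the per-LUN
semaphore around the metadata write is skipped, so the metadata I/O
simply queues behind the data I/O on the only LUN (see the
pblk-write.c hunks below).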

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c  | 17 ++++++++++-------
 drivers/lightnvm/pblk-init.c  |  8 ++++++--
 drivers/lightnvm/pblk-map.c   | 21 ++++++++++++---------
 drivers/lightnvm/pblk-write.c |  8 +++++---
 drivers/lightnvm/pblk.h       |  2 +-
 5 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 1f746e31e379..873b66200678 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1326,17 +1326,17 @@ void pblk_pipeline_stop(struct pblk *pblk)
 	spin_unlock(&l_mg->free_lock);
 }
 
-void pblk_line_replace_data(struct pblk *pblk)
+struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
-	struct pblk_line *cur, *new;
+	struct pblk_line *cur, *new = NULL;
 	unsigned int left_seblks;
 	int is_next = 0;
 
 	cur = l_mg->data_line;
 	new = l_mg->data_next;
 	if (!new)
-		return;
+		goto out;
 	l_mg->data_line = new;
 
 	spin_lock(&l_mg->free_lock);
@@ -1344,7 +1344,7 @@ void pblk_line_replace_data(struct pblk *pblk)
 		l_mg->data_line = NULL;
 		l_mg->data_next = NULL;
 		spin_unlock(&l_mg->free_lock);
-		return;
+		goto out;
 	}
 
 	pblk_line_setup_metadata(new, l_mg, &pblk->lm);
@@ -1356,7 +1356,7 @@ void pblk_line_replace_data(struct pblk *pblk)
 		/* If line is not fully erased, erase it */
 		if (atomic_read(&new->left_eblks)) {
 			if (pblk_line_erase(pblk, new))
-				return;
+				goto out;
 		} else {
 			io_schedule();
 		}
@@ -1367,7 +1367,7 @@ void pblk_line_replace_data(struct pblk *pblk)
 	if (!pblk_line_init_metadata(pblk, new, cur)) {
 		new = pblk_line_retry(pblk, new);
 		if (!new)
-			return;
+			goto out;
 
 		goto retry_setup;
 	}
@@ -1375,7 +1375,7 @@ void pblk_line_replace_data(struct pblk *pblk)
 	if (!pblk_line_init_bb(pblk, new, 1)) {
 		new = pblk_line_retry(pblk, new);
 		if (!new)
-			return;
+			goto out;
 
 		goto retry_setup;
 	}
@@ -1399,6 +1399,9 @@ void pblk_line_replace_data(struct pblk *pblk)
 
 	if (is_next)
 		pblk_rl_free_lines_dec(&pblk->rl, l_mg->data_next);
+
+out:
+	return new;
 }
 
 void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 12c05aebac16..0409839cc8fc 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -714,8 +714,12 @@ static int pblk_lines_init(struct pblk *pblk)
 	}
 
 	lm->emeta_bb = geo->nr_luns - i;
-	lm->min_blk_line = 1 + DIV_ROUND_UP(lm->smeta_sec + lm->emeta_sec[0],
-							geo->sec_per_blk);
+
+	lm->min_blk_line = 1;
+	if (geo->nr_luns > 1)
+		lm->min_blk_line += DIV_ROUND_UP(lm->smeta_sec +
+					lm->emeta_sec[0], geo->sec_per_blk);
+
 	if (lm->min_blk_line > lm->blk_per_line) {
 		pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
 							lm->blk_per_line);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index fddb924f6dde..3bc4c94f9cf2 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -25,13 +25,23 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			       unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
-	struct pblk_emeta *emeta = line->emeta;
+	struct pblk_emeta *emeta;
 	struct pblk_w_ctx *w_ctx;
-	__le64 *lba_list = emeta_to_lbas(pblk, emeta->buf);
+	__le64 *lba_list;
 	u64 paddr;
 	int nr_secs = pblk->min_write_pgs;
 	int i;
 
+	if (pblk_line_is_full(line)) {
+		struct pblk_line *prev_line = line;
+
+		line = pblk_line_replace_data(pblk);
+		pblk_line_close_meta(pblk, prev_line);
+	}
+
+	emeta = line->emeta;
+	lba_list = emeta_to_lbas(pblk, emeta->buf);
+
 	paddr = pblk_alloc_page(pblk, line, nr_secs);
 
 	for (i = 0; i < nr_secs; i++, paddr++) {
@@ -60,13 +70,6 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 		}
 	}
 
-	if (pblk_line_is_full(line)) {
-		struct pblk_line *prev_line = line;
-
-		pblk_line_replace_data(pblk);
-		pblk_line_close_meta(pblk, prev_line);
-	}
-
 	pblk_down_rq(pblk, ppa_list, nr_secs, lun_bitmap);
 }
 
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index fb4d9c3983db..73300c059784 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -188,7 +188,8 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
 	struct pblk_emeta *emeta = line->emeta;
 	int sync;
 
-	pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
+	if (dev->geo.nr_luns > 1)
+		pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
 
 	if (rqd->error) {
 		pblk_log_write_err(pblk, rqd);
@@ -332,7 +333,7 @@ static inline int pblk_valid_meta_ppa(struct pblk *pblk,
 	 * the distance to not be optimal, but allow metadata I/Os to succeed.
 	 */
 	ppa_opt = addr_to_gen_ppa(pblk, paddr + data_line->meta_distance, 0);
-	if (unlikely(ppa_opt.ppa == ppa.ppa)) {
+	if (geo->nr_luns > 1 && unlikely(ppa_opt.ppa == ppa.ppa)) {
 		data_line->meta_distance--;
 		return 0;
 	}
@@ -414,7 +415,8 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 		spin_unlock(&l_mg->close_lock);
 	}
 
-	pblk_down_page(pblk, rqd->ppa_list, rqd->nr_ppas);
+	if (geo->nr_luns > 1)
+		pblk_down_page(pblk, rqd->ppa_list, rqd->nr_ppas);
 
 	ret = pblk_submit_io(pblk, rqd);
 	if (ret) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 2ae9d94e9852..f6d2e1e72057 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -716,7 +716,7 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 			      int alloc_type, gfp_t gfp_mask, int reading);
 struct pblk_line *pblk_line_get(struct pblk *pblk);
 struct pblk_line *pblk_line_get_first_data(struct pblk *pblk);
-void pblk_line_replace_data(struct pblk *pblk);
+struct pblk_line *pblk_line_replace_data(struct pblk *pblk);
 int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line);
 struct pblk_line *pblk_line_get_data(struct pblk *pblk);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 17/18] lightnvm: pblk: guarantee line integrity on reads
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (15 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 16/18] lightnvm: pblk: enable 1 LUN configuration Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 10:51 ` [PATCH 18/18] lightnvm: pblk: remove unnecessary check Javier González
  2017-09-06 14:04 ` [PATCH 00/18] lightnvm: pblk patches for 4.14 Jens Axboe
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

When a line is recycled during garbage collection, reads can still be
issued to the line. If the line is freed in the middle of this process,
data corruption might occur.

This patch guarantees that lines are not freed in the middle of reads
that target them. Specifically, we use the existing per-line reference
to decide when a line is eligible to be freed after the recycle process.
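
The reference discipline, condensed from the diff below (a sketch, not a
literal copy of the code): take a kref on each device-resident line
while holding the L2P lock during address translation, and drop it when
the read completes, from a workqueue, since the final put may sleep:

	/* Lookup side: pin every line the read will touch */
	spin_lock(&pblk->trans_lock);
	ppa = pblk_trans_map_get(pblk, lba);
	if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa))
		kref_get(&pblk->lines[pblk_dev_ppa_to_line(ppa)].ref);
	spin_unlock(&pblk->trans_lock);

	/* Completion side: unpin; the last put frees the line */
	kref_put(&line->ref, pblk_line_put_wq);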

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-core.c | 55 ++++++++++++++++++++++++++++++---
 drivers/lightnvm/pblk-init.c | 14 +++++++--
 drivers/lightnvm/pblk-read.c | 73 ++++++++++++++++++++++++++++++++------------
 drivers/lightnvm/pblk.h      |  2 ++
 4 files changed, 118 insertions(+), 26 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 873b66200678..b9f6ff164b46 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1419,10 +1419,8 @@ void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
 	line->emeta = NULL;
 }
 
-void pblk_line_put(struct kref *ref)
+static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 {
-	struct pblk_line *line = container_of(ref, struct pblk_line, ref);
-	struct pblk *pblk = line->pblk;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 
 	spin_lock(&line->lock);
@@ -1440,6 +1438,42 @@ void pblk_line_put(struct kref *ref)
 	pblk_rl_free_lines_inc(&pblk->rl, line);
 }
 
+static void pblk_line_put_ws(struct work_struct *work)
+{
+	struct pblk_line_ws *line_put_ws = container_of(work,
+						struct pblk_line_ws, ws);
+	struct pblk *pblk = line_put_ws->pblk;
+	struct pblk_line *line = line_put_ws->line;
+
+	__pblk_line_put(pblk, line);
+}
+
+void pblk_line_put(struct kref *ref)
+{
+	struct pblk_line *line = container_of(ref, struct pblk_line, ref);
+	struct pblk *pblk = line->pblk;
+
+	__pblk_line_put(pblk, line);
+}
+
+void pblk_line_put_wq(struct kref *ref)
+{
+	struct pblk_line *line = container_of(ref, struct pblk_line, ref);
+	struct pblk *pblk = line->pblk;
+	struct pblk_line_ws *line_put_ws;
+
+	line_put_ws = mempool_alloc(pblk->line_ws_pool, GFP_ATOMIC);
+	if (!line_put_ws)
+		return;
+
+	line_put_ws->pblk = pblk;
+	line_put_ws->line = line;
+	line_put_ws->priv = NULL;
+
+	INIT_WORK(&line_put_ws->ws, pblk_line_put_ws);
+	queue_work(pblk->r_end_wq, &line_put_ws->ws);
+}
+
 int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr ppa)
 {
 	struct nvm_rq *rqd;
@@ -1854,8 +1888,19 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
 	int i;
 
 	spin_lock(&pblk->trans_lock);
-	for (i = 0; i < nr_secs; i++)
-		ppas[i] = pblk_trans_map_get(pblk, blba + i);
+	for (i = 0; i < nr_secs; i++) {
+		struct ppa_addr ppa;
+
+		ppa = ppas[i] = pblk_trans_map_get(pblk, blba + i);
+
+		/* If the L2P entry maps to a line, the reference is valid */
+		if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
+			int line_id = pblk_dev_ppa_to_line(ppa);
+			struct pblk_line *line = &pblk->lines[line_id];
+
+			kref_get(&line->ref);
+		}
+	}
 	spin_unlock(&pblk->trans_lock);
 }
 
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0409839cc8fc..5fe926bbbb2d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -286,15 +286,22 @@ static int pblk_core_init(struct pblk *pblk)
 	if (!pblk->bb_wq)
 		goto free_close_wq;
 
+	pblk->r_end_wq = alloc_workqueue("pblk-read-end-wq",
+			WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (!pblk->r_end_wq)
+		goto free_bb_wq;
+
 	if (pblk_set_ppaf(pblk))
-		goto free_bb_wq;
+		goto free_r_end_wq;
 
 	if (pblk_rwb_init(pblk))
-		goto free_bb_wq;
+		goto free_r_end_wq;
 
 	INIT_LIST_HEAD(&pblk->compl_list);
 	return 0;
 
+free_r_end_wq:
+	destroy_workqueue(pblk->r_end_wq);
 free_bb_wq:
 	destroy_workqueue(pblk->bb_wq);
 free_close_wq:
@@ -319,6 +326,9 @@ static void pblk_core_free(struct pblk *pblk)
 	if (pblk->close_wq)
 		destroy_workqueue(pblk->close_wq);
 
+	if (pblk->r_end_wq)
+		destroy_workqueue(pblk->r_end_wq);
+
 	if (pblk->bb_wq)
 		destroy_workqueue(pblk->bb_wq);
 
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 1be972521dcd..f43d78e4ce78 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -130,9 +130,34 @@ static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 }
 
-static void pblk_end_io_read(struct nvm_rq *rqd)
+static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	struct ppa_addr *ppa_list;
+	int i;
+
+	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
+
+	for (i = 0; i < rqd->nr_ppas; i++) {
+		struct ppa_addr ppa = ppa_list[i];
+		struct pblk_line *line;
+
+		line = &pblk->lines[pblk_dev_ppa_to_line(ppa)];
+		kref_put(&line->ref, pblk_line_put_wq);
+	}
+}
+
+static void pblk_end_user_read(struct bio *bio)
+{
+#ifdef CONFIG_NVM_DEBUG
+	WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
+#endif
+	bio_endio(bio);
+	bio_put(bio);
+}
+
+static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
+			       bool put_line)
 {
-	struct pblk *pblk = rqd->private;
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct bio *bio = rqd->bio;
@@ -145,28 +170,31 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
 #endif
 
 	pblk_read_check(pblk, rqd, r_ctx->lba);
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 
 	bio_put(bio);
-	if (r_ctx->private) {
-		struct bio *orig_bio = r_ctx->private;
+	if (r_ctx->private)
+		pblk_end_user_read((struct bio *)r_ctx->private);
 
-#ifdef CONFIG_NVM_DEBUG
-		WARN_ONCE(orig_bio->bi_status, "pblk: corrupted read bio\n");
-#endif
-		bio_endio(orig_bio);
-		bio_put(orig_bio);
-	}
+	if (put_line)
+		pblk_read_put_rqd_kref(pblk, rqd);
 
 #ifdef CONFIG_NVM_DEBUG
 	atomic_long_add(rqd->nr_ppas, &pblk->sync_reads);
 	atomic_long_sub(rqd->nr_ppas, &pblk->inflight_reads);
 #endif
 
+	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 	pblk_free_rqd(pblk, rqd, READ);
 	atomic_dec(&pblk->inflight_io);
 }
 
+static void pblk_end_io_read(struct nvm_rq *rqd)
+{
+	struct pblk *pblk = rqd->private;
+
+	__pblk_end_io_read(pblk, rqd, true);
+}
+
 static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 				      unsigned int bio_init_idx,
 				      unsigned long *read_bitmap)
@@ -240,8 +268,12 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 
 	if (unlikely(nr_secs > 1 && nr_holes == 1)) {
+		struct ppa_addr ppa;
+
+		ppa = rqd->ppa_addr;
 		rqd->ppa_list = ppa_ptr;
 		rqd->dma_ppa_list = dma_ppa_list;
+		rqd->ppa_list[0] = ppa;
 	}
 
 	for (i = 0; i < nr_secs; i++) {
@@ -253,6 +285,11 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	i = 0;
 	hole = find_first_zero_bit(read_bitmap, nr_secs);
 	do {
+		int line_id = pblk_dev_ppa_to_line(rqd->ppa_list[i]);
+		struct pblk_line *line = &pblk->lines[line_id];
+
+		kref_put(&line->ref, pblk_line_put);
+
 		meta_list[hole].lba = lba_list_media[i];
 
 		src_bv = new_bio->bi_io_vec[i++];
@@ -276,19 +313,17 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	bio_put(new_bio);
 
 	/* Complete the original bio and associated request */
+	bio_endio(bio);
 	rqd->bio = bio;
 	rqd->nr_ppas = nr_secs;
-	rqd->private = pblk;
 
-	bio_endio(bio);
-	pblk_end_io_read(rqd);
+	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_OK;
 
 err:
 	/* Free allocated pages in new bio */
 	pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt);
-	rqd->private = pblk;
-	pblk_end_io_read(rqd);
+	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
 }
 
@@ -321,11 +356,11 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
 			goto retry;
 		}
 
+		WARN_ON(test_and_set_bit(0, read_bitmap));
 		meta_list[0].lba = cpu_to_le64(lba);
 
-		WARN_ON(test_and_set_bit(0, read_bitmap));
 #ifdef CONFIG_NVM_DEBUG
-			atomic_long_inc(&pblk->cache_reads);
+		atomic_long_inc(&pblk->cache_reads);
 #endif
 	} else {
 		rqd->ppa_addr = ppa;
@@ -393,7 +428,7 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	if (bitmap_full(&read_bitmap, nr_secs)) {
 		bio_endio(bio);
 		atomic_inc(&pblk->inflight_io);
-		pblk_end_io_read(rqd);
+		__pblk_end_io_read(pblk, rqd, false);
 		return NVM_IO_OK;
 	}
 
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index f6d2e1e72057..db602bd4827f 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -634,6 +634,7 @@ struct pblk {
 
 	struct workqueue_struct *close_wq;
 	struct workqueue_struct *bb_wq;
+	struct workqueue_struct *r_end_wq;
 
 	struct timer_list wtimer;
 
@@ -739,6 +740,7 @@ int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
 			 void *emeta_buf);
 int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr erase_ppa);
 void pblk_line_put(struct kref *ref);
+void pblk_line_put_wq(struct kref *ref);
 struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line);
 u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line);
 void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 18/18] lightnvm: pblk: remove unnecessary check
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (16 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 17/18] lightnvm: pblk: guarantee line integrity on reads Javier González
@ 2017-09-06 10:51 ` Javier González
  2017-09-06 14:04 ` [PATCH 00/18] lightnvm: pblk patches for 4.14 Jens Axboe
  18 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 10:51 UTC (permalink / raw)
  To: mb, axboe
  Cc: linux-block, linux-kernel, Javier González, Matias Bjørling

A partial read I/O in pblk is an I/O where some sectors reside in the
write buffer in main memory and some are persisted on the device. Such
an I/O must contain at least 2 lbas: with a single lba the read is
either fully cached or fully on the device, and neither case enters the
partial-read path. Checking for the single-lba case is therefore
unnecessary.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
---
 drivers/lightnvm/pblk-read.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index f43d78e4ce78..46a37d51fa4d 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -242,7 +242,7 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
 	rqd->end_io = NULL;
 
-	if (unlikely(nr_secs > 1 && nr_holes == 1)) {
+	if (unlikely(nr_holes == 1)) {
 		ppa_ptr = rqd->ppa_list;
 		dma_ppa_list = rqd->dma_ppa_list;
 		rqd->ppa_addr = rqd->ppa_list[0];
@@ -267,7 +267,7 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 #endif
 	}
 
-	if (unlikely(nr_secs > 1 && nr_holes == 1)) {
+	if (unlikely(nr_holes == 1)) {
 		struct ppa_addr ppa;
 
 		ppa = rqd->ppa_addr;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/18] lightnvm: pblk: improve naming for internal req.
  2017-09-06 10:50 ` [PATCH 01/18] lightnvm: pblk: improve naming for internal req Javier González
@ 2017-09-06 13:46   ` Christoph Hellwig
  2017-09-06 14:01     ` Javier González
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2017-09-06 13:46 UTC (permalink / raw)
  To: Javier González
  Cc: mb, axboe, linux-block, linux-kernel, Javier González,
	Matias Bjørling

> -#define ERASE 2 /* READ = 0, WRITE = 1 */
> +/* READ = 0, WRITE (user) = 1 */
> +#define ERASE 2
> +#define WRITE_INT 3 /* Internal write. Not through write buffer */

enum {
	PBLK_READ,
	PBLK_WRITE,
	PBLK_ERASE,
	PBLK_WRITE_INT,
};

please.  Don't abuse and overload the messy READ/WRITE macros in any
new code.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible
  2017-09-06 10:51 ` [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible Javier González
@ 2017-09-06 13:47   ` Christoph Hellwig
  2017-09-06 14:00     ` Javier González
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2017-09-06 13:47 UTC (permalink / raw)
  To: Javier González
  Cc: mb, axboe, linux-block, linux-kernel, Javier González,
	Matias Bjørling

On Wed, Sep 06, 2017 at 12:51:03PM +0200, Javier González wrote:
> In pblk, buffers forming bios can be allocated on physically contiguous
> or virtually contiguous memory. For physically contiguous memory, we
> already use the bio_map_kern helper function, however, for virtually
> contiguous memory, we form the bio manually. This makes the code more
> complex, especially on the completion path, where mapped pages need to be
> freed.
> 
> Instead, use bio_copy_kern, which does the same and at the same time
> simplifies the completion path.

Nope.  You want to loop over vmalloc_to_page and call bio_add_page
for each page, after taking care of virtually tagged caches instead
of this bounce buffering.

And you really want to allocate the request first and only then map
the data to the request, as said before.
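
For reference, a minimal sketch of the loop being described, assuming
the buffer is vmalloc()ed; bio_map_vmalloc is a hypothetical name for
illustration (not an existing kernel helper), and the cache maintenance
call shown is for the write direction, reads would instead invalidate
after completion:

static struct bio *bio_map_vmalloc(void *data, unsigned int len,
				   gfp_t gfp_mask)
{
	unsigned int nr_pages = DIV_ROUND_UP(len, PAGE_SIZE);
	struct bio *bio;
	int i;

	bio = bio_kmalloc(gfp_mask, nr_pages);
	if (!bio)
		return ERR_PTR(-ENOMEM);

	/* Virtually tagged caches: flush the vmalloc alias first */
	flush_kernel_vmap_range(data, len);

	for (i = 0; i < nr_pages; i++) {
		struct page *page = vmalloc_to_page(data);

		if (!page ||
		    bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
			bio_put(bio);
			return ERR_PTR(-ENOMEM);
		}
		data += PAGE_SIZE;
	}
	return bio;
}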

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible
  2017-09-06 13:47   ` Christoph Hellwig
@ 2017-09-06 14:00     ` Javier González
  2017-09-07 11:08       ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Javier González @ 2017-09-06 14:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matias Bjørling, Jens Axboe, linux-block,
	Linux Kernel Mailing List, Matias Bjørling

[-- Attachment #1: Type: text/plain, Size: 1559 bytes --]

> On 6 Sep 2017, at 15.47, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Wed, Sep 06, 2017 at 12:51:03PM +0200, Javier González wrote:
>> In pblk, buffers forming bios can be allocated on physically contiguous
>> or virtually contiguous memory. For physically contiguous memory, we
>> already use the bio_map_kern helper function, however, for virtually
>> contiguous memory, we form the bio manually. This makes the code more
>> complex, especially on the completion path, where mapped pages need to be
>> freed.
>> 
>> Instead, use bio_copy_kern, which does the same and at the same time
>> simplifies the completion path.
> 
> Nope.  You want to loop over vmalloc_to_page and call bio_add_page
> for each page,

Yes. This is basically what I did before.

> after taking care of virtually tagged caches instead
> of this bounce buffering.

And thus I considered bio_copy_kern to be a better solution, since over
time it will take care of doing the vmalloc_to_page conversion correctly
for all cases.

> 
> And you really want to allocate the request first and only then map
> the data to the request, as said before.

Ok. So this would mean that targets (e.g., pblk) deal with struct
request instead of only dealing with bios and then letting the LightNVM
core transform bios into requests. This way we can map directly to the
request. Is this what you mean?

Just out of curiosity, why is forming the bio through bio_copy_kern (or
manually doing the same) and then transforming it into a request incorrect /
worse?

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/18] lightnvm: pblk: improve naming for internal req.
  2017-09-06 13:46   ` Christoph Hellwig
@ 2017-09-06 14:01     ` Javier González
  0 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 14:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Matias Bjørling

[-- Attachment #1: Type: text/plain, Size: 437 bytes --]

> On 6 Sep 2017, at 15.46, Christoph Hellwig <hch@infradead.org> wrote:
> 
>> -#define ERASE 2 /* READ = 0, WRITE = 1 */
>> +/* READ = 0, WRITE (user) = 1 */
>> +#define ERASE 2
>> +#define WRITE_INT 3 /* Internal write. Not through write buffer */
> 
> enum {
> 	PBLK_READ,
> 	PBLK_WRITE,
> 	PBLK_ERASE,
> 	PBLK_WRITE_INT,
> };
> 
> please.  Don't abuse and overload the messy READ/WRITE macros in any
> new code.

Ok. I'll resend.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/18] lightnvm: pblk patches for 4.14
  2017-09-06 10:50 [PATCH 00/18] lightnvm: pblk patches for 4.14 Javier González
                   ` (17 preceding siblings ...)
  2017-09-06 10:51 ` [PATCH 18/18] lightnvm: pblk: remove unnecessary check Javier González
@ 2017-09-06 14:04 ` Jens Axboe
  2017-09-06 14:10   ` Javier González
  18 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2017-09-06 14:04 UTC (permalink / raw)
  To: Javier González, mb; +Cc: linux-block, linux-kernel, Javier González

On 09/06/2017 04:50 AM, Javier González wrote:
> Hi Jens,
> 
> Here is the pblk patchset for this window.

You must mean for the next window, surely, since we're in the middle
of the current merge window? Have these patches been posted for
review before? This should have been done (at least) two weeks ago!

Get rid of anything that isn't a bug fix, I don't want to take
anything that looks like a feature or refactoring of code.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/18] lightnvm: pblk patches for 4.14
  2017-09-06 14:04 ` [PATCH 00/18] lightnvm: pblk patches for 4.14 Jens Axboe
@ 2017-09-06 14:10   ` Javier González
  0 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-06 14:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Matias Bjørling, linux-block, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

> On 6 Sep 2017, at 16.04, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 09/06/2017 04:50 AM, Javier González wrote:
>> Hi Jens,
>> 
>> Here is the pblk patchset for this window.
> 
> You must mean for the next window, surely, since we're in the middle
> of the current merge window? Have these patches been posted for
> review before? This should have been done (at least) two weeks ago!
> 
> Get rid of anything that isn't a bug fix, I don't want to take
> anything that looks like a feature or refactoring of code.
> 

Ok. I'll re-post only bug fixes.

> --
> Jens Axboe

Javier.

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible
  2017-09-06 14:00     ` Javier González
@ 2017-09-07 11:08       ` Christoph Hellwig
  2017-09-07 11:20         ` Javier González
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2017-09-07 11:08 UTC (permalink / raw)
  To: Javier González
  Cc: Christoph Hellwig, Matias Bjørling, Jens Axboe, linux-block,
	Linux Kernel Mailing List, Matias Bjørling

On Wed, Sep 06, 2017 at 04:00:56PM +0200, Javier González wrote:
> > Nope.  You want to loop over vmalloc_to_page and call bio_add_page
> > for each page,
> 
> Yes. This is basically what I did before.
> 
> > after taking care of virtually tagged caches instead
> > of this bounce buffering.
> 
> And thus I considered bio_copy_kern to be a better solution, since over
> time it will take care of doing the vmalloc_to_page conversion correctly
> for all cases.

bio_copy_kern copies all the data, so it is generally not a good
idea.  The cache flushing isn't too hard - take a look at the XFS
buffer cache for an existing version.

It would be good to just to do the right thing inside bio_map_kern
for that so that callers don't need to care if it is vmalloced or
not.

> Ok. So this would mean that targets (e.g., pblk) deal with struct
> request instead of only dealing with bios and then letting the LightNVM
> core transform bios into requests. This way we can map directly to the
> request. Is this what you mean?

Yes.

> Just out of curiosity, why is forming the bio through bio_copy_kern (or
> manually doing the same) and then transforming it into a request incorrect /
> worse?

Because you expose yourself to the details of mapping a bio to a request.
We had to export blk_init_request_from_bio just for lightnvm to do this,
and it also has to do other weird bits with requests.  If you go
through blk_rq_map_* the block layer takes care of all that for you.
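
A minimal sketch of that ordering for a kernel buffer, assuming a
4.14-era blk-mq queue q (error handling trimmed; REQ_OP_DRV_IN stands in
for whatever op the target actually issues):

	struct request *rq;
	int ret;

	rq = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* The block layer builds (and, if needed, bounces) the bio */
	ret = blk_rq_map_kern(q, rq, kbuf, len, GFP_KERNEL);
	if (ret) {
		blk_mq_free_request(rq);
		return ret;
	}

	blk_execute_rq(q, NULL, rq, 0);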

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/18] lightnvm: pblk: use bio_copy_kern when possible
  2017-09-07 11:08       ` Christoph Hellwig
@ 2017-09-07 11:20         ` Javier González
  0 siblings, 0 replies; 27+ messages in thread
From: Javier González @ 2017-09-07 11:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matias Bjørling, Jens Axboe, linux-block,
	Linux Kernel Mailing List, Matias Bjørling

[-- Attachment #1: Type: text/plain, Size: 1810 bytes --]

> On 7 Sep 2017, at 13.08, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Wed, Sep 06, 2017 at 04:00:56PM +0200, Javier González wrote:
>>> Nope.  You want to loop over vmalloc_to_page and call bio_add_page
>>> for each page,
>> 
>> Yes. This is basically what I did before.
>> 
>>> after taking care of virtually tagged caches instead
>>> of this bounce buffering.
>> 
>> And thus I considered bio_copy_kern to be a better solution, since over
>> time it will take care of doing the vmalloc_to_page conversion correctly
>> for all cases.
> 
> bio_copy_kern copies all the data, so it is generally not a good
> idea.  The cache flushing isn't too hard - take a look at the XFS
> buffer cache for an existing version.
> 
> It would be good to just to do the right thing inside bio_map_kern
> for that so that callers don't need to care if it is vmalloced or
> not.

Yes. That would help. I know md also needs to manually add pages for
vmalloced memory. Probably others do too.

> 
>> Ok. So this would mean that targets (e.g., pblk) deal with struct
>> request instead of only dealing with bios and then letting the LightNVM
>> core transform bios into requests. This way we can map directly to the
>> request. Is this what you mean?
> 
> Yes.
> 
>> Just out of curiosity, why is forming the bio through bio_copy_kern (or
>> manually doing the same) and then transforming it into a request incorrect /
>> worse?
> 
> Because you expose yourself to the details of mapping a bio to a request.
> We had to export blk_init_request_from_bio just for lightnvm to do this,
> and it also has to do other weird bits with requests.  If you go
> through blk_rq_map_* the block layer takes care of all that for you.

Ok. It makes sense. I'll talk to Matias about it.


Thanks!
Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread
