* [GIT PULL 00/20] lightnvm updates for 4.18
@ 2018-05-28  8:58 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
                   ` (20 more replies)
  0 siblings, 21 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Matias Bjørling

Hi Jens,

Please pick up the following patches.

 - Hans reworked the write error recovery path in pblk.
 - Igor added extra error handling for lines, and fixed a bug in the
   pblk ringbuffer during GC.
 - Javier refactored the pblk code a bit, added extra error
   handling, and added checks to verify that the data returned from
   the drive is appropriate.
 - Marcin added some extra logic to manage the write buffer. Now
   MW_CUNITS can be zero and the size of the write buffer can be
   changed at module load time.

Thanks,
Matias

Hans Holmberg (3):
  lightnvm: pblk: rework write error recovery path
  lightnvm: pblk: garbage collect lines with failed writes
  lightnvm: pblk: fix smeta write error path

Igor Konopko (4):
  lightnvm: proper error handling for pblk_bio_add_pages
  lightnvm: error handling when whole line is bad
  lightnvm: fix partial read error path
  lightnvm: pblk: sync RB and RL states during GC

Javier González (11):
  lightnvm: pblk: fail gracefully on line alloc. failure
  lightnvm: pblk: recheck for bad lines at runtime
  lightnvm: pblk: check read lba on gc path
  lightnvm: pblk: improve error msg on corrupted LBAs
  lightnvm: pblk: warn in case of corrupted write buffer
  lightnvm: pblk: return NVM_ error on failed submission
  lightnvm: pblk: remove unnecessary indirection
  lightnvm: pblk: remove unnecessary argument
  lightnvm: pblk: check for chunk size before allocating it
  lightnvm: pass flag on graceful teardown to targets
  lightnvm: pblk: remove dead function

Marcin Dziegielewski (2):
  lightnvm: pblk: handle case when mw_cunits equals to 0
  lightnvm: pblk: add possibility to set write buffer size manually

 drivers/lightnvm/core.c          |  10 +-
 drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
 drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
 drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
 drivers/lightnvm/pblk-map.c      |  33 +++--
 drivers/lightnvm/pblk-rb.c       |  51 +------
 drivers/lightnvm/pblk-read.c     |  83 +++++++++---
 drivers/lightnvm/pblk-recovery.c |  91 -------------
 drivers/lightnvm/pblk-rl.c       |  29 +++-
 drivers/lightnvm/pblk-sysfs.c    |  15 ++-
 drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
 drivers/lightnvm/pblk.h          |  43 +++---
 drivers/nvme/host/lightnvm.c     |   1 -
 include/linux/lightnvm.h         |   2 +-
 14 files changed, 604 insertions(+), 408 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

In the event of a line failing to allocate, fail gracefully and stop the
pipeline to avoid more writes failing in the same place.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c |  5 +++++
 drivers/lightnvm/pblk-map.c  | 33 ++++++++++++++++++++++++---------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 91a5bc2556a3..dee64f91227d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1047,6 +1047,11 @@ static int pblk_lines_init(struct pblk *pblk)
 		nr_free_chks += pblk_setup_line_meta(pblk, line, chunk_meta, i);
 	}
 
+	if (!nr_free_chks) {
+		pr_err("pblk: too many bad blocks to create a sane instance\n");
+		return -EINTR;
+	}
+
 	pblk_set_provision(pblk, nr_free_chks);
 
 	kfree(chunk_meta);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 20dbaa89c9df..953ca31dda68 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -18,11 +18,11 @@
 
 #include "pblk.h"
 
-static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
-			       struct ppa_addr *ppa_list,
-			       unsigned long *lun_bitmap,
-			       struct pblk_sec_meta *meta_list,
-			       unsigned int valid_secs)
+static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
+			      struct ppa_addr *ppa_list,
+			      unsigned long *lun_bitmap,
+			      struct pblk_sec_meta *meta_list,
+			      unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
 	struct pblk_emeta *emeta;
@@ -35,8 +35,14 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 	if (pblk_line_is_full(line)) {
 		struct pblk_line *prev_line = line;
 
+		/* If we cannot allocate a new line, make sure to store metadata
+		 * on current line and then fail
+		 */
 		line = pblk_line_replace_data(pblk);
 		pblk_line_close_meta(pblk, prev_line);
+
+		if (!line)
+			return -EINTR;
 	}
 
 	emeta = line->emeta;
@@ -74,6 +80,7 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 	}
 
 	pblk_down_rq(pblk, ppa_list, nr_secs, lun_bitmap);
+	return 0;
 }
 
 void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
@@ -87,8 +94,12 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 
 	for (i = off; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
-		pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs);
+		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
+					lun_bitmap, &meta_list[i], map_secs)) {
+			bio_put(rqd->bio);
+			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+			pblk_pipeline_stop(pblk);
+		}
 	}
 }
 
@@ -108,8 +119,12 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 
 	for (i = 0; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
-		pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs);
+		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
+					lun_bitmap, &meta_list[i], map_secs)) {
+			bio_put(rqd->bio);
+			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+			pblk_pipeline_stop(pblk);
+		}
 
 		erase_lun = pblk_ppa_to_pos(geo, rqd->ppa_list[i]);
 
-- 
2.11.0

* [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Bad blocks can grow at runtime. Check that the number of valid blocks in
a line is within the sanity threshold before allocating the line for
new writes.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 38 ++++++++++++++++++++++++++++----------
 drivers/lightnvm/pblk-init.c | 11 +++++++----
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 94d5d97c9d8a..2cad918434a7 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1174,7 +1174,8 @@ static int pblk_prepare_new_line(struct pblk *pblk, struct pblk_line *line)
 static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 {
 	struct pblk_line_meta *lm = &pblk->lm;
-	int blk_to_erase;
+	int blk_in_line = atomic_read(&line->blk_in_line);
+	int blk_to_erase, ret;
 
 	line->map_bitmap = kzalloc(lm->sec_bitmap_len, GFP_ATOMIC);
 	if (!line->map_bitmap)
@@ -1183,8 +1184,8 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 	/* will be initialized using bb info from map_bitmap */
 	line->invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_ATOMIC);
 	if (!line->invalid_bitmap) {
-		kfree(line->map_bitmap);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto fail_free_map_bitmap;
 	}
 
 	/* Bad blocks do not need to be erased */
@@ -1199,16 +1200,19 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 		blk_to_erase = pblk_prepare_new_line(pblk, line);
 		line->state = PBLK_LINESTATE_FREE;
 	} else {
-		blk_to_erase = atomic_read(&line->blk_in_line);
+		blk_to_erase = blk_in_line;
+	}
+
+	if (blk_in_line < lm->min_blk_line) {
+		ret = -EAGAIN;
+		goto fail_free_invalid_bitmap;
 	}
 
 	if (line->state != PBLK_LINESTATE_FREE) {
-		kfree(line->map_bitmap);
-		kfree(line->invalid_bitmap);
-		spin_unlock(&line->lock);
 		WARN(1, "pblk: corrupted line %d, state %d\n",
 							line->id, line->state);
-		return -EAGAIN;
+		ret = -EINTR;
+		goto fail_free_invalid_bitmap;
 	}
 
 	line->state = PBLK_LINESTATE_OPEN;
@@ -1222,6 +1226,16 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
 	kref_init(&line->ref);
 
 	return 0;
+
+fail_free_invalid_bitmap:
+	spin_unlock(&line->lock);
+	kfree(line->invalid_bitmap);
+	line->invalid_bitmap = NULL;
+fail_free_map_bitmap:
+	kfree(line->map_bitmap);
+	line->map_bitmap = NULL;
+
+	return ret;
 }
 
 int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
@@ -1292,10 +1306,14 @@ struct pblk_line *pblk_line_get(struct pblk *pblk)
 
 	ret = pblk_line_prepare(pblk, line);
 	if (ret) {
-		if (ret == -EAGAIN) {
+		switch (ret) {
+		case -EAGAIN:
+			list_add(&line->list, &l_mg->bad_list);
+			goto retry;
+		case -EINTR:
 			list_add(&line->list, &l_mg->corrupt_list);
 			goto retry;
-		} else {
+		default:
 			pr_err("pblk: failed to prepare line %d\n", line->id);
 			list_add(&line->list, &l_mg->free_list);
 			l_mg->nr_free_lines++;
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index dee64f91227d..8f8c9abd14fc 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -127,10 +127,8 @@ static int pblk_l2p_recover(struct pblk *pblk, bool factory_init)
 	if (!line) {
 		/* Configure next line for user data */
 		line = pblk_line_get_first_data(pblk);
-		if (!line) {
-			pr_err("pblk: line list corrupted\n");
+		if (!line)
 			return -EFAULT;
-		}
 	}
 
 	return 0;
@@ -141,6 +139,7 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	sector_t i;
 	struct ppa_addr ppa;
 	size_t map_size;
+	int ret = 0;
 
 	map_size = pblk_trans_map_size(pblk);
 	pblk->trans_map = vmalloc(map_size);
@@ -152,7 +151,11 @@ static int pblk_l2p_init(struct pblk *pblk, bool factory_init)
 	for (i = 0; i < pblk->rl.nr_secs; i++)
 		pblk_trans_map_set(pblk, i, ppa);
 
-	return pblk_l2p_recover(pblk, factory_init);
+	ret = pblk_l2p_recover(pblk, factory_init);
+	if (ret)
+		vfree(pblk->trans_map);
+
+	return ret;
 }
 
 static void pblk_rwb_free(struct pblk *pblk)
-- 
2.11.0

* [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Check that the lba stored in the LBA metadata is correct in the GC path
too. This requires a new helper function to check random reads in the
vector read.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9eee10f69df0..1f699c09e0ea 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -113,15 +113,14 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return NVM_IO_OK;
 }
 
-static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
-			   sector_t blba)
+static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
+				sector_t blba, int nr_lbas)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
-	int nr_lbas = rqd->nr_ppas;
+	struct pblk_sec_meta *meta_lba_list = meta_list;
 	int i;
 
 	for (i = 0; i < nr_lbas; i++) {
-		u64 lba = le64_to_cpu(meta_list[i].lba);
+		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
 
 		if (lba == ADDR_EMPTY)
 			continue;
@@ -130,6 +129,32 @@ static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 }
 
+/*
+ * There can be holes in the lba list.
+ */
+static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
+				u64 *lba_list, int nr_lbas)
+{
+	struct pblk_sec_meta *meta_lba_list = meta_list;
+	int i, j;
+
+	for (i = 0, j = 0; i < nr_lbas; i++) {
+		u64 lba = lba_list[i];
+		u64 meta_lba;
+
+		if (lba == ADDR_EMPTY)
+			continue;
+
+		meta_lba = le64_to_cpu(meta_lba_list[j++].lba);
+
+		if (lba != meta_lba) {
+			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
+								lba, meta_lba);
+			WARN_ON(1);
+		}
+	}
+}
+
 static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct ppa_addr *ppa_list;
@@ -172,7 +197,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
 #endif
 
-	pblk_read_check(pblk, rqd, r_ctx->lba);
+	pblk_read_check_seq(pblk, rqd->meta_list, r_ctx->lba, rqd->nr_ppas);
 
 	bio_put(bio);
 	if (r_ctx->private)
@@ -585,6 +610,8 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 		goto err_free_bio;
 	}
 
+	pblk_read_check_rand(pblk, rqd.meta_list, gc_rq->lba_list, rqd.nr_ppas);
+
 	atomic_dec(&pblk->inflight_io);
 
 	if (rqd.error) {
-- 
2.11.0

* [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (2 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

In the event of a mismatch between the read LBA and the metadata pointer
reported by the device, improve the error message to be able to detect
the offending physical address (PPA) mapped to the corrupted LBA.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 1f699c09e0ea..b201fc486adb 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -113,10 +113,11 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return NVM_IO_OK;
 }
 
-static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
-				sector_t blba, int nr_lbas)
+static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
+				sector_t blba)
 {
-	struct pblk_sec_meta *meta_lba_list = meta_list;
+	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
+	int nr_lbas = rqd->nr_ppas;
 	int i;
 
 	for (i = 0; i < nr_lbas; i++) {
@@ -125,17 +126,27 @@ static void pblk_read_check_seq(struct pblk *pblk, void *meta_list,
 		if (lba == ADDR_EMPTY)
 			continue;
 
-		WARN(lba != blba + i, "pblk: corrupted read LBA\n");
+		if (lba != blba + i) {
+#ifdef CONFIG_NVM_DEBUG
+			struct ppa_addr *p;
+
+			p = (nr_lbas == 1) ? &rqd->ppa_addr : &rqd->ppa_list[i];
+			print_ppa(&pblk->dev->geo, p, "seq", i);
+#endif
+			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
+							lba, (u64)blba + i);
+			WARN_ON(1);
+		}
 	}
 }
 
 /*
  * There can be holes in the lba list.
  */
-static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
-				u64 *lba_list, int nr_lbas)
+static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
+				 u64 *lba_list, int nr_lbas)
 {
-	struct pblk_sec_meta *meta_lba_list = meta_list;
+	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
 	int i, j;
 
 	for (i = 0, j = 0; i < nr_lbas; i++) {
@@ -145,14 +156,25 @@ static void pblk_read_check_rand(struct pblk *pblk, void *meta_list,
 		if (lba == ADDR_EMPTY)
 			continue;
 
-		meta_lba = le64_to_cpu(meta_lba_list[j++].lba);
+		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
 
 		if (lba != meta_lba) {
+#ifdef CONFIG_NVM_DEBUG
+			struct ppa_addr *p;
+			int nr_ppas = rqd->nr_ppas;
+
+			p = (nr_ppas == 1) ? &rqd->ppa_addr : &rqd->ppa_list[j];
+			print_ppa(&pblk->dev->geo, p, "rand", j);
+#endif
 			pr_err("pblk: corrupted read LBA (%llu/%llu)\n",
 								lba, meta_lba);
 			WARN_ON(1);
 		}
+
+		j++;
 	}
+
+	WARN_ONCE(j != rqd->nr_ppas, "pblk: corrupted random request\n");
 }
 
 static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
@@ -197,7 +219,7 @@ static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
 		WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
 #endif
 
-	pblk_read_check_seq(pblk, rqd->meta_list, r_ctx->lba, rqd->nr_ppas);
+	pblk_read_check_seq(pblk, rqd, r_ctx->lba);
 
 	bio_put(bio);
 	if (r_ctx->private)
@@ -610,7 +632,7 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 		goto err_free_bio;
 	}
 
-	pblk_read_check_rand(pblk, rqd.meta_list, gc_rq->lba_list, rqd.nr_ppas);
+	pblk_read_check_rand(pblk, &rqd, gc_rq->lba_list, gc_rq->nr_secs);
 
 	atomic_dec(&pblk->inflight_io);
 
-- 
2.11.0

* [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (3 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

When cleaning up buffer entries as we wrap up, their state should be
"completed". If any entry is still in the "submitted" state, something
bad has happened. Trigger a warning immediately instead of busy-waiting
for the state flag to be updated, which would hide the issue.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-rb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 52fdd85dbc97..58946ffebe81 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -142,10 +142,9 @@ static void clean_wctx(struct pblk_w_ctx *w_ctx)
 {
 	int flags;
 
-try:
 	flags = READ_ONCE(w_ctx->flags);
-	if (!(flags & PBLK_SUBMITTED_ENTRY))
-		goto try;
+	WARN_ONCE(!(flags & PBLK_SUBMITTED_ENTRY),
+			"pblk: overwriting unsubmitted data\n");
 
 	/* Release flags on context. Protect from writes and reads */
 	smp_store_release(&w_ctx->flags, PBLK_WRITABLE_ENTRY);
-- 
2.11.0

* [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (4 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Return a meaningful error when the sanity vector I/O check fails.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 2cad918434a7..0d4078805ecc 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -467,16 +467,13 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 
+	atomic_inc(&pblk->inflight_io);
+
 #ifdef CONFIG_NVM_DEBUG
-	int ret;
-
-	ret = pblk_check_io(pblk, rqd);
-	if (ret)
-		return ret;
+	if (pblk_check_io(pblk, rqd))
+		return NVM_IO_ERR;
 #endif
 
-	atomic_inc(&pblk->inflight_io);
-
 	return nvm_submit_io(dev, rqd);
 }
 
@@ -484,16 +481,13 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	struct nvm_tgt_dev *dev = pblk->dev;
 
+	atomic_inc(&pblk->inflight_io);
+
 #ifdef CONFIG_NVM_DEBUG
-	int ret;
-
-	ret = pblk_check_io(pblk, rqd);
-	if (ret)
-		return ret;
+	if (pblk_check_io(pblk, rqd))
+		return NVM_IO_ERR;
 #endif
 
-	atomic_inc(&pblk->inflight_io);
-
 	return nvm_submit_io_sync(dev, rqd);
 }
 
-- 
2.11.0

* [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (5 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Call nvm_submit_io directly and remove an unnecessary indirection on the
read path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index b201fc486adb..a2e678de428f 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -102,16 +102,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 #endif
 }
 
-static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
-{
-	int err;
-
-	err = pblk_submit_io(pblk, rqd);
-	if (err)
-		return NVM_IO_ERR;
-
-	return NVM_IO_OK;
-}
 
 static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
 				sector_t blba)
@@ -485,9 +475,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 		rqd->bio = int_bio;
 		r_ctx->private = bio;
 
-		ret = pblk_submit_read_io(pblk, rqd);
-		if (ret) {
+		if (pblk_submit_io(pblk, rqd)) {
 			pr_err("pblk: read IO submission failed\n");
+			ret = NVM_IO_ERR;
 			if (int_bio)
 				bio_put(int_bio);
 			goto fail_end_io;
-- 
2.11.0

* [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (6 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Remove the unnecessary pblk argument from pblk_line_free().

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 6 +++---
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk.h      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 0d4078805ecc..4b10122aec89 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1337,7 +1337,7 @@ static struct pblk_line *pblk_line_retry(struct pblk *pblk,
 	retry_line->emeta = line->emeta;
 	retry_line->meta_line = line->meta_line;
 
-	pblk_line_free(pblk, line);
+	pblk_line_free(line);
 	l_mg->data_line = retry_line;
 	spin_unlock(&l_mg->free_lock);
 
@@ -1562,7 +1562,7 @@ struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
 	return new;
 }
 
-void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
+void pblk_line_free(struct pblk_line *line)
 {
 	kfree(line->map_bitmap);
 	kfree(line->invalid_bitmap);
@@ -1584,7 +1584,7 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 	WARN_ON(line->state != PBLK_LINESTATE_GC);
 	line->state = PBLK_LINESTATE_FREE;
 	line->gc_group = PBLK_LINEGC_NONE;
-	pblk_line_free(pblk, line);
+	pblk_line_free(line);
 	spin_unlock(&line->lock);
 
 	atomic_dec(&gc->pipeline_gc);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8f8c9abd14fc..b52855f9336b 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -509,7 +509,7 @@ static void pblk_lines_free(struct pblk *pblk)
 	for (i = 0; i < l_mg->nr_lines; i++) {
 		line = &pblk->lines[i];
 
-		pblk_line_free(pblk, line);
+		pblk_line_free(line);
 		pblk_line_meta_free(line);
 	}
 	spin_unlock(&l_mg->free_lock);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 9c682acfc5d1..dfbfe9e9a385 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -766,7 +766,7 @@ struct pblk_line *pblk_line_get_data(struct pblk *pblk);
 struct pblk_line *pblk_line_get_erase(struct pblk *pblk);
 int pblk_line_erase(struct pblk *pblk, struct pblk_line *line);
 int pblk_line_is_full(struct pblk_line *line);
-void pblk_line_free(struct pblk *pblk, struct pblk_line *line);
+void pblk_line_free(struct pblk_line *line);
 void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close_ws(struct work_struct *work);
-- 
2.11.0

* [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (7 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Do the check for the chunk state after making sure that the chunk type
is supported.

Fixes: 32ef9412c114 ("lightnvm: pblk: implement get log report chunk")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b52855f9336b..9e3a43346d4c 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -751,14 +751,14 @@ static int pblk_setup_line_meta_20(struct pblk *pblk, struct pblk_line *line,
 		chunk->cnlb = chunk_meta->cnlb;
 		chunk->wp = chunk_meta->wp;
 
-		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
-			continue;
-
 		if (chunk->type & NVM_CHK_TP_SZ_SPEC) {
 			WARN_ONCE(1, "pblk: custom-sized chunks unsupported\n");
 			continue;
 		}
 
+		if (!(chunk->state & NVM_CHK_ST_OFFLINE))
+			continue;
+
 		set_bit(pos, line->blk_bitmap);
 		nr_bad_chks++;
 	}
-- 
2.11.0

* [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (8 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug), it is too late for the target to store any
metadata on the device - any attempt to write to it will fail. In this
case, pass a "graceful teardown" flag to the target to let it know when
this happens.

In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/core.c      | 10 +++++-----
 drivers/lightnvm/pblk-core.c | 13 ++++++++++++-
 drivers/lightnvm/pblk-gc.c   | 10 ++++++----
 drivers/lightnvm/pblk-init.c | 14 ++++++++------
 drivers/lightnvm/pblk.h      |  4 +++-
 include/linux/lightnvm.h     |  2 +-
 6 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 63171cdce270..60aa7bc5a630 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -431,7 +431,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	return 0;
 err_sysfs:
 	if (tt->exit)
-		tt->exit(targetdata);
+		tt->exit(targetdata, true);
 err_init:
 	blk_cleanup_queue(tqueue);
 	tdisk->queue = NULL;
@@ -446,7 +446,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	return ret;
 }
 
-static void __nvm_remove_target(struct nvm_target *t)
+static void __nvm_remove_target(struct nvm_target *t, bool graceful)
 {
 	struct nvm_tgt_type *tt = t->type;
 	struct gendisk *tdisk = t->disk;
@@ -459,7 +459,7 @@ static void __nvm_remove_target(struct nvm_target *t)
 		tt->sysfs_exit(tdisk);
 
 	if (tt->exit)
-		tt->exit(tdisk->private_data);
+		tt->exit(tdisk->private_data, graceful);
 
 	nvm_remove_tgt_dev(t->dev, 1);
 	put_disk(tdisk);
@@ -489,7 +489,7 @@ static int nvm_remove_tgt(struct nvm_dev *dev, struct nvm_ioctl_remove *remove)
 		mutex_unlock(&dev->mlock);
 		return 1;
 	}
-	__nvm_remove_target(t);
+	__nvm_remove_target(t, true);
 	mutex_unlock(&dev->mlock);
 
 	return 0;
@@ -963,7 +963,7 @@ void nvm_unregister(struct nvm_dev *dev)
 	list_for_each_entry_safe(t, tmp, &dev->targets, list) {
 		if (t->dev->parent != dev)
 			continue;
-		__nvm_remove_target(t);
+		__nvm_remove_target(t, false);
 	}
 	mutex_unlock(&dev->mlock);
 
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 4b10122aec89..5f1e5b1b3094 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1461,7 +1461,7 @@ static void pblk_line_close_meta_sync(struct pblk *pblk)
 	flush_workqueue(pblk->close_wq);
 }
 
-void pblk_pipeline_stop(struct pblk *pblk)
+void __pblk_pipeline_flush(struct pblk *pblk)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	int ret;
@@ -1486,6 +1486,11 @@ void pblk_pipeline_stop(struct pblk *pblk)
 
 	flush_workqueue(pblk->bb_wq);
 	pblk_line_close_meta_sync(pblk);
+}
+
+void __pblk_pipeline_stop(struct pblk *pblk)
+{
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 
 	spin_lock(&l_mg->free_lock);
 	pblk->state = PBLK_STATE_STOPPED;
@@ -1494,6 +1499,12 @@ void pblk_pipeline_stop(struct pblk *pblk)
 	spin_unlock(&l_mg->free_lock);
 }
 
+void pblk_pipeline_stop(struct pblk *pblk)
+{
+	__pblk_pipeline_flush(pblk);
+	__pblk_pipeline_stop(pblk);
+}
+
 struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
 {
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index 6851a5c67189..b0cc277bf972 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -649,7 +649,7 @@ int pblk_gc_init(struct pblk *pblk)
 	return ret;
 }
 
-void pblk_gc_exit(struct pblk *pblk)
+void pblk_gc_exit(struct pblk *pblk, bool graceful)
 {
 	struct pblk_gc *gc = &pblk->gc;
 
@@ -663,10 +663,12 @@ void pblk_gc_exit(struct pblk *pblk)
 	if (gc->gc_reader_ts)
 		kthread_stop(gc->gc_reader_ts);
 
-	flush_workqueue(gc->gc_reader_wq);
+	if (graceful) {
+		flush_workqueue(gc->gc_reader_wq);
+		flush_workqueue(gc->gc_line_reader_wq);
+	}
+
 	destroy_workqueue(gc->gc_reader_wq);
-
-	flush_workqueue(gc->gc_line_reader_wq);
 	destroy_workqueue(gc->gc_line_reader_wq);
 
 	if (gc->gc_writer_ts)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9e3a43346d4c..bfc488d0dda9 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1118,23 +1118,25 @@ static void pblk_free(struct pblk *pblk)
 	kfree(pblk);
 }
 
-static void pblk_tear_down(struct pblk *pblk)
+static void pblk_tear_down(struct pblk *pblk, bool graceful)
 {
-	pblk_pipeline_stop(pblk);
+	if (graceful)
+		__pblk_pipeline_flush(pblk);
+	__pblk_pipeline_stop(pblk);
 	pblk_writer_stop(pblk);
 	pblk_rb_sync_l2p(&pblk->rwb);
 	pblk_rl_free(&pblk->rl);
 
-	pr_debug("pblk: consistent tear down\n");
+	pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful);
 }
 
-static void pblk_exit(void *private)
+static void pblk_exit(void *private, bool graceful)
 {
 	struct pblk *pblk = private;
 
 	down_write(&pblk_lock);
-	pblk_gc_exit(pblk);
-	pblk_tear_down(pblk);
+	pblk_gc_exit(pblk, graceful);
+	pblk_tear_down(pblk, graceful);
 
 #ifdef CONFIG_NVM_DEBUG
 	pr_info("pblk exit: L2P CRC: %x\n", pblk_l2p_crc(pblk));
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index dfbfe9e9a385..0c69eb880f56 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -771,6 +771,8 @@ void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close(struct pblk *pblk, struct pblk_line *line);
 void pblk_line_close_ws(struct work_struct *work);
 void pblk_pipeline_stop(struct pblk *pblk);
+void __pblk_pipeline_stop(struct pblk *pblk);
+void __pblk_pipeline_flush(struct pblk *pblk);
 void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
 		     void (*work)(struct work_struct *), gfp_t gfp_mask,
 		     struct workqueue_struct *wq);
@@ -864,7 +866,7 @@ int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
 #define PBLK_GC_RSV_LINE 1	/* Reserved lines for GC */
 
 int pblk_gc_init(struct pblk *pblk);
-void pblk_gc_exit(struct pblk *pblk);
+void pblk_gc_exit(struct pblk *pblk, bool graceful);
 void pblk_gc_should_start(struct pblk *pblk);
 void pblk_gc_should_stop(struct pblk *pblk);
 void pblk_gc_should_kick(struct pblk *pblk);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 6e0859b9d4d2..e9e0d1c7eaf5 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -489,7 +489,7 @@ typedef blk_qc_t (nvm_tgt_make_rq_fn)(struct request_queue *, struct bio *);
 typedef sector_t (nvm_tgt_capacity_fn)(void *);
 typedef void *(nvm_tgt_init_fn)(struct nvm_tgt_dev *, struct gendisk *,
 				int flags);
-typedef void (nvm_tgt_exit_fn)(void *);
+typedef void (nvm_tgt_exit_fn)(void *, bool);
 typedef int (nvm_tgt_sysfs_init_fn)(struct gendisk *);
 typedef void (nvm_tgt_sysfs_exit_fn)(struct gendisk *);
 
-- 
2.11.0


* [GIT PULL 11/20] lightnvm: pblk: remove dead function
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (9 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González,
	Javier González, Matias Bjørling

From: Javier González <javier@javigon.com>

Remove the dead function used for manual sync I/O.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 7 -------
 drivers/lightnvm/pblk.h      | 1 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 5f1e5b1b3094..5dae72e8b46b 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -342,13 +342,6 @@ void pblk_write_should_kick(struct pblk *pblk)
 		pblk_write_kick(pblk);
 }
 
-void pblk_end_io_sync(struct nvm_rq *rqd)
-{
-	struct completion *waiting = rqd->private;
-
-	complete(waiting);
-}
-
 static void pblk_wait_for_meta(struct pblk *pblk)
 {
 	do {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 0c69eb880f56..5a4daf0b949d 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -796,7 +796,6 @@ void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
 void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 		unsigned long *lun_bitmap);
-void pblk_end_io_sync(struct nvm_rq *rqd);
 int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 		       int nr_pages);
 void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
-- 
2.11.0


* [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (10 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

The write error recovery path is incomplete, so rework it to
resubmit failed writes directly from the write buffer.

When a write error occurs, the remaining sectors in the chunk are
mapped out and invalidated, and the request is inserted into a resubmit list.

The writer thread checks if there are any requests to resubmit,
scans and invalidates any lbas that have been overwritten by later
writes and resubmits the failed entries.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c     |   2 +
 drivers/lightnvm/pblk-rb.c       |  39 ------
 drivers/lightnvm/pblk-recovery.c |  91 -------------
 drivers/lightnvm/pblk-write.c    | 279 +++++++++++++++++++++++++--------------
 drivers/lightnvm/pblk.h          |  11 +-
 5 files changed, 187 insertions(+), 235 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index bfc488d0dda9..6f06727afcf6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -426,6 +426,7 @@ static int pblk_core_init(struct pblk *pblk)
 		goto free_r_end_wq;
 
 	INIT_LIST_HEAD(&pblk->compl_list);
+	INIT_LIST_HEAD(&pblk->resubmit_list);
 
 	return 0;
 
@@ -1185,6 +1186,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 	pblk->state = PBLK_STATE_RUNNING;
 	pblk->gc.gc_enabled = 0;
 
+	spin_lock_init(&pblk->resubmit_lock);
 	spin_lock_init(&pblk->trans_lock);
 	spin_lock_init(&pblk->lock);
 
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 58946ffebe81..1b74ec51a4ad 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -503,45 +503,6 @@ int pblk_rb_may_write_gc(struct pblk_rb *rb, unsigned int nr_entries,
 }
 
 /*
- * The caller of this function must ensure that the backpointer will not
- * overwrite the entries passed on the list.
- */
-unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
-				      struct list_head *list,
-				      unsigned int max)
-{
-	struct pblk_rb_entry *entry, *tentry;
-	struct page *page;
-	unsigned int read = 0;
-	int ret;
-
-	list_for_each_entry_safe(entry, tentry, list, index) {
-		if (read > max) {
-			pr_err("pblk: too many entries on list\n");
-			goto out;
-		}
-
-		page = virt_to_page(entry->data);
-		if (!page) {
-			pr_err("pblk: could not allocate write bio page\n");
-			goto out;
-		}
-
-		ret = bio_add_page(bio, page, rb->seg_size, 0);
-		if (ret != rb->seg_size) {
-			pr_err("pblk: could not add page to write bio\n");
-			goto out;
-		}
-
-		list_del(&entry->index);
-		read++;
-	}
-
-out:
-	return read;
-}
-
-/*
  * Read available entries on rb and add them to the given bio. To avoid a memory
  * copy, a page reference to the write buffer is used to be added to the bio.
  *
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 3e079c2afa6e..788dce87043e 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -16,97 +16,6 @@
 
 #include "pblk.h"
 
-void pblk_submit_rec(struct work_struct *work)
-{
-	struct pblk_rec_ctx *recovery =
-			container_of(work, struct pblk_rec_ctx, ws_rec);
-	struct pblk *pblk = recovery->pblk;
-	struct nvm_rq *rqd = recovery->rqd;
-	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
-	struct bio *bio;
-	unsigned int nr_rec_secs;
-	unsigned int pgs_read;
-	int ret;
-
-	nr_rec_secs = bitmap_weight((unsigned long int *)&rqd->ppa_status,
-								NVM_MAX_VLBA);
-
-	bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
-
-	bio->bi_iter.bi_sector = 0;
-	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-	rqd->bio = bio;
-	rqd->nr_ppas = nr_rec_secs;
-
-	pgs_read = pblk_rb_read_to_bio_list(&pblk->rwb, bio, &recovery->failed,
-								nr_rec_secs);
-	if (pgs_read != nr_rec_secs) {
-		pr_err("pblk: could not read recovery entries\n");
-		goto err;
-	}
-
-	if (pblk_setup_w_rec_rq(pblk, rqd, c_ctx)) {
-		pr_err("pblk: could not setup recovery request\n");
-		goto err;
-	}
-
-#ifdef CONFIG_NVM_DEBUG
-	atomic_long_add(nr_rec_secs, &pblk->recov_writes);
-#endif
-
-	ret = pblk_submit_io(pblk, rqd);
-	if (ret) {
-		pr_err("pblk: I/O submission failed: %d\n", ret);
-		goto err;
-	}
-
-	mempool_free(recovery, pblk->rec_pool);
-	return;
-
-err:
-	bio_put(bio);
-	pblk_free_rqd(pblk, rqd, PBLK_WRITE);
-}
-
-int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
-			struct pblk_rec_ctx *recovery, u64 *comp_bits,
-			unsigned int comp)
-{
-	struct nvm_rq *rec_rqd;
-	struct pblk_c_ctx *rec_ctx;
-	int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded;
-
-	rec_rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
-	rec_ctx = nvm_rq_to_pdu(rec_rqd);
-
-	/* Copy completion bitmap, but exclude the first X completed entries */
-	bitmap_shift_right((unsigned long int *)&rec_rqd->ppa_status,
-				(unsigned long int *)comp_bits,
-				comp, NVM_MAX_VLBA);
-
-	/* Save the context for the entries that need to be re-written and
-	 * update current context with the completed entries.
-	 */
-	rec_ctx->sentry = pblk_rb_wrap_pos(&pblk->rwb, c_ctx->sentry + comp);
-	if (comp >= c_ctx->nr_valid) {
-		rec_ctx->nr_valid = 0;
-		rec_ctx->nr_padded = nr_entries - comp;
-
-		c_ctx->nr_padded = comp - c_ctx->nr_valid;
-	} else {
-		rec_ctx->nr_valid = c_ctx->nr_valid - comp;
-		rec_ctx->nr_padded = c_ctx->nr_padded;
-
-		c_ctx->nr_valid = comp;
-		c_ctx->nr_padded = 0;
-	}
-
-	recovery->rqd = rec_rqd;
-	recovery->pblk = pblk;
-
-	return 0;
-}
-
 int pblk_recov_check_emeta(struct pblk *pblk, struct line_emeta *emeta_buf)
 {
 	u32 crc;
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 3e6f1ebd743a..f62e432f7c91 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -103,68 +103,149 @@ static void pblk_complete_write(struct pblk *pblk, struct nvm_rq *rqd,
 	pblk_rb_sync_end(&pblk->rwb, &flags);
 }
 
-/* When a write fails, we are not sure whether the block has grown bad or a page
- * range is more susceptible to write errors. If a high number of pages fail, we
- * assume that the block is bad and we mark it accordingly. In all cases, we
- * remap and resubmit the failed entries as fast as possible; if a flush is
- * waiting on a completion, the whole stack would stall otherwise.
- */
+/* Map remaining sectors in chunk, starting from ppa */
+static void pblk_map_remaining(struct pblk *pblk, struct ppa_addr *ppa)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+	struct pblk_line *line;
+	struct ppa_addr map_ppa = *ppa;
+	u64 paddr;
+	int done = 0;
+
+	line = &pblk->lines[pblk_ppa_to_line(*ppa)];
+	spin_lock(&line->lock);
+
+	while (!done)  {
+		paddr = pblk_dev_ppa_to_line_addr(pblk, map_ppa);
+
+		if (!test_and_set_bit(paddr, line->map_bitmap))
+			line->left_msecs--;
+
+		if (!test_and_set_bit(paddr, line->invalid_bitmap))
+			le32_add_cpu(line->vsc, -1);
+
+		if (geo->version == NVM_OCSSD_SPEC_12) {
+			map_ppa.ppa++;
+			if (map_ppa.g.pg == geo->num_pg)
+				done = 1;
+		} else {
+			map_ppa.m.sec++;
+			if (map_ppa.m.sec == geo->clba)
+				done = 1;
+		}
+	}
+
+	spin_unlock(&line->lock);
+}
+
+static void pblk_prepare_resubmit(struct pblk *pblk, unsigned int sentry,
+				  unsigned int nr_entries)
+{
+	struct pblk_rb *rb = &pblk->rwb;
+	struct pblk_rb_entry *entry;
+	struct pblk_line *line;
+	struct pblk_w_ctx *w_ctx;
+	struct ppa_addr ppa_l2p;
+	int flags;
+	unsigned int pos, i;
+
+	spin_lock(&pblk->trans_lock);
+	pos = sentry;
+	for (i = 0; i < nr_entries; i++) {
+		entry = &rb->entries[pos];
+		w_ctx = &entry->w_ctx;
+
+		/* Check if the lba has been overwritten */
+		ppa_l2p = pblk_trans_map_get(pblk, w_ctx->lba);
+		if (!pblk_ppa_comp(ppa_l2p, entry->cacheline))
+			w_ctx->lba = ADDR_EMPTY;
+
+		/* Mark up the entry as submittable again */
+		flags = READ_ONCE(w_ctx->flags);
+		flags |= PBLK_WRITTEN_DATA;
+		/* Release flags on write context. Protect from writes */
+		smp_store_release(&w_ctx->flags, flags);
+
+		/* Decrease the reference count to the line as we will
+		 * re-map these entries
+		 */
+		line = &pblk->lines[pblk_ppa_to_line(w_ctx->ppa)];
+		kref_put(&line->ref, pblk_line_put);
+
+		pos = (pos + 1) & (rb->nr_entries - 1);
+	}
+	spin_unlock(&pblk->trans_lock);
+}
+
+static void pblk_queue_resubmit(struct pblk *pblk, struct pblk_c_ctx *c_ctx)
+{
+	struct pblk_c_ctx *r_ctx;
+
+	r_ctx = kzalloc(sizeof(struct pblk_c_ctx), GFP_KERNEL);
+	if (!r_ctx)
+		return;
+
+	r_ctx->lun_bitmap = NULL;
+	r_ctx->sentry = c_ctx->sentry;
+	r_ctx->nr_valid = c_ctx->nr_valid;
+	r_ctx->nr_padded = c_ctx->nr_padded;
+
+	spin_lock(&pblk->resubmit_lock);
+	list_add_tail(&r_ctx->list, &pblk->resubmit_list);
+	spin_unlock(&pblk->resubmit_lock);
+
+#ifdef CONFIG_NVM_DEBUG
+	atomic_long_add(c_ctx->nr_valid, &pblk->recov_writes);
+#endif
+}
+
+static void pblk_submit_rec(struct work_struct *work)
+{
+	struct pblk_rec_ctx *recovery =
+			container_of(work, struct pblk_rec_ctx, ws_rec);
+	struct pblk *pblk = recovery->pblk;
+	struct nvm_rq *rqd = recovery->rqd;
+	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
+	struct ppa_addr *ppa_list;
+
+	pblk_log_write_err(pblk, rqd);
+
+	if (rqd->nr_ppas == 1)
+		ppa_list = &rqd->ppa_addr;
+	else
+		ppa_list = rqd->ppa_list;
+
+	pblk_map_remaining(pblk, ppa_list);
+	pblk_queue_resubmit(pblk, c_ctx);
+
+	pblk_up_rq(pblk, rqd->ppa_list, rqd->nr_ppas, c_ctx->lun_bitmap);
+	if (c_ctx->nr_padded)
+		pblk_bio_free_pages(pblk, rqd->bio, c_ctx->nr_valid,
+							c_ctx->nr_padded);
+	bio_put(rqd->bio);
+	pblk_free_rqd(pblk, rqd, PBLK_WRITE);
+	mempool_free(recovery, pblk->rec_pool);
+
+	atomic_dec(&pblk->inflight_io);
+}
+
+
 static void pblk_end_w_fail(struct pblk *pblk, struct nvm_rq *rqd)
 {
-	void *comp_bits = &rqd->ppa_status;
-	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
 	struct pblk_rec_ctx *recovery;
-	struct ppa_addr *ppa_list = rqd->ppa_list;
-	int nr_ppas = rqd->nr_ppas;
-	unsigned int c_entries;
-	int bit, ret;
-
-	if (unlikely(nr_ppas == 1))
-		ppa_list = &rqd->ppa_addr;
 
 	recovery = mempool_alloc(pblk->rec_pool, GFP_ATOMIC);
-
-	INIT_LIST_HEAD(&recovery->failed);
-
-	bit = -1;
-	while ((bit = find_next_bit(comp_bits, nr_ppas, bit + 1)) < nr_ppas) {
-		struct pblk_rb_entry *entry;
-		struct ppa_addr ppa;
-
-		/* Logic error */
-		if (bit > c_ctx->nr_valid) {
-			WARN_ONCE(1, "pblk: corrupted write request\n");
-			mempool_free(recovery, pblk->rec_pool);
-			goto out;
-		}
-
-		ppa = ppa_list[bit];
-		entry = pblk_rb_sync_scan_entry(&pblk->rwb, &ppa);
-		if (!entry) {
-			pr_err("pblk: could not scan entry on write failure\n");
-			mempool_free(recovery, pblk->rec_pool);
-			goto out;
-		}
-
-		/* The list is filled first and emptied afterwards. No need for
-		 * protecting it with a lock
-		 */
-		list_add_tail(&entry->index, &recovery->failed);
+	if (!recovery) {
+		pr_err("pblk: could not allocate recovery work\n");
+		return;
 	}
 
-	c_entries = find_first_bit(comp_bits, nr_ppas);
-	ret = pblk_recov_setup_rq(pblk, c_ctx, recovery, comp_bits, c_entries);
-	if (ret) {
-		pr_err("pblk: could not recover from write failure\n");
-		mempool_free(recovery, pblk->rec_pool);
-		goto out;
-	}
+	recovery->pblk = pblk;
+	recovery->rqd = rqd;
 
 	INIT_WORK(&recovery->ws_rec, pblk_submit_rec);
 	queue_work(pblk->close_wq, &recovery->ws_rec);
-
-out:
-	pblk_complete_write(pblk, rqd, c_ctx);
 }
 
 static void pblk_end_io_write(struct nvm_rq *rqd)
@@ -173,8 +254,8 @@ static void pblk_end_io_write(struct nvm_rq *rqd)
 	struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
 
 	if (rqd->error) {
-		pblk_log_write_err(pblk, rqd);
-		return pblk_end_w_fail(pblk, rqd);
+		pblk_end_w_fail(pblk, rqd);
+		return;
 	}
 #ifdef CONFIG_NVM_DEBUG
 	else
@@ -266,31 +347,6 @@ static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	return 0;
 }
 
-int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
-			struct pblk_c_ctx *c_ctx)
-{
-	struct pblk_line_meta *lm = &pblk->lm;
-	unsigned long *lun_bitmap;
-	int ret;
-
-	lun_bitmap = kzalloc(lm->lun_bitmap_len, GFP_KERNEL);
-	if (!lun_bitmap)
-		return -ENOMEM;
-
-	c_ctx->lun_bitmap = lun_bitmap;
-
-	ret = pblk_alloc_w_rq(pblk, rqd, rqd->nr_ppas, pblk_end_io_write);
-	if (ret)
-		return ret;
-
-	pblk_map_rq(pblk, rqd, c_ctx->sentry, lun_bitmap, c_ctx->nr_valid, 0);
-
-	rqd->ppa_status = (u64)0;
-	rqd->flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
-
-	return ret;
-}
-
 static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
 				  unsigned int secs_to_flush)
 {
@@ -339,6 +395,7 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
 	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
 					l_mg->emeta_alloc_type, GFP_KERNEL);
 	if (IS_ERR(bio)) {
+		pr_err("pblk: failed to map emeta io\n");
 		ret = PTR_ERR(bio);
 		goto fail_free_rqd;
 	}
@@ -515,27 +572,55 @@ static int pblk_submit_write(struct pblk *pblk)
 	unsigned int secs_avail, secs_to_sync, secs_to_com;
 	unsigned int secs_to_flush;
 	unsigned long pos;
+	unsigned int resubmit;
 
-	/* If there are no sectors in the cache, flushes (bios without data)
-	 * will be cleared on the cache threads
-	 */
-	secs_avail = pblk_rb_read_count(&pblk->rwb);
-	if (!secs_avail)
-		return 1;
-
-	secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
-	if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
-		return 1;
-
-	secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail, secs_to_flush);
-	if (secs_to_sync > pblk->max_write_pgs) {
-		pr_err("pblk: bad buffer sync calculation\n");
-		return 1;
+	spin_lock(&pblk->resubmit_lock);
+	resubmit = !list_empty(&pblk->resubmit_list);
+	spin_unlock(&pblk->resubmit_lock);
+
+	/* Resubmit failed writes first */
+	if (resubmit) {
+		struct pblk_c_ctx *r_ctx;
+
+		spin_lock(&pblk->resubmit_lock);
+		r_ctx = list_first_entry(&pblk->resubmit_list,
+					struct pblk_c_ctx, list);
+		list_del(&r_ctx->list);
+		spin_unlock(&pblk->resubmit_lock);
+
+		secs_avail = r_ctx->nr_valid;
+		pos = r_ctx->sentry;
+
+		pblk_prepare_resubmit(pblk, pos, secs_avail);
+		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
+				secs_avail);
+
+		kfree(r_ctx);
+	} else {
+		/* If there are no sectors in the cache,
+		 * flushes (bios without data) will be cleared on
+		 * the cache threads
+		 */
+		secs_avail = pblk_rb_read_count(&pblk->rwb);
+		if (!secs_avail)
+			return 1;
+
+		secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
+		if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
+			return 1;
+
+		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
+					secs_to_flush);
+		if (secs_to_sync > pblk->max_write_pgs) {
+			pr_err("pblk: bad buffer sync calculation\n");
+			return 1;
+		}
+
+		secs_to_com = (secs_to_sync > secs_avail) ?
+			secs_avail : secs_to_sync;
+		pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
 	}
 
-	secs_to_com = (secs_to_sync > secs_avail) ? secs_avail : secs_to_sync;
-	pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
-
 	bio = bio_alloc(GFP_KERNEL, secs_to_sync);
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 5a4daf0b949d..a75ffae53a0d 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -128,7 +128,6 @@ struct pblk_pad_rq {
 struct pblk_rec_ctx {
 	struct pblk *pblk;
 	struct nvm_rq *rqd;
-	struct list_head failed;
 	struct work_struct ws_rec;
 };
 
@@ -664,6 +663,9 @@ struct pblk {
 
 	struct list_head compl_list;
 
+	spinlock_t resubmit_lock;	 /* Resubmit list lock */
+	struct list_head resubmit_list; /* Resubmit list for failed writes */
+
 	mempool_t *page_bio_pool;
 	mempool_t *gen_ws_pool;
 	mempool_t *rec_pool;
@@ -713,9 +715,6 @@ void pblk_rb_sync_l2p(struct pblk_rb *rb);
 unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 				 unsigned int pos, unsigned int nr_entries,
 				 unsigned int count);
-unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
-				      struct list_head *list,
-				      unsigned int max);
 int pblk_rb_copy_to_bio(struct pblk_rb *rb, struct bio *bio, sector_t lba,
 			struct ppa_addr ppa, int bio_iter, bool advanced_bio);
 unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int entries);
@@ -848,13 +847,9 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
 /*
  * pblk recovery
  */
-void pblk_submit_rec(struct work_struct *work);
 struct pblk_line *pblk_recov_l2p(struct pblk *pblk);
 int pblk_recov_pad(struct pblk *pblk);
 int pblk_recov_check_emeta(struct pblk *pblk, struct line_emeta *emeta);
-int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
-			struct pblk_rec_ctx *recovery, u64 *comp_bits,
-			unsigned int comp);
 
 /*
  * pblk gc
-- 
2.11.0


* [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (11 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

Write failures should not happen under normal circumstances,
so in order to bring the chunk back into a known state as soon
as possible, evacuate all the valid data out of the line and let the
fw judge if the block can be written to in the next reset cycle.

Do this by introducing a new gc list for lines with failed writes,
and ensure that the rate limiter allocates a small portion of
the write bandwidth to get the job done.

The lba list is saved in memory for use during gc as we
cannot guarantee that the emeta data is readable if a write
error occurred.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c  |  45 ++++++++++++++++++-
 drivers/lightnvm/pblk-gc.c    | 102 +++++++++++++++++++++++++++---------------
 drivers/lightnvm/pblk-init.c  |  46 ++++++++++++-------
 drivers/lightnvm/pblk-rl.c    |  29 ++++++++++--
 drivers/lightnvm/pblk-sysfs.c |  15 ++++++-
 drivers/lightnvm/pblk-write.c |   2 +
 drivers/lightnvm/pblk.h       |  25 +++++++++--
 7 files changed, 200 insertions(+), 64 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 5dae72e8b46b..263da2e43567 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 
 	lockdep_assert_held(&line->lock);
 
-	if (!vsc) {
+	if (line->w_err_gc->has_write_err) {
+		if (line->gc_group != PBLK_LINEGC_WERR) {
+			line->gc_group = PBLK_LINEGC_WERR;
+			move_list = &l_mg->gc_werr_list;
+			pblk_rl_werr_line_in(&pblk->rl);
+		}
+	} else if (!vsc) {
 		if (line->gc_group != PBLK_LINEGC_FULL) {
 			line->gc_group = PBLK_LINEGC_FULL;
 			move_list = &l_mg->gc_full_list;
@@ -1589,8 +1595,13 @@ static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
 	line->state = PBLK_LINESTATE_FREE;
 	line->gc_group = PBLK_LINEGC_NONE;
 	pblk_line_free(line);
+
+	if (line->w_err_gc->has_write_err) {
+		pblk_rl_werr_line_out(&pblk->rl);
+		line->w_err_gc->has_write_err = 0;
+	}
+
 	spin_unlock(&line->lock);
-
 	atomic_dec(&gc->pipeline_gc);
 
 	spin_lock(&l_mg->free_lock);
@@ -1753,11 +1764,34 @@ void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line)
 
 	spin_lock(&l_mg->close_lock);
 	spin_lock(&line->lock);
+
+	/* Update the in-memory start address for emeta, in case it has
+	 * shifted due to write errors
+	 */
+	if (line->emeta_ssec != line->cur_sec)
+		line->emeta_ssec = line->cur_sec;
+
 	list_add_tail(&line->list, &l_mg->emeta_list);
 	spin_unlock(&line->lock);
 	spin_unlock(&l_mg->close_lock);
 
 	pblk_line_should_sync_meta(pblk);
+
+
+}
+
+static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
+{
+	struct pblk_line_meta *lm = &pblk->lm;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	unsigned int lba_list_size = lm->emeta_len[2];
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+	struct pblk_emeta *emeta = line->emeta;
+
+	w_err_gc->lba_list = pblk_malloc(lba_list_size,
+					 l_mg->emeta_alloc_type, GFP_KERNEL);
+	memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
+				lba_list_size);
 }
 
 void pblk_line_close_ws(struct work_struct *work)
@@ -1766,6 +1800,13 @@ void pblk_line_close_ws(struct work_struct *work)
 									ws);
 	struct pblk *pblk = line_ws->pblk;
 	struct pblk_line *line = line_ws->line;
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+
+	/* Write errors makes the emeta start address stored in smeta invalid,
+	 * so keep a copy of the lba list until we've gc'd the line
+	 */
+	if (w_err_gc->has_write_err)
+		pblk_save_lba_list(pblk, line);
 
 	pblk_line_close(pblk, line);
 	mempool_free(line_ws, pblk->gen_ws_pool);
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index b0cc277bf972..df88f1bdd921 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -129,6 +129,53 @@ static void pblk_gc_line_ws(struct work_struct *work)
 	kfree(gc_rq_ws);
 }
 
+static __le64 *get_lba_list_from_emeta(struct pblk *pblk,
+				       struct pblk_line *line)
+{
+	struct line_emeta *emeta_buf;
+	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+	struct pblk_line_meta *lm = &pblk->lm;
+	unsigned int lba_list_size = lm->emeta_len[2];
+	__le64 *lba_list;
+	int ret;
+
+	emeta_buf = pblk_malloc(lm->emeta_len[0],
+				l_mg->emeta_alloc_type, GFP_KERNEL);
+	if (!emeta_buf)
+		return NULL;
+
+	ret = pblk_line_read_emeta(pblk, line, emeta_buf);
+	if (ret) {
+		pr_err("pblk: line %d read emeta failed (%d)\n",
+				line->id, ret);
+		pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+		return NULL;
+	}
+
+	/* If this read fails, it means that emeta is corrupted.
+	 * For now, leave the line untouched.
+	 * TODO: Implement a recovery routine that scans and moves
+	 * all sectors on the line.
+	 */
+
+	ret = pblk_recov_check_emeta(pblk, emeta_buf);
+	if (ret) {
+		pr_err("pblk: inconsistent emeta (line %d)\n",
+				line->id);
+		pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+		return NULL;
+	}
+
+	lba_list = pblk_malloc(lba_list_size,
+			       l_mg->emeta_alloc_type, GFP_KERNEL);
+	if (lba_list)
+		memcpy(lba_list, emeta_to_lbas(pblk, emeta_buf), lba_list_size);
+
+	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+
+	return lba_list;
+}
+
 static void pblk_gc_line_prepare_ws(struct work_struct *work)
 {
 	struct pblk_line_ws *line_ws = container_of(work, struct pblk_line_ws,
@@ -138,46 +185,26 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct pblk_gc *gc = &pblk->gc;
-	struct line_emeta *emeta_buf;
 	struct pblk_line_ws *gc_rq_ws;
 	struct pblk_gc_rq *gc_rq;
 	__le64 *lba_list;
 	unsigned long *invalid_bitmap;
 	int sec_left, nr_secs, bit;
-	int ret;
 
 	invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_KERNEL);
 	if (!invalid_bitmap)
 		goto fail_free_ws;
 
-	emeta_buf = pblk_malloc(lm->emeta_len[0], l_mg->emeta_alloc_type,
-								GFP_KERNEL);
-	if (!emeta_buf) {
-		pr_err("pblk: cannot use GC emeta\n");
-		goto fail_free_bitmap;
-	}
-
-	ret = pblk_line_read_emeta(pblk, line, emeta_buf);
-	if (ret) {
-		pr_err("pblk: line %d read emeta failed (%d)\n", line->id, ret);
-		goto fail_free_emeta;
-	}
-
-	/* If this read fails, it means that emeta is corrupted. For now, leave
-	 * the line untouched. TODO: Implement a recovery routine that scans and
-	 * moves all sectors on the line.
-	 */
-
-	ret = pblk_recov_check_emeta(pblk, emeta_buf);
-	if (ret) {
-		pr_err("pblk: inconsistent emeta (line %d)\n", line->id);
-		goto fail_free_emeta;
-	}
-
-	lba_list = emeta_to_lbas(pblk, emeta_buf);
-	if (!lba_list) {
-		pr_err("pblk: could not interpret emeta (line %d)\n", line->id);
-		goto fail_free_emeta;
+	if (line->w_err_gc->has_write_err) {
+		lba_list = line->w_err_gc->lba_list;
+		line->w_err_gc->lba_list = NULL;
+	} else {
+		lba_list = get_lba_list_from_emeta(pblk, line);
+		if (!lba_list) {
+			pr_err("pblk: could not interpret emeta (line %d)\n",
+					line->id);
+			goto fail_free_ws;
+		}
 	}
 
 	spin_lock(&line->lock);
@@ -187,14 +214,14 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 	if (sec_left < 0) {
 		pr_err("pblk: corrupted GC line (%d)\n", line->id);
-		goto fail_free_emeta;
+		goto fail_free_lba_list;
 	}
 
 	bit = -1;
 next_rq:
 	gc_rq = kmalloc(sizeof(struct pblk_gc_rq), GFP_KERNEL);
 	if (!gc_rq)
-		goto fail_free_emeta;
+		goto fail_free_lba_list;
 
 	nr_secs = 0;
 	do {
@@ -240,7 +267,7 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 		goto next_rq;
 
 out:
-	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
+	pblk_mfree(lba_list, l_mg->emeta_alloc_type);
 	kfree(line_ws);
 	kfree(invalid_bitmap);
 
@@ -251,9 +278,8 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
 
 fail_free_gc_rq:
 	kfree(gc_rq);
-fail_free_emeta:
-	pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
-fail_free_bitmap:
+fail_free_lba_list:
+	pblk_mfree(lba_list, l_mg->emeta_alloc_type);
 	kfree(invalid_bitmap);
 fail_free_ws:
 	kfree(line_ws);
@@ -349,12 +375,14 @@ static struct pblk_line *pblk_gc_get_victim_line(struct pblk *pblk,
 static bool pblk_gc_should_run(struct pblk_gc *gc, struct pblk_rl *rl)
 {
 	unsigned int nr_blocks_free, nr_blocks_need;
+	unsigned int werr_lines = atomic_read(&rl->werr_lines);
 
 	nr_blocks_need = pblk_rl_high_thrs(rl);
 	nr_blocks_free = pblk_rl_nr_free_blks(rl);
 
 	/* This is not critical, no need to take lock here */
-	return ((gc->gc_active) && (nr_blocks_need > nr_blocks_free));
+	return ((werr_lines > 0) ||
+		((gc->gc_active) && (nr_blocks_need > nr_blocks_free)));
 }
 
 void pblk_gc_free_full_lines(struct pblk *pblk)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 6f06727afcf6..d65d2f972ccf 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -493,11 +493,17 @@ static void pblk_line_mg_free(struct pblk *pblk)
 	}
 }
 
-static void pblk_line_meta_free(struct pblk_line *line)
+static void pblk_line_meta_free(struct pblk_line_mgmt *l_mg,
+				struct pblk_line *line)
 {
+	struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
+
 	kfree(line->blk_bitmap);
 	kfree(line->erase_bitmap);
 	kfree(line->chks);
+
+	pblk_mfree(w_err_gc->lba_list, l_mg->emeta_alloc_type);
+	kfree(w_err_gc);
 }
 
 static void pblk_lines_free(struct pblk *pblk)
@@ -511,7 +517,7 @@ static void pblk_lines_free(struct pblk *pblk)
 		line = &pblk->lines[i];
 
 		pblk_line_free(line);
-		pblk_line_meta_free(line);
+		pblk_line_meta_free(l_mg, line);
 	}
 	spin_unlock(&l_mg->free_lock);
 
@@ -813,20 +819,28 @@ static int pblk_alloc_line_meta(struct pblk *pblk, struct pblk_line *line)
 		return -ENOMEM;
 
 	line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
-	if (!line->erase_bitmap) {
-		kfree(line->blk_bitmap);
-		return -ENOMEM;
-	}
+	if (!line->erase_bitmap)
+		goto free_blk_bitmap;
+
 
 	line->chks = kmalloc(lm->blk_per_line * sizeof(struct nvm_chk_meta),
 								GFP_KERNEL);
-	if (!line->chks) {
-		kfree(line->erase_bitmap);
-		kfree(line->blk_bitmap);
-		return -ENOMEM;
-	}
+	if (!line->chks)
+		goto free_erase_bitmap;
+
+	line->w_err_gc = kzalloc(sizeof(struct pblk_w_err_gc), GFP_KERNEL);
+	if (!line->w_err_gc)
+		goto free_chks;
 
 	return 0;
+
+free_chks:
+	kfree(line->chks);
+free_erase_bitmap:
+	kfree(line->erase_bitmap);
+free_blk_bitmap:
+	kfree(line->blk_bitmap);
+	return -ENOMEM;
 }
 
 static int pblk_line_mg_init(struct pblk *pblk)
@@ -851,12 +865,14 @@ static int pblk_line_mg_init(struct pblk *pblk)
 	INIT_LIST_HEAD(&l_mg->gc_mid_list);
 	INIT_LIST_HEAD(&l_mg->gc_low_list);
 	INIT_LIST_HEAD(&l_mg->gc_empty_list);
+	INIT_LIST_HEAD(&l_mg->gc_werr_list);
 
 	INIT_LIST_HEAD(&l_mg->emeta_list);
 
-	l_mg->gc_lists[0] = &l_mg->gc_high_list;
-	l_mg->gc_lists[1] = &l_mg->gc_mid_list;
-	l_mg->gc_lists[2] = &l_mg->gc_low_list;
+	l_mg->gc_lists[0] = &l_mg->gc_werr_list;
+	l_mg->gc_lists[1] = &l_mg->gc_high_list;
+	l_mg->gc_lists[2] = &l_mg->gc_mid_list;
+	l_mg->gc_lists[3] = &l_mg->gc_low_list;
 
 	spin_lock_init(&l_mg->free_lock);
 	spin_lock_init(&l_mg->close_lock);
@@ -1063,7 +1079,7 @@ static int pblk_lines_init(struct pblk *pblk)
 
 fail_free_lines:
 	while (--i >= 0)
-		pblk_line_meta_free(&pblk->lines[i]);
+		pblk_line_meta_free(l_mg, &pblk->lines[i]);
 	kfree(pblk->lines);
 fail_free_chunk_meta:
 	kfree(chunk_meta);
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 883a7113b19d..6a0616a6fcaf 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -73,6 +73,16 @@ void pblk_rl_user_in(struct pblk_rl *rl, int nr_entries)
 	pblk_rl_kick_u_timer(rl);
 }
 
+void pblk_rl_werr_line_in(struct pblk_rl *rl)
+{
+	atomic_inc(&rl->werr_lines);
+}
+
+void pblk_rl_werr_line_out(struct pblk_rl *rl)
+{
+	atomic_dec(&rl->werr_lines);
+}
+
 void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries)
 {
 	atomic_add(nr_entries, &rl->rb_gc_cnt);
@@ -99,11 +109,21 @@ static void __pblk_rl_update_rates(struct pblk_rl *rl,
 {
 	struct pblk *pblk = container_of(rl, struct pblk, rl);
 	int max = rl->rb_budget;
+	int werr_gc_needed = atomic_read(&rl->werr_lines);
 
 	if (free_blocks >= rl->high) {
-		rl->rb_user_max = max;
-		rl->rb_gc_max = 0;
-		rl->rb_state = PBLK_RL_HIGH;
+		if (werr_gc_needed) {
+			/* Allocate a small budget for recovering
+			 * lines with write errors
+			 */
+			rl->rb_gc_max = 1 << rl->rb_windows_pw;
+			rl->rb_user_max = max - rl->rb_gc_max;
+			rl->rb_state = PBLK_RL_WERR;
+		} else {
+			rl->rb_user_max = max;
+			rl->rb_gc_max = 0;
+			rl->rb_state = PBLK_RL_OFF;
+		}
 	} else if (free_blocks < rl->high) {
 		int shift = rl->high_pw - rl->rb_windows_pw;
 		int user_windows = free_blocks >> shift;
@@ -124,7 +144,7 @@ static void __pblk_rl_update_rates(struct pblk_rl *rl,
 		rl->rb_state = PBLK_RL_LOW;
 	}
 
-	if (rl->rb_state == (PBLK_RL_MID | PBLK_RL_LOW))
+	if (rl->rb_state != PBLK_RL_OFF)
 		pblk_gc_should_start(pblk);
 	else
 		pblk_gc_should_stop(pblk);
@@ -221,6 +241,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	atomic_set(&rl->rb_user_cnt, 0);
 	atomic_set(&rl->rb_gc_cnt, 0);
 	atomic_set(&rl->rb_space, -1);
+	atomic_set(&rl->werr_lines, 0);
 
 	timer_setup(&rl->u_timer, pblk_rl_u_timer, 0);
 
diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
index e61909af23a5..88a0a7c407aa 100644
--- a/drivers/lightnvm/pblk-sysfs.c
+++ b/drivers/lightnvm/pblk-sysfs.c
@@ -173,6 +173,8 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 	int free_line_cnt = 0, closed_line_cnt = 0, emeta_line_cnt = 0;
 	int d_line_cnt = 0, l_line_cnt = 0;
 	int gc_full = 0, gc_high = 0, gc_mid = 0, gc_low = 0, gc_empty = 0;
+	int gc_werr = 0;
+
 	int bad = 0, cor = 0;
 	int msecs = 0, cur_sec = 0, vsc = 0, sec_in_line = 0;
 	int map_weight = 0, meta_weight = 0;
@@ -237,6 +239,15 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 		gc_empty++;
 	}
 
+	list_for_each_entry(line, &l_mg->gc_werr_list, list) {
+		if (line->type == PBLK_LINETYPE_DATA)
+			d_line_cnt++;
+		else if (line->type == PBLK_LINETYPE_LOG)
+			l_line_cnt++;
+		closed_line_cnt++;
+		gc_werr++;
+	}
+
 	list_for_each_entry(line, &l_mg->bad_list, list)
 		bad++;
 	list_for_each_entry(line, &l_mg->corrupt_list, list)
@@ -275,8 +286,8 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
 					l_mg->nr_lines);
 
 	sz += snprintf(page + sz, PAGE_SIZE - sz,
-		"GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, queue:%d\n",
-			gc_full, gc_high, gc_mid, gc_low, gc_empty,
+		"GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, werr: %d, queue:%d\n",
+			gc_full, gc_high, gc_mid, gc_low, gc_empty, gc_werr,
 			atomic_read(&pblk->gc.read_inflight_gc));
 
 	sz += snprintf(page + sz, PAGE_SIZE - sz,
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index f62e432f7c91..f33c2c3993f0 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -136,6 +136,7 @@ static void pblk_map_remaining(struct pblk *pblk, struct ppa_addr *ppa)
 		}
 	}
 
+	line->w_err_gc->has_write_err = 1;
 	spin_unlock(&line->lock);
 }
 
@@ -279,6 +280,7 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
 	if (rqd->error) {
 		pblk_log_write_err(pblk, rqd);
 		pr_err("pblk: metadata I/O failed. Line %d\n", line->id);
+		line->w_err_gc->has_write_err = 1;
 	}
 
 	sync = atomic_add_return(rqd->nr_ppas, &emeta->sync);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index a75ffae53a0d..7fbdfdc809d3 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -89,12 +89,14 @@ struct pblk_sec_meta {
 /* The number of GC lists and the rate-limiter states go together. This way the
  * rate-limiter can dictate how much GC is needed based on resource utilization.
  */
-#define PBLK_GC_NR_LISTS 3
+#define PBLK_GC_NR_LISTS 4
 
 enum {
-	PBLK_RL_HIGH = 1,
-	PBLK_RL_MID = 2,
-	PBLK_RL_LOW = 3,
+	PBLK_RL_OFF = 0,
+	PBLK_RL_WERR = 1,
+	PBLK_RL_HIGH = 2,
+	PBLK_RL_MID = 3,
+	PBLK_RL_LOW = 4
 };
 
 #define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
@@ -278,6 +280,8 @@ struct pblk_rl {
 	int rb_user_active;
 	int rb_gc_active;
 
+	atomic_t werr_lines;	/* Number of write error lines that needs gc */
+
 	struct timer_list u_timer;
 
 	unsigned long long nr_secs;
@@ -311,6 +315,7 @@ enum {
 	PBLK_LINEGC_MID = 23,
 	PBLK_LINEGC_HIGH = 24,
 	PBLK_LINEGC_FULL = 25,
+	PBLK_LINEGC_WERR = 26
 };
 
 #define PBLK_MAGIC 0x70626c6b /*pblk*/
@@ -412,6 +417,11 @@ struct pblk_smeta {
 	struct line_smeta *buf;		/* smeta buffer in persistent format */
 };
 
+struct pblk_w_err_gc {
+	int has_write_err;
+	__le64 *lba_list;
+};
+
 struct pblk_line {
 	struct pblk *pblk;
 	unsigned int id;		/* Line number corresponds to the
@@ -457,6 +467,8 @@ struct pblk_line {
 
 	struct kref ref;		/* Write buffer L2P references */
 
+	struct pblk_w_err_gc *w_err_gc;	/* Write error gc recovery metadata */
+
 	spinlock_t lock;		/* Necessary for invalid_bitmap only */
 };
 
@@ -488,6 +500,8 @@ struct pblk_line_mgmt {
 	struct list_head gc_mid_list;	/* Full lines ready to GC, mid isc */
 	struct list_head gc_low_list;	/* Full lines ready to GC, low isc */
 
+	struct list_head gc_werr_list;  /* Write err recovery list */
+
 	struct list_head gc_full_list;	/* Full lines ready to GC, no valid */
 	struct list_head gc_empty_list;	/* Full lines close, all valid */
 
@@ -890,6 +904,9 @@ void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line,
 			    bool used);
 int pblk_rl_is_limit(struct pblk_rl *rl);
 
+void pblk_rl_werr_line_in(struct pblk_rl *rl);
+void pblk_rl_werr_line_out(struct pblk_rl *rl);
+
 /*
  * pblk sysfs
  */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (12 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, Hans Holmberg, Matias Bjørling

From: Hans Holmberg <hans.holmberg@cnexlabs.com>

Smeta write errors were previously ignored. Instead, skip these
lines and put them back on the free list, so the chunks go
through a reset cycle before we attempt to use the line again.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 263da2e43567..e43093e27084 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -849,9 +849,10 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
 	atomic_dec(&pblk->inflight_io);
 
 	if (rqd.error) {
-		if (dir == PBLK_WRITE)
+		if (dir == PBLK_WRITE) {
 			pblk_log_write_err(pblk, &rqd);
-		else if (dir == PBLK_READ)
+			ret = 1;
+		} else if (dir == PBLK_READ)
 			pblk_log_read_err(pblk, &rqd);
 	}
 
@@ -1101,7 +1102,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
 
 	if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
 		pr_debug("pblk: line smeta I/O failed. Retry\n");
-		return 1;
+		return 0;
 	}
 
 	bitmap_copy(line->invalid_bitmap, line->map_bitmap, lm->sec_per_line);
-- 
2.11.0


* [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (13 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

Currently, when bio_add_pc_page() fails in pblk_bio_add_pages()
called from pblk_rb_read_to_bio(), two issues occur. The first is
in pblk_bio_free_pages(), since we try to free pages that were not
allocated from our mempool. The second is a warning from
dma_pool_free(), because we try to free a NULL dma pointer.

This commit fixes both issues.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e43093e27084..a20b41c355c5 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -278,7 +278,9 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
 		return;
 	}
 
-	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
+	if (rqd->meta_list)
+		nvm_dev_dma_free(dev->parent, rqd->meta_list,
+				rqd->dma_meta_list);
 	mempool_free(rqd, pool);
 }
 
@@ -316,7 +318,7 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 
 	return 0;
 err:
-	pblk_bio_free_pages(pblk, bio, 0, i - 1);
+	pblk_bio_free_pages(pblk, bio, (bio->bi_vcnt - i), i);
 	return -1;
 }
 
-- 
2.11.0


* [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (14 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 10:59   ` Javier Gonzalez
  2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

When all the blocks (chunks) in a line are marked as bad (offline),
we shouldn't try to read smeta during the init process.

Currently we try to do so by passing -1 as the PPA address, which
causes multiple warnings that we are issuing I/Os to out-of-bounds
PPAs.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Updated title.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index a20b41c355c5..e3e883547198 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -868,6 +868,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
 {
 	u64 bpaddr = pblk_line_smeta_start(pblk, line);
 
+	if (bpaddr == -1) {
+		/* Whole line is bad - do not try to read smeta. */
+		return 1;
+	}
+
 	return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV);
 }
 
-- 
2.11.0


* [GIT PULL 17/20] lightnvm: fix partial read error path
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (15 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

When an error occurs during bio_add_page on the partial read path,
pblk tries to free the pages twice.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-read.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index a2e678de428f..fa7b60f852d9 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -256,7 +256,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
 	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-		goto err;
+		goto err_add_pages;
 
 	if (nr_holes != new_bio->bi_vcnt) {
 		pr_err("pblk: malformed bio\n");
@@ -347,10 +347,10 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
 	return NVM_IO_OK;
 
 err:
-	pr_err("pblk: failed to perform partial read\n");
-
 	/* Free allocated pages in new bio */
-	pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt);
+	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
+err_add_pages:
+	pr_err("pblk: failed to perform partial read\n");
 	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
 }
-- 
2.11.0


* [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (16 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 11:02   ` Javier Gonzalez
  2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Igor Konopko,
	Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

Some devices can expose an mw_cunits value of 0, which can cause the
write buffer to be created too small and make performance drop on
write workloads.

To handle this, fall back to the default value for MLC. Because that
default covers both the 1.2 and 2.0 OCSSD specifications, setting up
mw_cunits in the nvme_nvm_setup_12 function is no longer necessary.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 10 +++++++++-
 drivers/nvme/host/lightnvm.c |  1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index d65d2f972ccf..0f277744266b 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
 	atomic64_set(&pblk->nr_flush, 0);
 	pblk->nr_flush_rst = 0;
 
-	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
+	if (geo->mw_cunits) {
+		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
+	} else {
+		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
+		/*
+		 * Some devices can expose mw_cunits equal to 0, so let's use
+		 * here default safe value for MLC.
+		 */
+	}
 
 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 41279da799ed..c747792da915 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
 
 	geo->ws_min = sec_per_pg;
 	geo->ws_opt = sec_per_pg;
-	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
 
 	/* Do not impose values for maximum number of open blocks as it is
 	 * unspecified in 1.2. Users of 1.2 must be aware of this and eventually
-- 
2.11.0


* [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (17 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
  2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  20 siblings, 0 replies; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Marcin Dziegielewski, Igor Konopko,
	Matias Bjørling

From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>

In some cases, users may want to set the write buffer size manually,
e.g. to tune it for a specific workload. This patch adds the ability
to set the write buffer size via a module parameter.

Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0f277744266b..25aa1e73984f 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -20,6 +20,11 @@
 
 #include "pblk.h"
 
+unsigned int write_buffer_size;
+
+module_param(write_buffer_size, uint, 0644);
+MODULE_PARM_DESC(write_buffer_size, "number of entries in a write buffer");
+
 static struct kmem_cache *pblk_ws_cache, *pblk_rec_cache, *pblk_g_rq_cache,
 				*pblk_w_rq_cache;
 static DECLARE_RWSEM(pblk_lock);
@@ -172,10 +177,15 @@ static int pblk_rwb_init(struct pblk *pblk)
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_rb_entry *entries;
-	unsigned long nr_entries;
+	unsigned long nr_entries, buffer_size;
 	unsigned int power_size, power_seg_sz;
 
-	nr_entries = pblk_rb_calculate_size(pblk->pgs_in_buffer);
+	if (write_buffer_size && (write_buffer_size > pblk->pgs_in_buffer))
+		buffer_size = write_buffer_size;
+	else
+		buffer_size = pblk->pgs_in_buffer;
+
+	nr_entries = pblk_rb_calculate_size(buffer_size);
 
 	entries = vzalloc(nr_entries * sizeof(struct pblk_rb_entry));
 	if (!entries)
-- 
2.11.0


* [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (18 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
@ 2018-05-28  8:58 ` Matias Bjørling
  2018-05-28 10:51   ` Javier Gonzalez
  2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
  20 siblings, 1 reply; 43+ messages in thread
From: Matias Bjørling @ 2018-05-28  8:58 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Igor Konopko, Marcin Dziegielewski,
	Matias Bjørling

From: Igor Konopko <igor.j.konopko@intel.com>

During sequential workloads we can hit the case when almost all
lines are fully written with data. In that case the rate limiter
significantly reduces the maximum number of requests for user I/Os.

Unfortunately, when the ring buffer is flushed to the drive but the
entries are not yet removed (which is fine, since there are still
enough free entries in the ring buffer for user I/O), user I/O hangs
because there are not enough entries in the rate limiter. The reason
is that the rate limiter's user entry count is only decreased after
the ring buffer entries are freed, which does not happen while there
is still plenty of space in the ring buffer.

This patch forces the ring buffer to be freed by calling
pblk_rb_sync_l2p and thus makes new entries available in the rate
limiter when there are not enough of them for user I/O.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Reworded description.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
---
 drivers/lightnvm/pblk-init.c | 2 ++
 drivers/lightnvm/pblk-rb.c   | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 25aa1e73984f..9d7d9e3b8506 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1159,7 +1159,9 @@ static void pblk_tear_down(struct pblk *pblk, bool graceful)
 		__pblk_pipeline_flush(pblk);
 	__pblk_pipeline_stop(pblk);
 	pblk_writer_stop(pblk);
+	spin_lock(&pblk->rwb.w_lock);
 	pblk_rb_sync_l2p(&pblk->rwb);
+	spin_unlock(&pblk->rwb.w_lock);
 	pblk_rl_free(&pblk->rl);
 
 	pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 1b74ec51a4ad..91824cd3e8d8 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -266,21 +266,18 @@ static int pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int nr_entries,
  * Update the l2p entry for all sectors stored on the write buffer. This means
  * that all future lookups to the l2p table will point to a device address, not
  * to the cacheline in the write buffer.
+ * Caller must ensure that rb->w_lock is taken.
  */
 void pblk_rb_sync_l2p(struct pblk_rb *rb)
 {
 	unsigned int sync;
 	unsigned int to_update;
 
-	spin_lock(&rb->w_lock);
-
 	/* Protect from reads and writes */
 	sync = smp_load_acquire(&rb->sync);
 
 	to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries);
 	__pblk_rb_update_l2p(rb, to_update);
-
-	spin_unlock(&rb->w_lock);
 }
 
 /*
@@ -462,6 +459,8 @@ int pblk_rb_may_write_user(struct pblk_rb *rb, struct bio *bio,
 	spin_lock(&rb->w_lock);
 	io_ret = pblk_rl_user_may_insert(&pblk->rl, nr_entries);
 	if (io_ret) {
+		/* Sync RB & L2P in order to update rate limiter values */
+		pblk_rb_sync_l2p(rb);
 		spin_unlock(&rb->w_lock);
 		return io_ret;
 	}
-- 
2.11.0


* Re: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
@ 2018-05-28 10:51   ` Javier Gonzalez
  2018-05-29 13:07     ` Konopko, Igor J
  0 siblings, 1 reply; 43+ messages in thread
From: Javier Gonzalez @ 2018-05-28 10:51 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, linux-kernel, Konopko, Igor J,
	Marcin Dziegielewski


Javier

I somehow missed these patches in the mailing list. Sorry for coming
with feedback this late. I'll look at my filters - in any case, would
you mind Cc'ing me in the future?

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Igor Konopko <igor.j.konopko@intel.com>
> 
> During sequential workloads we can hit the case when almost all
> lines are fully written with data. In that case the rate limiter
> significantly reduces the maximum number of requests for user I/Os.

Do you mean random writes? On a fully sequential workload, a line
will either be fully written, fully invalidated, or on its way to
being written. When the line is invalidated, the whole line is
invalidated at once and GC will free it without having to move
valid data.

> 
> Unfortunately, when the ring buffer is flushed to the drive but the
> entries are not yet removed (which is fine, since there are still
> enough free entries in the ring buffer for user I/O), user I/O hangs
> because there are not enough entries in the rate limiter. The reason
> is that the rate limiter's user entry count is only decreased after
> the ring buffer entries are freed, which does not happen while there
> is still plenty of space in the ring buffer.
> 
> This patch forces the ring buffer to be freed by calling
> pblk_rb_sync_l2p and thus makes new entries available in the rate
> limiter when there are not enough of them for user I/O.

I can see why you might have problems with very low OP due to the rate
limiter, but unfortunately this is not a good way of solving the
problem. When you do this, you basically make the L2P point to the
device instead of pointing to the write cache, which in essence bypasses
mw_cunits. As a result, if a read comes in to one of the synced entries,
it will violate the device-host contract and most probably fail (for
sure fail on > SLC).

I think that the right way of solving this problem is separating the
write and GC buffers and then assigning tokens to them. The write thread
will then consume both buffers based on these tokens. In this case, user
I/O will have a buffer for itself, which will be guaranteed to advance
at the rate the rate-limiter is allowing it to. Note that the 2 buffers
can be a single buffer with a new set of pointers so that the lookup can
be done with a single bit.
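In C-like terms, the token scheme described above could be sketched as follows. This is a minimal illustration of the idea, not pblk code; the names (`split_rl`, `may_insert`) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the separation described above: the rate
 * limiter grants tokens separately to user and GC writes, and each
 * insert path may only consume its own tokens, so user IO can always
 * advance at the rate it was granted regardless of GC pressure. */
struct split_rl {
	int user_tokens;	/* write-buffer entries granted to user IO */
	int gc_tokens;		/* write-buffer entries granted to GC IO */
};

static bool may_insert(struct split_rl *rl, bool is_gc, int nr_entries)
{
	int *tokens = is_gc ? &rl->gc_tokens : &rl->user_tokens;

	if (*tokens < nr_entries)
		return false;	/* caller must back off and retry */
	*tokens -= nr_entries;	/* replenished as the write thread drains */
	return true;
}
```

With this split, exhausting the GC budget cannot block admission of user IO, which is the property the single shared counter lacks.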

I have been looking for time to implement this for a while. If you want
to give it a go, we can talk and I can give you some pointers on
potential issues I have thought about.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
@ 2018-05-28 10:59   ` Javier Gonzalez
  2018-05-29 13:15     ` Konopko, Igor J
  0 siblings, 1 reply; 43+ messages in thread
From: Javier Gonzalez @ 2018-05-28 10:59 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, LKML, Konopko, Igor J, Marcin Dziegielewski

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Igor Konopko <igor.j.konopko@intel.com>
> 
> When all the blocks (chunks) in a line are marked as bad (offline),
> we shouldn't try to read smeta during the init process.
> 
> Currently we are trying to do so by passing -1 as the PPA address,
> which causes multiple warnings that we are issuing IOs to
> out-of-bound PPAs.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> Updated title.
> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> ---
> drivers/lightnvm/pblk-core.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a20b41c355c5..e3e883547198 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -868,6 +868,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
> {
> 	u64 bpaddr = pblk_line_smeta_start(pblk, line);
> 
> +	if (bpaddr == -1) {
> +		/* Whole line is bad - do not try to read smeta. */
> +		return 1;
> +	}

This case cannot occur in the only user of the function
(pblk_recov_l2p()). In the previous check (pblk_line_was_written()), we
verify the state of the line and the position of the first good chunk. In
the case of a bad line (fewer chunks than a given threshold to allow
emeta), recovery will not be carried out on the line.

Javier

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
@ 2018-05-28 11:02   ` Javier Gonzalez
  2018-06-04 10:09       ` Dziegielewski, Marcin
  0 siblings, 1 reply; 43+ messages in thread
From: Javier Gonzalez @ 2018-05-28 11:02 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Jens Axboe, linux-block, linux-kernel, Marcin Dziegielewski,
	Konopko, Igor J

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> 
> Some devices can expose mw_cunits equal to 0, which can cause the
> creation of a too-small write buffer and cause performance to drop
> on write workloads.
> 
> To handle that, we use the default value for MLC, and because it
> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
> in the nvme_nvm_setup_12 function is no longer necessary.
> 
> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> ---
> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> drivers/nvme/host/lightnvm.c |  1 -
> 2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index d65d2f972ccf..0f277744266b 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> 	atomic64_set(&pblk->nr_flush, 0);
> 	pblk->nr_flush_rst = 0;
> 
> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> +	if (geo->mw_cunits) {
> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> +	} else {
> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> +		/*
> +		 * Some devices can expose mw_cunits equal to 0, so let's use
> +		 * here default safe value for MLC.
> +		 */
> +	}
> 
> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 41279da799ed..c747792da915 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
> 
> 	geo->ws_min = sec_per_pg;
> 	geo->ws_opt = sec_per_pg;
> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
> 
> 	/* Do not impose values for maximum number of open blocks as it is
> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and eventually
> --
> 2.11.0

By doing this, future 1.2 users (beyond pblk) will fail to have a valid
mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe
that we should have the default value for 1.2 either way.

A more generic way of doing this would be to have a default value for
2.0 too, in case mw_cunits is reported as 0.

Javier


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-28 10:51   ` Javier Gonzalez
@ 2018-05-29 13:07     ` Konopko, Igor J
  2018-05-29 17:58       ` Javier Gonzalez
  0 siblings, 1 reply; 43+ messages in thread
From: Konopko, Igor J @ 2018-05-29 13:07 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, linux-kernel, Dziegielewski, Marcin

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Do you mean random writes? On a fully sequential workload, a line will
> either be fully written, fully invalidated, or on its way to being
> written. When invalidating the line, the whole line will be invalidated
> and GC will free it without having to move valid data.

I meant sequential writes, since this is the easiest way to reach rl->rb_state = PBLK_RL_LOW. When we update these values inside __pblk_rl_update_rates(), most of the time rl->rb_user_cnt becomes equal to rl->rb_user_max - which means that we cannot handle any more user IOs. In that case pblk_rl_user_may_insert() will return false, no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not be decreased, and user IOs will be stuck forever.
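The deadlock can be reduced to a small model of the admission check. The field and function names below mirror struct pblk_rl and pblk_rl_user_may_insert(), but this is an illustration, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the pblk rate-limiter admission check. */
struct rl_model {
	int rb_user_max;	/* current cap on accounted user entries */
	int rb_user_cnt;	/* user entries currently accounted */
};

static bool rl_user_may_insert(const struct rl_model *rl, int nr_entries)
{
	/* Once rb_user_cnt reaches rb_user_max this stays false until
	 * something decrements rb_user_cnt - which, in the scenario
	 * described above, only happens from the now-blocked insert
	 * path, so no progress is possible. */
	return rl->rb_user_cnt + nr_entries <= rl->rb_user_max;
}
```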

> I can see why you might have problems with very low OP due to the rate
> limiter, but unfortunately this is not a good way of solving the
> problem. When you do this, you basically make the L2P point to the
> device instead of pointing to the write cache, which in essence bypasses
> mw_cunits. As a result, if a read comes in to one of the synced entries,
> it will violate the device-host contract and most probably fail (for
> sure fail on > SLC).

What about using on that path a modified version of pblk_rb_sync_l2p() which synchronizes all the RB entries except for the last mw_cunits elements?
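The proposed variant could be sketched as below. Linear counters stand in for the real ring-buffer pointers, and the name rb_sync_l2p_bounded() is hypothetical - it does not exist in pblk:

```c
#include <assert.h>

/* Sketch: when syncing flushed ring-buffer entries to the L2P table,
 * always keep a tail window of `keep` entries (e.g. nr_luns *
 * mw_cunits) pointing at the write cache, so reads never target data
 * the device may not yet serve. Returns the number of entries that
 * may safely be pointed at the media. */
static unsigned int rb_sync_l2p_bounded(unsigned int sync_point,
					unsigned int l2p_update,
					unsigned int keep)
{
	unsigned int flushed = sync_point - l2p_update; /* flushed, not yet synced */

	if (flushed <= keep)
		return 0;		/* whole window must stay cached */
	return flushed - keep;		/* safe to point at media */
}
```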

Also, what you wrote about mw_cunits definitely makes sense, but even without my changes I believe that we can run into such a situation - especially for pblk with a small number of LUNs assigned, under write IOs with a high sector count. pblk_rb_update_l2p() does not explicitly take mw_cunits into consideration right now.

> I think that the right way of solving this problem is separating the
> write and GC buffers and then assigning tokens to them. The write thread
> will then consume both buffers based on these tokens. In this case, user
> I/O will have a buffer for itself, which will be guaranteed to advance
> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
> can be a single buffer with a new set of pointers so that the lookup can
> be done with a single bit.
> 
> I have been looking for time to implement this for a while. If you want
> to give it a go, we can talk and I can give you some pointers on
> potential issues I have thought about.

I believe this is an interesting option - we can discuss it for one of the next releases.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-28 10:59   ` Javier Gonzalez
@ 2018-05-29 13:15     ` Konopko, Igor J
  2018-05-29 18:29       ` Javier Gonzalez
  0 siblings, 1 reply; 43+ messages in thread
From: Konopko, Igor J @ 2018-05-29 13:15 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, LKML, Dziegielewski, Marcin

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> This case cannot occur on the only user of the function
> (pblk_recov_l2p()). On the previous check (pblk_line_was_written()), we
> verify the state of the line and the position of the first good chunk. In
> the case of a bad line (less chunks than a given threshold to allow
> emeta), the recovery will not be carried out in the line.
You are right. It looks like during my testing there was some inconsistency between the chunk state table, which is verified inside pblk_line_was_written(), and the blk_bitmap, which was read from emeta and is verified in pblk_line_smeta_start(). I'm leaving the decision to the maintainers whether we should keep this sanity check or not - it really just passes on gracefully the result from pblk_line_smeta_start(), where a similar sanity check is present.

Igor

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC
  2018-05-29 13:07     ` Konopko, Igor J
@ 2018-05-29 17:58       ` Javier Gonzalez
  0 siblings, 0 replies; 43+ messages in thread
From: Javier Gonzalez @ 2018-05-29 17:58 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Dziegielewski, Marcin

[-- Attachment #1: Type: text/plain, Size: 3626 bytes --]


> On 29 May 2018, at 15.07, Konopko, Igor J <igor.j.konopko@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Do you mean random writes? On a fully sequential workload, a line will
>> either be fully written, fully invalidated, or on its way to being
>> written. When invalidating the line, the whole line will be invalidated
>> and GC will free it without having to move valid data.
> 
> I meant sequential writes, since this is the easiest way to reach
> rl->rb_state = PBLK_RL_LOW. When we update these values inside
> __pblk_rl_update_rates(), most of the time rl->rb_user_cnt becomes
> equal to rl->rb_user_max - which means that we cannot handle any more
> user IOs. In that case pblk_rl_user_may_insert() will return false,
> no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not be
> decreased, and user IOs will be stuck forever.
> 

What OP are you using? Even with 5% OP, full lines should start being
freed as they become completely invalid. But fair enough: if you decide
to optimize for a guaranteed sequential workload, where the OP only
serves to cover grown bad blocks, you will run into this issue.

>> I can see why you might have problems with very low OP due to the rate
>> limiter, but unfortunately this is not a good way of solving the
>> problem. When you do this, you basically make the L2P point to the
>> device instead of pointing to the write cache, which in essence bypasses
>> mw_cunits. As a result, if a read comes in to one of the synced entries,
>> it will violate the device-host contract and most probably fail (for
>> sure fail on > SLC).
> 
> What about using on that path a modified version of
> pblk_rb_sync_l2p() which synchronizes all the RB entries except
> for the last mw_cunits elements?
> 

Typically, the size of the buffer is the closest upper power of two to
nr_luns * mw_cunits. So in essence, we only update the L2P as we wrap
up. You could go down to match nr_luns * mw_cunits exactly, which,
depending on the media geometry, can gain you some extra user entries.
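The sizing rule reads, in a minimal sketch (illustrative only; the kernel would use a helper such as roundup_pow_of_two() rather than this open-coded loop):

```c
#include <assert.h>

/* The write buffer holds the closest power of two at or above
 * nr_luns * mw_cunits entries. */
static unsigned int rb_size(unsigned int nr_luns, unsigned int mw_cunits)
{
	unsigned int need = nr_luns * mw_cunits;
	unsigned int size = 1;

	while (size < need)		/* round up to next power of two */
		size <<= 1;
	return size;
}
```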

> Also, what you wrote about mw_cunits definitely makes sense, but even
> without my changes I believe that we can run into such a situation -
> especially for pblk with a small number of LUNs assigned, under write
> IOs with a high sector count. pblk_rb_update_l2p() does not explicitly
> take mw_cunits into consideration right now.
> 

As mentioned, the size of the buffer is always over nr_luns * mw_cunits
and we only update when we wrap up - that is, when the mem pointer
(which writes new data to the write buffer) wraps and starts writing
new data. So this case is guaranteed not to happen. In fact, this
particular case is what inspired the design of the write buffer and the
current 5 pointers that extend the typical head and tail.

>> I think that the right way of solving this problem is separating the
>> write and GC buffers and then assigning tokens to them. The write thread
>> will then consume both buffers based on these tokens. In this case, user
>> I/O will have a buffer for itself, which will be guaranteed to advance
>> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
>> can be a single buffer with a new set of pointers so that the lookup can
>> be done with a single bit.
>> 
>> I have been looking for time to implement this for a while. If you want
>> to give it a go, we can talk and I can give you some pointers on
>> potential issues I have thought about.
> 
> I believe this is an interesting option - we can discuss it for one
> of the next releases.

Sure. Ping me if you want to discuss in more detail.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 16/20] lightnvm: error handling when whole line is bad
  2018-05-29 13:15     ` Konopko, Igor J
@ 2018-05-29 18:29       ` Javier Gonzalez
  0 siblings, 0 replies; 43+ messages in thread
From: Javier Gonzalez @ 2018-05-29 18:29 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, Jens Axboe, linux-block, LKML,
	Dziegielewski, Marcin

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

> On 29 May 2018, at 15.15, Konopko, Igor J <igor.j.konopko@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> This case cannot occur on the only user of the function
>> (pblk_recov_l2p()). On the previous check (pblk_line_was_written()), we
>> verify the state of the line and the position of the first good chunk. In
>> the case of a bad line (less chunks than a given threshold to allow
>> emeta), the recovery will not be carried out in the line.
> You are right. It looks like during my testing there was some
> inconsistency between the chunk state table, which is verified inside
> pblk_line_was_written(), and the blk_bitmap, which was read from emeta
> and is verified in pblk_line_smeta_start(). I'm leaving the decision to
> the maintainers whether we should keep this sanity check or not - it
> really just passes on gracefully the result from
> pblk_line_smeta_start(), where a similar sanity check is present.
> 

Let's avoid an extra check now that there are no users for it in the
current flow. If we have a new use for pblk_line_smeta_start() on a flow
that cannot offer the same guarantees, we can add it at that point.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 00/20] lightnvm updates for 4.18
  2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
                   ` (19 preceding siblings ...)
  2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
@ 2018-06-01 10:45 ` Matias Bjørling
  2018-06-01 12:36     ` Jens Axboe
  20 siblings, 1 reply; 43+ messages in thread
From: Matias Bjørling @ 2018-06-01 10:45 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, Javier González, Konopko, Igor J,
	Marcin Dziegielewski

On 05/28/2018 10:58 AM, Matias Bjørling wrote:
> Hi Jens,
> 
> Please pick up the following patches.
> 
>   - Hans reworked the write error recovery path in pblk.
>   - Igor added extra error handling for lines, and fixed a bug in the
>     pblk ringbuffer during GC.
>   - Javier refactored the pblk code a bit, added extra error
>     handling, and added checks to verify that data returned from drive
>     is appropriate.
>   - Marcin added some extra logic to manage the write buffer. Now
>     MW_CUNITS can be zero and the size of write buffer can be changed
>     at module load time.
> 
> Thanks,
> Matias
> 
> Hans Holmberg (3):
>    lightnvm: pblk: rework write error recovery path
>    lightnvm: pblk: garbage collect lines with failed writes
>    lightnvm: pblk: fix smeta write error path
> 
> Igor Konopko (4):
>    lightnvm: proper error handling for pblk_bio_add_pages
>    lightnvm: error handling when whole line is bad
>    lightnvm: fix partial read error path
>    lightnvm: pblk: sync RB and RL states during GC
> 
> Javier González (11):
>    lightnvm: pblk: fail gracefully on line alloc. failure
>    lightnvm: pblk: recheck for bad lines at runtime
>    lightnvm: pblk: check read lba on gc path
>    lightnvn: pblk: improve error msg on corrupted LBAs
>    lightnvm: pblk: warn in case of corrupted write buffer
>    lightnvm: pblk: return NVM_ error on failed submission
>    lightnvm: pblk: remove unnecessary indirection
>    lightnvm: pblk: remove unnecessary argument
>    lightnvm: pblk: check for chunk size before allocating it
>    lightnvn: pass flag on graceful teardown to targets
>    lightnvm: pblk: remove dead function
> 
> Marcin Dziegielewski (2):
>    lightnvm: pblk: handle case when mw_cunits equals to 0
>    lightnvm: pblk: add possibility to set write buffer size manually
> 
>   drivers/lightnvm/core.c          |  10 +-
>   drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
>   drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
>   drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
>   drivers/lightnvm/pblk-map.c      |  33 +++--
>   drivers/lightnvm/pblk-rb.c       |  51 +------
>   drivers/lightnvm/pblk-read.c     |  83 +++++++++---
>   drivers/lightnvm/pblk-recovery.c |  91 -------------
>   drivers/lightnvm/pblk-rl.c       |  29 +++-
>   drivers/lightnvm/pblk-sysfs.c    |  15 ++-
>   drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
>   drivers/lightnvm/pblk.h          |  43 +++---
>   drivers/nvme/host/lightnvm.c     |   1 -
>   include/linux/lightnvm.h         |   2 +-
>   14 files changed, 604 insertions(+), 408 deletions(-)
> 

Hi Jens,

Javier had some comments on 16, 18, and 20. The rest is ready to go.
Would you like me to resend the patches?

Thank you!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 00/20] lightnvm updates for 4.18
@ 2018-06-01 12:36     ` Jens Axboe
  0 siblings, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2018-06-01 12:36 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: axboe, linux-block, linux-kernel, Javier González, Konopko,
	Igor J, Marcin Dziegielewski

On Jun 1, 2018, at 4:45 AM, Matias Bjørling <mb@lightnvm.io> wrote:
> 
>> On 05/28/2018 10:58 AM, Matias Bjørling wrote:
>> Hi Jens,
>> Please pick up the following patches.
>>  - Hans reworked the write error recovery path in pblk.
>>  - Igor added extra error handling for lines, and fixed a bug in the
>>    pblk ringbuffer during GC.
>>  - Javier refactored the pblk code a bit, added extra error
>>    handling, and added checks to verify that data returned from drive
>>    is appropriate.
>>  - Marcin added some extra logic to manage the write buffer. Now
>>    MW_CUNITS can be zero and the size of write buffer can be changed
>>    at module load time.
>> Thanks,
>> Matias
>> Hans Holmberg (3):
>>   lightnvm: pblk: rework write error recovery path
>>   lightnvm: pblk: garbage collect lines with failed writes
>>   lightnvm: pblk: fix smeta write error path
>> Igor Konopko (4):
>>   lightnvm: proper error handling for pblk_bio_add_pages
>>   lightnvm: error handling when whole line is bad
>>   lightnvm: fix partial read error path
>>   lightnvm: pblk: sync RB and RL states during GC
>> Javier González (11):
>>   lightnvm: pblk: fail gracefully on line alloc. failure
>>   lightnvm: pblk: recheck for bad lines at runtime
>>   lightnvm: pblk: check read lba on gc path
>>   lightnvn: pblk: improve error msg on corrupted LBAs
>>   lightnvm: pblk: warn in case of corrupted write buffer
>>   lightnvm: pblk: return NVM_ error on failed submission
>>   lightnvm: pblk: remove unnecessary indirection
>>   lightnvm: pblk: remove unnecessary argument
>>   lightnvm: pblk: check for chunk size before allocating it
>>   lightnvn: pass flag on graceful teardown to targets
>>   lightnvm: pblk: remove dead function
>> Marcin Dziegielewski (2):
>>   lightnvm: pblk: handle case when mw_cunits equals to 0
>>   lightnvm: pblk: add possibility to set write buffer size manually
>>  drivers/lightnvm/core.c          |  10 +-
>>  drivers/lightnvm/pblk-core.c     | 149 +++++++++++++++------
>>  drivers/lightnvm/pblk-gc.c       | 112 ++++++++++------
>>  drivers/lightnvm/pblk-init.c     | 112 +++++++++++-----
>>  drivers/lightnvm/pblk-map.c      |  33 +++--
>>  drivers/lightnvm/pblk-rb.c       |  51 +------
>>  drivers/lightnvm/pblk-read.c     |  83 +++++++++---
>>  drivers/lightnvm/pblk-recovery.c |  91 -------------
>>  drivers/lightnvm/pblk-rl.c       |  29 +++-
>>  drivers/lightnvm/pblk-sysfs.c    |  15 ++-
>>  drivers/lightnvm/pblk-write.c    | 281 +++++++++++++++++++++++++--------------
>>  drivers/lightnvm/pblk.h          |  43 +++---
>>  drivers/nvme/host/lightnvm.c     |   1 -
>>  include/linux/lightnvm.h         |   2 +-
>>  14 files changed, 604 insertions(+), 408 deletions(-)
> 
> Hi Jens,
> 
> Javier had some comments to 16, 18, and 20. The rest is ready to go. Would you like me to resend the patches?

Yes please. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
@ 2018-06-04 10:09       ` Dziegielewski, Marcin
  0 siblings, 0 replies; 43+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 10:09 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjorling
  Cc: Jens Axboe, linux-block, linux-kernel, Konopko, Igor J


First of all I want to say sorry for the late response - I was on holiday.

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, May 28, 2018 1:03 PM
> To: Matias Bjørling <mb@lightnvm.io>
> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> kernel@vger.kernel.org; Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >
> > From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >
> > Some devices can expose mw_cunits equal to 0, which can cause the
> > creation of a write buffer that is too small and make performance
> > drop on write workloads.
> >
> > To handle that, we use the default value for MLC, and because it
> > covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
> > in the nvme_nvm_setup_12 function is no longer necessary.
> >
> > Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> > Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> > Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> > ---
> > drivers/lightnvm/pblk-init.c | 10 +++++++++-
> > drivers/nvme/host/lightnvm.c |  1 -
> > 2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-init.c
> > b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b 100644
> > --- a/drivers/lightnvm/pblk-init.c
> > +++ b/drivers/lightnvm/pblk-init.c
> > @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> > 	atomic64_set(&pblk->nr_flush, 0);
> > 	pblk->nr_flush_rst = 0;
> >
> > -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> > +	if (geo->mw_cunits) {
> > +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> > +	} else {
> > +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> > +		/*
> > +		 * Some devices can expose mw_cunits equal to 0, so let's
> use
> > +		 * here default safe value for MLC.
> > +		 */
> > +	}
> >
> > 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> > 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> > a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
> > 41279da799ed..c747792da915 100644
> > --- a/drivers/nvme/host/lightnvm.c
> > +++ b/drivers/nvme/host/lightnvm.c
> > @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> nvme_nvm_id12
> > *id,
> >
> > 	geo->ws_min = sec_per_pg;
> > 	geo->ws_opt = sec_per_pg;
> > -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
> */
> >
> > 	/* Do not impose values for maximum number of open blocks as it is
> > 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> > eventually
> > --
> > 2.11.0
> 
> By doing this, future 1.2 users (beyond pblk) will fail to have a valid
> mw_cunits value. It's OK to deal with the 0 case in pblk, but I believe that we
> should have the default value for 1.2 either way.

I'm not sure. From my understanding, setting a default value was a workaround for the pblk case, am I right? In my opinion, any user of the 1.2 spec should be aware that there is no mw_cunits value. From my point of view, leaving 0 here (and leaving the decision of what to do with it to the lightnvm user) is the safer way, but maybe I'm wrong. I believe this is a topic for wider discussion with the maintainers.

> 
> A more generic way of doing this would be to have a default value for
> 2.0 too, in case mw_cunits is reported as 0.

Since 0 is a correct value and users can make different decisions based on it, I think we shouldn't overwrite it with a default value. Does that make sense?
> 
> Javier

Thanks,
Marcin 

^ permalink raw reply	[flat|nested] 43+ messages in thread


* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
@ 2018-06-04 10:21         ` Javier Gonzalez
  0 siblings, 0 replies; 43+ messages in thread
From: Javier Gonzalez @ 2018-06-04 10:21 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J

> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
> 
> First of all I want to say sorry for the late response - I was on holiday.
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, May 28, 2018 1:03 PM
>> To: Matias Bjørling <mb@lightnvm.io>
>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>> 
>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>> 
>>> Some devices can expose mw_cunits equal to 0, which can cause the
>>> creation of a write buffer that is too small and make performance
>>> drop on write workloads.
>>> 
>>> To handle that, we use the default value for MLC, and because it
>>> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
>>> in the nvme_nvm_setup_12 function is no longer necessary.
>>> 
>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>> drivers/nvme/host/lightnvm.c |  1 -
>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-init.c
>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>> 	atomic64_set(&pblk->nr_flush, 0);
>>> 	pblk->nr_flush_rst = 0;
>>> 
>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> +	if (geo->mw_cunits) {
>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> +	} else {
>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>> +		/*
>>> +		 * Some devices can expose mw_cunits equal to 0, so let's
>> use
>>> +		 * here default safe value for MLC.
>>> +		 */
>>> +	}
>>> 
>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>> 41279da799ed..c747792da915 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>> nvme_nvm_id12
>>> *id,
>>> 
>>> 	geo->ws_min = sec_per_pg;
>>> 	geo->ws_opt = sec_per_pg;
>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
>> */
>>> /* Do not impose values for maximum number of open blocks as it is
>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>> eventually
>>> --
>>> 2.11.0
>> 
>> By doing this, future 1.2 users (beyond pblk) will fail to have a valid
>> mw_cunits value. It's OK to deal with the 0 case in pblk, but I believe that we
>> should have the default value for 1.2 either way.
> 
> I'm not sure. From my understanding, setting a default value was a
> workaround for the pblk case, am I right?

The default value covers the MLC case directly at the lightnvm layer, as
opposed to doing it directly in pblk. Since pblk is the only user now,
you can argue that all changes in the lightnvm layer are to solve pblk
issues, but the idea is that the geometry should be generic.

> In my opinion, any user of the 1.2
> spec should be aware that there is no mw_cunits value. From my point
> of view, leaving 0 here (and leaving the decision of what to do with
> it to the lightnvm user) is the safer way, but maybe I'm wrong. I
> believe this is a topic for wider discussion with the maintainers.
> 

1.2 and 2.0 have different geometries, but when we designed the common
nvm_geo structure, the idea was to abstract both specs and allow the
upper layers to use the geometry transparently. 

Specifically in pblk, I would prefer to keep it in such a way that we
don't need media-specific policies (e.g., setting default values for MLC
memories), as a general design principle. We already do some geometry
version checks to avoid dereferencing unnecessary pointers on the fast
path, which I would eventually like to remove.

>> A more generic way of doing this would be to have a default value for
>> 2.0 too, in case mw_cunits is reported as 0.
> 
> Since 0 is a correct value and users can make different decisions based
> on it, I think we shouldn't overwrite it with a default value. Does that
> make sense?

Here I meant at the pblk level - I should have specified that. At the
geometry level, we should not change it. 

The case I am thinking of is if mw_cunits reports 0, but ws_min > 0. In
this case, we still need a host-side buffer to serve < ws_min I/Os, even
though the device does not require the buffer to guarantee reads.

>> Javier
> 
> Thanks,
> Marcin 

Javier

^ permalink raw reply	[flat|nested] 43+ messages in thread


* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
@ 2018-06-04 11:11           ` Dziegielewski, Marcin
  0 siblings, 0 replies; 43+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 11:11 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J


> -----Original Message-----
> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, June 4, 2018 12:22 PM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >
> > First of all I want to say sorry for the late response - I was on holiday.
> >
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, May 28, 2018 1:03 PM
> >> To: Matias Bjørling <mb@lightnvm.io>
> >> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> >> kernel@vger.kernel.org; Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >> <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>
> >>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>
> >>> Some devices can expose mw_cunits equal to 0, which can cause the
> >>> creation of a write buffer that is too small and make performance
> >>> drop on write workloads.
> >>>
> >>> To handle that, we use the default value for MLC, and because it
> >>> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
> >>> in the nvme_nvm_setup_12 function is no longer necessary.
> >>>
> >>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>> ---
> >>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>> drivers/nvme/host/lightnvm.c |  1 -
> >>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/lightnvm/pblk-init.c
> >>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>> 100644
> >>> --- a/drivers/lightnvm/pblk-init.c
> >>> +++ b/drivers/lightnvm/pblk-init.c
> >>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>> 	atomic64_set(&pblk->nr_flush, 0);
> >>> 	pblk->nr_flush_rst = 0;
> >>>
> >>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>> +	if (geo->mw_cunits) {
> >>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>> +	} else {
> >>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> >>> +		/*
> >>> +		 * Some devices can expose mw_cunits equal to 0, so let's
> >> use
> >>> +		 * here default safe value for MLC.
> >>> +		 */
> >>> +	}
> >>>
> >>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> >>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> >>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
> >>> 41279da799ed..c747792da915 100644
> >>> --- a/drivers/nvme/host/lightnvm.c
> >>> +++ b/drivers/nvme/host/lightnvm.c
> >>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >> nvme_nvm_id12
> >>> *id,
> >>>
> >>> 	geo->ws_min = sec_per_pg;
> >>> 	geo->ws_opt = sec_per_pg;
> >>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
> >> */
> >>> /* Do not impose values for maximum number of open blocks as it is
> >>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>> eventually
> >>> --
> >>> 2.11.0
> >>
> >> By doing this, future 1.2 users (beyond pblk) will fail to have a
> >> valid mw_cunits value. It's OK to deal with the 0 case in pblk, but I
> >> believe that we should have the default value for 1.2 either way.
> >
> > I'm not sure. From my understanding, setting a default value was a
> > workaround for the pblk case, am I right?
> 
> The default value covers the MLC case directly at the lightnvm layer, as
> opposed to doing it directly in pblk. Since pblk is the only user now, you can
> argue that all changes in the lightnvm layer are to solve pblk issues, but the
> idea is that the geometry should be generic.
> 
> > In my opinion, any user of the 1.2
> > spec should be aware that there is no mw_cunits value. From my point
> > of view, leaving 0 here (and leaving the decision of what to do with
> > it to the lightnvm user) is the safer way, but maybe I'm wrong. I
> > believe this is a topic for wider discussion with the maintainers.
> >
> 
> 1.2 and 2.0 have different geometries, but when we designed the common
> nvm_geo structure, the idea was to abstract both specs and allow the upper
> layers to use the geometry transparently.
> 
> Specifically in pblk, I would prefer to keep it in such a way that we don't
> need media-specific policies (e.g., setting default values for MLC memories),
> as a general design principle. We already do some geometry version checks to
> avoid dereferencing unnecessary pointers on the fast path, which I would
> eventually like to remove.
> 

OK, now I understand your point of view and agree with it. I will prepare a second version of this patch without this change. Thanks for the clarification.

> >> A more generic way of doing this would be to have a default value for
> >> 2.0 too, in case mw_cunits is reported as 0.
> >
> > Since 0 is a correct value and users can make different decisions based
> > on it, I think we shouldn't overwrite it with a default value. Does that
> > make sense?
> 
> Here I meant at the pblk level - I should have specified that. At the geometry
> level, we should not change it.
> 
> The case I am thinking of is if mw_cunits reports 0, but ws_min > 0. In this
> case, we still need a host-side buffer to serve < ws_min I/Os, even though the
> device does not require the buffer to guarantee reads.

Oh, OK, now we are on the same page. In this patch I was trying to address exactly that case. Do you have another idea of how to do it, or were you thinking only about the value of the default variable?

> 
> >> Javier
> >
> > Thanks,
> > Marcin
> 
> Javier
Thanks,
Marcin

^ permalink raw reply	[flat|nested] 43+ messages in thread


* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
@ 2018-06-04 11:15             ` Javier Gonzalez
  0 siblings, 0 replies; 43+ messages in thread
From: Javier Gonzalez @ 2018-06-04 11:15 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J


> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
>> -----Original Message-----
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, June 4, 2018 12:22 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
>> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com> wrote:
>>> First of all, I want to say sorry for the late response - I was on holiday.
>>> 
>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>> To: Matias Bjørling <mb@lightnvm.io>
>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>>>> kernel@vger.kernel.org; Dziegielewski, Marcin
>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>>>> <igor.j.konopko@intel.com>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>> 
>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>>>> 
>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>> 
>>>>> Some devices can expose mw_cunits equal to 0, which can cause creation
>>>>> of a too-small write buffer and make performance drop on write
>>>>> workloads.
>>>>> 
>>>>> To handle that, we use the default value for MLC, and because it
>>>>> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits in
>>>>> the nvme_nvm_setup_12 function is no longer necessary.
>>>>> 
>>>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>>>> ---
>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-init.c
>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
>>>>> 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>> 	pblk->nr_flush_rst = 0;
>>>>> 
>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	if (geo->mw_cunits) {
>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	} else {
>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>>>> +		/*
>>>>> +		 * Some devices can expose mw_cunits equal to 0, so let's use
>>>>> +		 * here default safe value for MLC.
>>>>> +		 */
>>>>> +	}
>>>>> 
>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>>>> 41279da799ed..c747792da915 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>>>> nvme_nvm_id12
>>>>> *id,
>>>>> 
>>>>> 	geo->ws_min = sec_per_pg;
>>>>> 	geo->ws_opt = sec_per_pg;
>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
>>>>> /* Do not impose values for maximum number of open blocks as it is
>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>>>> eventually
>>>>> --
>>>>> 2.11.0
>>>> 
>>>> By doing this, future 1.2 users (beyond pblk) will fail to have a
>>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but I
>>>> believe that we should have the default value for 1.2 either way.
>>> 
>>> I'm not sure. From my understanding, setting the default value was a
>>> workaround for the pblk case, am I right?
>> 
>> The default value covers the MLC case directly at the lightnvm layer, as
>> opposed to doing it directly in pblk. Since pblk is the only user now, you can
>> argue that all changes in the lightnvm layer are to solve pblk issues, but the
>> idea is that the geometry should be generic.
>> 
>>> In my opinion, any user of the 1.2
>>> spec should be aware that there is no mw_cunits value. From my point
>>> of view, leaving 0 here (and the decision of what to do with it to the
>>> lightnvm user) is the safer way, but maybe I'm wrong. I believe it is
>>> a topic for wider discussion with the maintainers.
>> 
>> 1.2 and 2.0 have different geometries, but when we designed the common
>> nvm_geo structure, the idea was to abstract both specs and allow the upper
>> layers to use the geometry transparently.
>> 
>> Specifically in pblk, I would prefer to keep it in such a way that we don't need
>> media-specific policies (e.g., set default values for MLC memories), as a
>> general design principle. We already do some geometry version checks to
>> avoid dereferencing unnecessary pointers on the fast path, which I would
>> eventually like to remove.
> 
> Ok, now I understand your point of view and I agree with it. I will
> prepare a second version of this patch without this change.

Sounds good.

> Thanks for
> the clarification.
> 

Sure :)

>>>> A more generic way of doing this would be to have a default value for
>>>> 2.0 too, in case mw_cunits is reported as 0.
>>> 
>>> Since 0 is a correct value and users can make different decisions based
>>> on it, I think we shouldn't overwrite it with a default value. Does that
>>> make sense?
>> 
>> Here I meant at a pblk level - I should have specified it. At the geometry
>> level, we should not change it.
>> 
>> The case I am thinking of is when mw_cunits reports 0, but ws_min > 0. In this case,
>> we still need a host side buffer to serve < ws_min I/Os, even though the
>> device does not require the buffer to guarantee reads.
> 
> Oh, ok, now we are on the same page. In this patch I was trying to
> address that case. Do you have another idea of how to do it, or are you
> thinking only about the value of the default variable?

If doing this, I guess something along the lines of what you did with
increasing the size of the write buffer via a module parameter would
work. For example, checking whether the size of the write buffer based
on mw_cunits is enough to cover ws_min, which normally would only be an
issue when mw_cunits == 0 or when the number of PUs used for the pblk
instance is very small and mw_cunits < nr_luns * ws_min.

> 
>>>> Javier
>>> 
>>> Thanks,
>>> Marcin
>> 
>> Javier
> Thanks,
> Marcin

^ permalink raw reply	[flat|nested] 43+ messages in thread


* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
@ 2018-06-04 17:17               ` Dziegielewski, Marcin
  0 siblings, 0 replies; 43+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-04 17:17 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Monday, June 4, 2018 1:16 PM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> 
> > On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >> -----Original Message-----
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, June 4, 2018 12:22 PM
> >> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
> >> Igor J <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com> wrote:
> >>> First of all, I want to say sorry for the late response - I was on holiday.
> >>>
> >>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>> Sent: Monday, May 28, 2018 1:03 PM
> >>>> To: Matias Bjørling <mb@lightnvm.io>
> >>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
> >>>> kernel@vger.kernel.org; Dziegielewski, Marcin
> >>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >>>> <igor.j.konopko@intel.com>
> >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>> mw_cunits equals to 0
> >>>>
> >>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>>>
> >>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>>>
> >>>>> Some devices can expose mw_cunits equal to 0, which can cause
> >>>>> creation of a too-small write buffer and make performance drop
> >>>>> on write workloads.
> >>>>>
> >>>>> To handle that, we use the default value for MLC, and because it
> >>>>> covers both the 1.2 and 2.0 OC specifications, setting up
> >>>>> mw_cunits in the nvme_nvm_setup_12 function is no longer necessary.
> >>>>>
> >>>>> Signed-off-by: Marcin Dziegielewski
> >>>>> <marcin.dziegielewski@intel.com>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>>>> ---
> >>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>>>> drivers/nvme/host/lightnvm.c |  1 -
> >>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/pblk-init.c
> >>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>>>> 100644
> >>>>> --- a/drivers/lightnvm/pblk-init.c
> >>>>> +++ b/drivers/lightnvm/pblk-init.c
> >>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>>>> 	atomic64_set(&pblk->nr_flush, 0);
> >>>>> 	pblk->nr_flush_rst = 0;
> >>>>>
> >>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>> +	if (geo->mw_cunits) {
> >>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>> +	} else {
> >>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> >>>>> +		/*
> >>>>> +		 * Some devices can expose mw_cunits equal to 0, so let's use
> >>>>> +		 * here default safe value for MLC.
> >>>>> +		 */
> >>>>> +	}
> >>>>>
> >>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> >>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> >>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index
> >>>>> 41279da799ed..c747792da915 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >>>> nvme_nvm_id12
> >>>>> *id,
> >>>>>
> >>>>> 	geo->ws_min = sec_per_pg;
> >>>>> 	geo->ws_opt = sec_per_pg;
> >>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
> safe values
> >>>> */
> >>>>> /* Do not impose values for maximum number of open blocks as it is
> >>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>>>> eventually
> >>>>> --
> >>>>> 2.11.0
> >>>>
> >>>> By doing this, future 1.2 users (beyond pblk) will fail to have a
> >>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but
> >>>> I believe that we should have the default value for 1.2 either way.
> >>>
> >>> I'm not sure. From my understanding, setting the default value was a
> >>> workaround for the pblk case, am I right?
> >>
> >> The default value covers the MLC case directly at the lightnvm layer,
> >> as opposed to doing it directly in pblk. Since pblk is the only user
> >> now, you can argue that all changes in the lightnvm layer are to
> >> solve pblk issues, but the idea is that the geometry should be generic.
> >>
> >>> In my opinion, any user of the 1.2
> >>> spec should be aware that there is no mw_cunits value. From my point
> >>> of view, leaving it at 0 (and leaving the decision of what to do with
> >>> it to the lightnvm user) is the safer way, but maybe I'm wrong. I
> >>> believe this is a topic for wider discussion with the maintainers.
> >>
> >> 1.2 and 2.0 have different geometries, but when we designed the
> >> common nvm_geo structure, the idea was to abstract both specs and
> >> allow the upper layers to use the geometry transparently.
> >>
> >> Specifically in pblk, I would prefer to keep it in such a way that we
> >> don't need media-specific policies (e.g., set default values for
> >> MLC memories), as a general design principle. We already do some
> >> geometry version checks to avoid dereferencing unnecessary pointers
> >> on the fast path, which I would eventually like to remove.
> >
> > Ok, now I understand your point of view and agree with that, I will
> > prepare second version of this patch without this change.
> 
> Sounds good.
> 
> > Thanks for
> > the clarification.
> >
> 
> Sure :)
> 
> >>>> A more generic way of doing this would be to have a default value
> >>>> for
> >>>> 2.0 too, in case mw_cunits is reported as 0.
> >>>
> >>> Since 0 is a correct value and users can make different decisions
> >>> based on it, I think we shouldn't overwrite it with a default value.
> >>> Does that make sense?
> >>
> >> Here I meant at a pblk level - I should have specified it. At the
> >> geometry level, we should not change it.
> >>
> >> The case I am thinking of is when mw_cunits reports 0, but ws_min > 0. In
> >> this case, we still need a host side buffer to serve < ws_min I/Os,
> >> even though the device does not require the buffer to guarantee reads.
> >
> > Oh, ok now we are on the same page. In this patch I was trying to
> > address such a case. Do you have another idea of how to do it, or are
> > you thinking only about the value of the default variable?
> 
> If doing this, I guess something along the lines of what you did with
> increasing the size of the write buffer via a module parameter. For example,
> checking whether the size of the write buffer based on mw_cunits is enough to
> cover ws_min, which normally would only be an issue when mw_cunits == 0
> or when the number of PUs used for the pblk instance is very small and
> mw_cunits < nr_luns * ws_min.


I see two cases here:
- when mw_cunits > 0, the buffer should have at least max(mw_cunits, ws_min) * nr_luns entries; this takes care of both mw_cunits > ws_min and mw_cunits < ws_min.
- when mw_cunits == 0, the buffer should have at least ws_min * nr_luns entries, and we can use the same pseudocode as above.

Do you see any other case? Could you clarify the second case you mentioned, or did you mean the opposite case? If so, I believe the pseudocode above handles that case too.
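To make the intent concrete, both cases above collapse into a single expression, since max(mw_cunits, ws_min) equals ws_min when mw_cunits == 0. The following is only a hypothetical sketch (the helper name is made up; mw_cunits, ws_min, and nr_luns follow the geometry fields discussed here, not actual pblk code):

```c
#include <assert.h>

/*
 * Hypothetical sketch of the sizing rule discussed above: the minimum
 * number of write buffer entries is max(mw_cunits, ws_min) * nr_luns,
 * which also covers the mw_cunits == 0 case.
 */
static unsigned int min_buf_entries(unsigned int mw_cunits,
				    unsigned int ws_min,
				    unsigned int nr_luns)
{
	unsigned int per_lun = mw_cunits > ws_min ? mw_cunits : ws_min;

	return per_lun * nr_luns;
}
```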

> 
> >
> >>>> Javier
> >>>
> >>> Thanks,
> >>> Marcin
> >>
> >> Javier
> > Thanks,
> > Marcin
Thanks!
Marcin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-04 17:17               ` Dziegielewski, Marcin
@ 2018-06-05  7:12               ` Javier Gonzalez
  2018-06-05  9:18                 ` Dziegielewski, Marcin
  -1 siblings, 1 reply; 43+ messages in thread
From: Javier Gonzalez @ 2018-06-05  7:12 UTC (permalink / raw)
  To: Dziegielewski, Marcin
  Cc: Matias Bjørling, Jens Axboe, linux-block, linux-kernel,
	Konopko, Igor J

> On 4 Jun 2018, at 19.17, Dziegielewski, Marcin <marcin.dziegielewski@intel.com> wrote:
> 
>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>> Sent: Monday, June 4, 2018 1:16 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
>> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
>> <igor.j.konopko@intel.com>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>> 
>>> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
>> <marcin.dziegielewski@intel.com> wrote:
>>>> -----Original Message-----
>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>> Sent: Monday, June 4, 2018 12:22 PM
>>>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
>>>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
>>>> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
>>>> Igor J <igor.j.konopko@intel.com>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>> 
>>>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
>>>> <marcin.dziegielewski@intel.com> wrote:
>>>>> First of all, I want to say sorry for the late response - I was on holiday.
>>>>> 
>>>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
>>>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>>>> To: Matias Bjørling <mb@lightnvm.io>
>>>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org; linux-
>>>>>> kernel@vger.kernel.org; Dziegielewski, Marcin
>>>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
>>>>>> <igor.j.konopko@intel.com>
>>>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>>>> mw_cunits equals to 0
>>>>>> 
>>>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
>>>>>>> 
>>>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
>>>>>>> 
>>>>>>> Some devices can expose mw_cunits equal to 0, it can cause
>>>>>>> creation of too small write buffer and cause performance to drop
>>>>>>> on write workloads.
>>>>>>> 
>>>>>>> To handle that, we use the default value for MLC and because it
>>>>>>> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits in
>>>>>>> the nvme_nvm_setup_12 function is no longer necessary.
>>>>>>> 
>>>>>>> Signed-off-by: Marcin Dziegielewski
>>>>>>> <marcin.dziegielewski@intel.com>
>>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
>>>>>>> ---
>>>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/lightnvm/pblk-init.c
>>>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
>>>>>>> 100644
>>>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>>>> 	pblk->nr_flush_rst = 0;
>>>>>>> 
>>>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>>>> +	if (geo->mw_cunits) {
>>>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo-
>>> all_luns;
>>>>>>> +	} else {
>>>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
>>> all_luns;
>>>>>>> +		/*
>>>>>>> +		 * Some devices can expose mw_cunits equal to 0, so
>> let's
>>>>>> use
>>>>>>> +		 * here default safe value for MLC.
>>>>>>> +		 */
>>>>>>> +	}
>>>>>>> 
>>>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>>>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>>> index
>>>>>>> 41279da799ed..c747792da915 100644
>>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>>>>>> nvme_nvm_id12
>>>>>>> *id,
>>>>>>> 
>>>>>>> 	geo->ws_min = sec_per_pg;
>>>>>>> 	geo->ws_opt = sec_per_pg;
>>>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
>> safe values
>>>>>> */
>>>>>>> /* Do not impose values for maximum number of open blocks as it is
>>>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>>>>>> eventually
>>>>>>> --
>>>>>>> 2.11.0
>>>>>> 
>>>>>> By doing this, future 1.2 users (beyond pblk) will fail to have a
>>>>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but
>>>>>> I believe that we should have the default value for 1.2 either way.
>>>>> 
>>>>> I'm not sure. From my understanding, setting the default value was a
>>>>> workaround for the pblk case, am I right?
>>>> 
>>>> The default value covers the MLC case directly at the lightnvm layer,
>>>> as opposed to doing it directly in pblk. Since pblk is the only user
>>>> now, you can argue that all changes in the lightnvm layer are to
>>>> solve pblk issues, but the idea is that the geometry should be generic.
>>>> 
>>>>> In my opinion, any user of the 1.2
>>>>> spec should be aware that there is no mw_cunits value. From my point
>>>>> of view, leaving it at 0 (and leaving the decision of what to do with
>>>>> it to the lightnvm user) is the safer way, but maybe I'm wrong. I
>>>>> believe this is a topic for wider discussion with the maintainers.
>>>> 
>>>> 1.2 and 2.0 have different geometries, but when we designed the
>>>> common nvm_geo structure, the idea was to abstract both specs and
>>>> allow the upper layers to use the geometry transparently.
>>>> 
>>>> Specifically in pblk, I would prefer to keep it in such a way that we
>>>> don't need media-specific policies (e.g., set default values for
>>>> MLC memories), as a general design principle. We already do some
>>>> geometry version checks to avoid dereferencing unnecessary pointers
>>>> on the fast path, which I would eventually like to remove.
>>> 
>>> Ok, now I understand your point of view and agree with that, I will
>>> prepare second version of this patch without this change.
>> 
>> Sounds good.
>> 
>>> Thanks for
>>> the clarification.
>> 
>> Sure :)
>> 
>>>>>> A more generic way of doing this would be to have a default value
>>>>>> for
>>>>>> 2.0 too, in case mw_cunits is reported as 0.
>>>>> 
>>>>> Since 0 is a correct value and users can make different decisions
>>>>> based on it, I think we shouldn't overwrite it with a default value.
>>>>> Does that make sense?
>>>> 
>>>> Here I meant at a pblk level - I should have specified it. At the
>>>> geometry level, we should not change it.
>>>> 
>>>> The case I am thinking of is when mw_cunits reports 0, but ws_min > 0. In
>>>> this case, we still need a host side buffer to serve < ws_min I/Os,
>>>> even though the device does not require the buffer to guarantee reads.
>>> 
>>> Oh, ok now we are on the same page. In this patch I was trying to
>>> address such a case. Do you have another idea of how to do it, or are
>>> you thinking only about the value of the default variable?
>> 
>> If doing this, I guess something along the lines of what you did with
>> increasing the size of the write buffer via a module parameter. For example,
>> checking whether the size of the write buffer based on mw_cunits is enough to
>> cover ws_min, which normally would only be an issue when mw_cunits == 0
>> or when the number of PUs used for the pblk instance is very small and
>> mw_cunits < nr_luns * ws_min.
> 
> 
> I see two cases here:
> - when mw_cunits > 0, the buffer should have at least
> max(mw_cunits, ws_min) * nr_luns entries; this takes care of
> both mw_cunits > ws_min and mw_cunits < ws_min.
> - when mw_cunits == 0, the buffer should have at least
> ws_min * nr_luns entries, and we can use the same pseudocode as above.
> 

Agree.

> Do you see any other case? Could you clarify the second case you
> mentioned, or did you mean the opposite case? If so, I believe the
> pseudocode above handles that case too.
> 

Yes, it is the same case.

One thing to consider is whether the buffer should at least be ws_opt *
nr_luns for performance reasons. Since the write thread will always try
to send ws_opt, in the case that ws_opt > ws_min, then a buffer size of
ws_min * nr_luns will not make use of the whole parallelism exposed by
the device.

Therefore, I would probably go for ws_opt * nr_luns as the default value
when mw_cunits * nr_luns < ws_opt * nr_luns (which covers mw_cunits ==
0), and then keep ws_min * nr_luns as the minimum requirement when
setting the buffer size manually.
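In other words, something like the following - purely a hypothetical sketch with made-up names, not the actual pblk implementation:

```c
#include <assert.h>

/*
 * Hypothetical sketch of the proposed default: fall back to
 * ws_opt * nr_luns whenever mw_cunits would yield a smaller buffer
 * (this also covers mw_cunits == 0). A manually configured size
 * would additionally be clamped to at least ws_min * nr_luns
 * (not shown here).
 */
static unsigned int default_pgs_in_buffer(unsigned int mw_cunits,
					  unsigned int ws_opt,
					  unsigned int nr_luns)
{
	unsigned int by_cunits = mw_cunits * nr_luns;
	unsigned int by_ws_opt = ws_opt * nr_luns;

	return by_cunits < by_ws_opt ? by_ws_opt : by_cunits;
}
```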

Does this cover your use case?

>>>>>> Javier
>>>>> 
>>>>> Thanks,
>>>>> Marcin
>>>> 
>>>> Javier
>>> Thanks,
>>> Marcin
> Thanks!,
> Marcin


* RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
  2018-06-05  7:12               ` Javier Gonzalez
@ 2018-06-05  9:18                 ` Dziegielewski, Marcin
  0 siblings, 0 replies; 43+ messages in thread
From: Dziegielewski, Marcin @ 2018-06-05  9:18 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjorling, Jens Axboe, linux-block, linux-kernel, Konopko, Igor J

> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> Sent: Tuesday, June 5, 2018 9:12 AM
> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>; linux-
> block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko, Igor J
> <igor.j.konopko@intel.com>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
> 
> > On 4 Jun 2018, at 19.17, Dziegielewski, Marcin
> <marcin.dziegielewski@intel.com> wrote:
> >
> >> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >> Sent: Monday, June 4, 2018 1:16 PM
> >> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org; Konopko,
> >> Igor J <igor.j.konopko@intel.com>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>
> >>> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
> >> <marcin.dziegielewski@intel.com> wrote:
> >>>> -----Original Message-----
> >>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>> Sent: Monday, June 4, 2018 12:22 PM
> >>>> To: Dziegielewski, Marcin <marcin.dziegielewski@intel.com>
> >>>> Cc: Matias Bjørling <mb@lightnvm.io>; Jens Axboe <axboe@fb.com>;
> >>>> linux- block@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>>> Konopko, Igor J <igor.j.konopko@intel.com>
> >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>> mw_cunits equals to 0
> >>>>
> >>>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> >>>> <marcin.dziegielewski@intel.com> wrote:
> >>>>> First of all, I want to say sorry for the late response - I was on holiday.
> >>>>>
> >>>>>> From: Javier Gonzalez [mailto:javier@cnexlabs.com]
> >>>>>> Sent: Monday, May 28, 2018 1:03 PM
> >>>>>> To: Matias Bjørling <mb@lightnvm.io>
> >>>>>> Cc: Jens Axboe <axboe@fb.com>; linux-block@vger.kernel.org;
> >>>>>> linux- kernel@vger.kernel.org; Dziegielewski, Marcin
> >>>>>> <marcin.dziegielewski@intel.com>; Konopko, Igor J
> >>>>>> <igor.j.konopko@intel.com>
> >>>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>>>> mw_cunits equals to 0
> >>>>>>
> >>>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@lightnvm.io> wrote:
> >>>>>>>
> >>>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
> >>>>>>>
> >>>>>>> Some devices can expose mw_cunits equal to 0, it can cause
> >>>>>>> creation of too small write buffer and cause performance to drop
> >>>>>>> on write workloads.
> >>>>>>>
> >>>>>>> To handle that, we use the default value for MLC and because it
> >>>>>>> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
> >>>>>>> in the nvme_nvm_setup_12 function is no longer necessary.
> >>>>>>>
> >>>>>>> Signed-off-by: Marcin Dziegielewski
> >>>>>>> <marcin.dziegielewski@intel.com>
> >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> >>>>>>> Signed-off-by: Matias Bjørling <mb@lightnvm.io>
> >>>>>>> ---
> >>>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>>>>>> drivers/nvme/host/lightnvm.c |  1 -
> >>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/lightnvm/pblk-init.c
> >>>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>>>>>> 100644
> >>>>>>> --- a/drivers/lightnvm/pblk-init.c
> >>>>>>> +++ b/drivers/lightnvm/pblk-init.c
> >>>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>>>>>> 	atomic64_set(&pblk->nr_flush, 0);
> >>>>>>> 	pblk->nr_flush_rst = 0;
> >>>>>>>
> >>>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>>>> +	if (geo->mw_cunits) {
> >>>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo-
> >>> all_luns;
> >>>>>>> +	} else {
> >>>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
> >>> all_luns;
> >>>>>>> +		/*
> >>>>>>> +		 * Some devices can expose mw_cunits equal to 0, so
> >> let's
> >>>>>> use
> >>>>>>> +		 * here default safe value for MLC.
> >>>>>>> +		 */
> >>>>>>> +	}
> >>>>>>>
> >>>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs /
> PAGE_SIZE);
> >>>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff
> >>>>>>> --git a/drivers/nvme/host/lightnvm.c
> >>>>>>> b/drivers/nvme/host/lightnvm.c index
> >>>>>>> 41279da799ed..c747792da915 100644
> >>>>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >>>>>> nvme_nvm_id12
> >>>>>>> *id,
> >>>>>>>
> >>>>>>> 	geo->ws_min = sec_per_pg;
> >>>>>>> 	geo->ws_opt = sec_per_pg;
> >>>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC
> >> safe values
> >>>>>> */
> >>>>>>> /* Do not impose values for maximum number of open blocks as it
> is
> >>>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>>>>>> eventually
> >>>>>>> --
> >>>>>>> 2.11.0
> >>>>>>
> >>>>>> By doing this, future 1.2 users (beyond pblk) will fail to have
> >>>>>> a valid mw_cunits value. It's ok to deal with the 0 case in pblk,
> >>>>>> but I believe that we should have the default value for 1.2 either
> way.
> >>>>>
> >>>>> I'm not sure. From my understanding, setting the default value was a
> >>>>> workaround for the pblk case, am I right?
> >>>>
> >>>> The default value covers the MLC case directly at the lightnvm
> >>>> layer, as opposed to doing it directly in pblk. Since pblk is the
> >>>> only user now, you can argue that all changes in the lightnvm layer
> >>>> are to solve pblk issues, but the idea is that the geometry should be
> generic.
> >>>>
> >>>>> In my opinion, any user of the 1.2
> >>>>> spec should be aware that there is no mw_cunits value. From my
> >>>>> point of view, leaving it at 0 (and leaving the decision of what
> >>>>> to do with it to the lightnvm user) is the safer way, but maybe
> >>>>> I'm wrong. I believe this is a topic for wider discussion with
> >>>>> the maintainers.
> >>>>
> >>>> 1.2 and 2.0 have different geometries, but when we designed the
> >>>> common nvm_geo structure, the idea was to abstract both specs and
> >>>> allow the upper layers to use the geometry transparently.
> >>>>
> >>>> Specifically in pblk, I would prefer to keep it in such a way that
> >>>> we don't need media-specific policies (e.g., set default values
> >>>> for MLC memories), as a general design principle. We already do
> >>>> some geometry version checks to avoid dereferencing unnecessary
> >>>> pointers on the fast path, which I would eventually like to remove.
> >>>
> >>> Ok, now I understand your point of view and agree with that, I will
> >>> prepare second version of this patch without this change.
> >>
> >> Sounds good.
> >>
> >>> Thanks for
> >>> the clarification.
> >>
> >> Sure :)
> >>
> >>>>>> A more generic way of doing this would be to have a default value
> >>>>>> for
> >>>>>> 2.0 too, in case mw_cunits is reported as 0.
> >>>>>
> >>>>> Since 0 is a correct value and users can make different decisions
> >>>>> based on it, I think we shouldn't overwrite it with a default value.
> >>>>> Does that make sense?
> >>>>
> >>>> Here I meant at a pblk level - I should have specified it. At the
> >>>> geometry level, we should not change it.
> >>>>
> >>>> The case I am thinking of is when mw_cunits reports 0, but ws_min > 0.
> >>>> In this case, we still need a host side buffer to serve < ws_min
> >>>> I/Os, even though the device does not require the buffer to guarantee
> reads.
> >>>
> >>> Oh, ok now we are on the same page. In this patch I was trying to
> >>> address such a case. Do you have another idea of how to do it, or
> >>> are you thinking only about the value of the default variable?
> >>
> >> If doing this, I guess something along the lines of what you did
> >> with increasing the size of the write buffer via a module parameter.
> >> For example, checking whether the size of the write buffer based on
> >> mw_cunits is enough to cover ws_min, which normally would only be an
> >> issue when mw_cunits == 0 or when the number of PUs used for the pblk
> >> instance is very small and mw_cunits < nr_luns * ws_min.
> >
> >
> > I see two cases here:
> > - when mw_cunits > 0, the buffer should have at least
> > max(mw_cunits, ws_min) * nr_luns entries; this takes care of
> > both mw_cunits > ws_min and mw_cunits < ws_min.
> > - when mw_cunits == 0, the buffer should have at least
> > ws_min * nr_luns entries, and we can use the same pseudocode as above.
> >
> 
> Agree.
> 
> > Do you see any other case? Could you clarify the second case you
> > mentioned, or did you mean the opposite case? If so, I believe the
> > pseudocode above handles that case too.
> >
> 
> Yes, it is the same case.
> 
> One thing to consider is whether the buffer should at least be ws_opt *
> nr_luns for performance reasons. Since the write thread will always try to
> send ws_opt, in the case that ws_opt > ws_min, then a buffer size of
> ws_min * nr_luns will not make use of the whole parallelism exposed by the
> device.
> 

Agree. After I sent the last email, I also thought that it should be ws_opt instead of ws_min.

> Therefore, I would probably go for ws_opt * nr_luns as the default value
> when mw_cunits * nr_luns < ws_opt * nr_luns (which covers mw_cunits ==
> 0), and then keep ws_min * nr_luns as the minimum requirement when
> setting the buffer size manually.

Sounds good.
> 
> Does this cover your use case?
> 

Yes, I think that covers all cases related to this topic. I will prepare the patch in the afternoon.

Many thanks for the great cooperation!
Marcin
> >>>>>> Javier
> >>>>>
> >>>>> Thanks,
> >>>>> Marcin
> >>>>
> >>>> Javier
> >>> Thanks,
> >>> Marcin
> > Thanks!,
> > Marcin


end of thread, other threads:[~2018-06-05  9:18 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
2018-05-28  8:58 [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 01/20] lightnvm: pblk: fail gracefully on line alloc. failure Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 02/20] lightnvm: pblk: recheck for bad lines at runtime Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 03/20] lightnvm: pblk: check read lba on gc path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 04/20] lightnvm: pblk: improve error msg on corrupted LBAs Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 05/20] lightnvm: pblk: warn in case of corrupted write buffer Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 06/20] lightnvm: pblk: return NVM_ error on failed submission Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 07/20] lightnvm: pblk: remove unnecessary indirection Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 08/20] lightnvm: pblk: remove unnecessary argument Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 09/20] lightnvm: pblk: check for chunk size before allocating it Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 10/20] lightnvm: pass flag on graceful teardown to targets Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 11/20] lightnvm: pblk: remove dead function Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 12/20] lightnvm: pblk: rework write error recovery path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 13/20] lightnvm: pblk: garbage collect lines with failed writes Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 14/20] lightnvm: pblk: fix smeta write error path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 15/20] lightnvm: proper error handling for pblk_bio_add_pages Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 16/20] lightnvm: error handling when whole line is bad Matias Bjørling
2018-05-28 10:59   ` Javier Gonzalez
2018-05-29 13:15     ` Konopko, Igor J
2018-05-29 18:29       ` Javier Gonzalez
2018-05-28  8:58 ` [GIT PULL 17/20] lightnvm: fix partial read error path Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0 Matias Bjørling
2018-05-28 11:02   ` Javier Gonzalez
2018-06-04 10:09     ` Dziegielewski, Marcin
2018-06-04 10:09       ` Dziegielewski, Marcin
2018-06-04 10:21       ` Javier Gonzalez
2018-06-04 10:21         ` Javier Gonzalez
2018-06-04 11:11         ` Dziegielewski, Marcin
2018-06-04 11:11           ` Dziegielewski, Marcin
2018-06-04 11:15           ` Javier Gonzalez
2018-06-04 11:15             ` Javier Gonzalez
2018-06-04 17:17             ` Dziegielewski, Marcin
2018-06-04 17:17               ` Dziegielewski, Marcin
2018-06-05  7:12               ` Javier Gonzalez
2018-06-05  9:18                 ` Dziegielewski, Marcin
2018-05-28  8:58 ` [GIT PULL 19/20] lightnvm: pblk: add possibility to set write buffer size manually Matias Bjørling
2018-05-28  8:58 ` [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC Matias Bjørling
2018-05-28 10:51   ` Javier Gonzalez
2018-05-29 13:07     ` Konopko, Igor J
2018-05-29 17:58       ` Javier Gonzalez
2018-06-01 10:45 ` [GIT PULL 00/20] lightnvm updates for 4.18 Matias Bjørling
2018-06-01 12:36   ` Jens Axboe
2018-06-01 12:36     ` Jens Axboe
