* [PATCH 0/5] lightnvm: More flexible approach to metadata
@ 2018-06-15 22:27 Igor Konopko
  2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

This series of patches introduces more flexibility in pblk
related to OOB metadata:
- ability to use different sizes of metadata (previously fixed at 16b)
- ability to use pblk on drives without metadata
- ensuring that extended (interleaved) metadata is not in use

I believe that most of these patches are rather simple, with the
possible exception of patch 4 (Support for packed metadata), so I am
waiting for comments, especially on that one.

Igor Konopko (5):
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: pblk: Remove resv field for sec meta
  lightnvm: Flexible DMA pool entry size
  lightnvm: pblk: Support for packed metadata in pblk.
  lightnvm: pblk: Disable interleaved metadata in pblk

 drivers/lightnvm/core.c          | 33 ++++++++++-----
 drivers/lightnvm/pblk-core.c     | 86 +++++++++++++++++++++++++++++++---------
 drivers/lightnvm/pblk-init.c     | 52 +++++++++++++++++++++++-
 drivers/lightnvm/pblk-map.c      | 21 ++++++----
 drivers/lightnvm/pblk-rb.c       |  3 ++
 drivers/lightnvm/pblk-read.c     | 85 +++++++++++++++++++++++++--------------
 drivers/lightnvm/pblk-recovery.c | 67 +++++++++++++++++++++----------
 drivers/lightnvm/pblk-sysfs.c    |  7 ++++
 drivers/lightnvm/pblk-write.c    | 22 ++++++----
 drivers/lightnvm/pblk.h          | 46 +++++++++++++++++++--
 drivers/nvme/host/lightnvm.c     |  7 +++-
 include/linux/lightnvm.h         |  9 +++--
 12 files changed, 333 insertions(+), 105 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
@ 2018-06-15 22:27 ` Igor Konopko
  2018-06-16 19:24   ` Matias Bjørling
  2018-06-18 14:23   ` Javier Gonzalez
  2018-06-15 22:27 ` [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta Igor Konopko
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers that allow handling different sizes of OOB metadata on the drive.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 10 +++++----
 drivers/lightnvm/pblk-map.c      | 21 ++++++++++++-------
 drivers/lightnvm/pblk-read.c     | 45 +++++++++++++++++++++++++---------------
 drivers/lightnvm/pblk-recovery.c | 24 ++++++++++++---------
 drivers/lightnvm/pblk.h          | 29 ++++++++++++++++++++++++++
 5 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 66ab1036f2fb..8a0ac466872f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 	rqd.nr_ppas = rq_ppas;
 
 	if (dir == PBLK_WRITE) {
-		struct pblk_sec_meta *meta_list = rqd.meta_list;
+		void *meta_list = rqd.meta_list;
 
 		rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
 		for (i = 0; i < rqd.nr_ppas; ) {
@@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 			paddr = __pblk_alloc_page(pblk, line, min);
 			spin_unlock(&line->lock);
 			for (j = 0; j < min; j++, i++, paddr++) {
-				meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
+				pblk_get_meta_at(pblk, meta_list, i)->lba =
+					cpu_to_le64(ADDR_EMPTY);
 				rqd.ppa_list[i] =
 					addr_to_gen_ppa(pblk, paddr, id);
 			}
@@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
 	rqd.nr_ppas = lm->smeta_sec;
 
 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-		struct pblk_sec_meta *meta_list = rqd.meta_list;
+		void *meta_list = rqd.meta_list;
 
 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
 
 		if (dir == PBLK_WRITE) {
 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-			meta_list[i].lba = lba_list[paddr] = addr_empty;
+			pblk_get_meta_at(pblk, meta_list, i)->lba =
+				lba_list[paddr] = addr_empty;
 		}
 	}
 
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 953ca31dda68..92c40b546c4e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -21,7 +21,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			      struct ppa_addr *ppa_list,
 			      unsigned long *lun_bitmap,
-			      struct pblk_sec_meta *meta_list,
+			      void *meta_list,
 			      unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
@@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			kref_get(&line->ref);
 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
 			w_ctx->ppa = ppa_list[i];
-			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+			pblk_get_meta_at(pblk, meta_list, i)->lba =
+							cpu_to_le64(w_ctx->lba);
 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
 			if (lba_list[paddr] != addr_empty)
 				line->nr_valid_lbas++;
 			else
 				atomic64_inc(&pblk->pad_wa);
 		} else {
-			lba_list[paddr] = meta_list[i].lba = addr_empty;
+			lba_list[paddr] =
+				pblk_get_meta_at(pblk, meta_list, i)->lba =
+					addr_empty;
 			__pblk_map_invalidate(pblk, line, paddr);
 		}
 	}
@@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 		 unsigned long *lun_bitmap, unsigned int valid_secs,
 		 unsigned int off)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
 	unsigned int map_secs;
 	int min = pblk->min_write_pgs;
 	int i;
@@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 	for (i = off; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs)) {
+					lun_bitmap,
+					pblk_get_meta_at(pblk, meta_list, i),
+					map_secs)) {
 			bio_put(rqd->bio);
 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
 			pblk_pipeline_stop(pblk);
@@ -111,7 +116,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
 	struct pblk_line *e_line, *d_line;
 	unsigned int map_secs;
 	int min = pblk->min_write_pgs;
@@ -120,7 +125,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	for (i = 0; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs)) {
+					lun_bitmap,
+					pblk_get_meta_at(pblk, meta_list, i),
+					map_secs)) {
 			bio_put(rqd->bio);
 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
 			pblk_pipeline_stop(pblk);
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 6e93c489ce57..81cf79ea2dc6 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -42,7 +42,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				 struct bio *bio, sector_t blba,
 				 unsigned long *read_bitmap)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
 	int nr_secs = rqd->nr_ppas;
 	bool advanced_bio = false;
@@ -57,7 +57,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 retry:
 		if (pblk_ppa_empty(p)) {
 			WARN_ON(test_and_set_bit(i, read_bitmap));
-			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
+			pblk_get_meta_at(pblk, meta_list, i)->lba =
+					cpu_to_le64(ADDR_EMPTY);
 
 			if (unlikely(!advanced_bio)) {
 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
@@ -77,7 +78,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 				goto retry;
 			}
 			WARN_ON(test_and_set_bit(i, read_bitmap));
-			meta_list[i].lba = cpu_to_le64(lba);
+			pblk_get_meta_at(pblk, meta_list, i)->lba =
+					cpu_to_le64(lba);
 			advanced_bio = true;
 #ifdef CONFIG_NVM_PBLK_DEBUG
 			atomic_long_inc(&pblk->cache_reads);
@@ -106,13 +108,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
 static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
 				sector_t blba)
 {
-	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
+	void *meta_lba_list = rqd->meta_list;
 	int nr_lbas = rqd->nr_ppas;
 	int i;
 
-	for (i = 0; i < nr_lbas; i++) {
-		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
+	if (!pblk_is_oob_meta_supported(pblk))
+		return;
 
+	for (i = 0; i < nr_lbas; i++) {
+		u64 lba = le64_to_cpu(
+				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
 		if (lba == ADDR_EMPTY)
 			continue;
 
@@ -136,17 +141,21 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
 static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
 				 u64 *lba_list, int nr_lbas)
 {
-	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
+	void *meta_lba_list = rqd->meta_list;
 	int i, j;
 
-	for (i = 0, j = 0; i < nr_lbas; i++) {
+	if (!pblk_is_oob_meta_supported(pblk))
+		return;
+
+	for (i = 0, j = 0; i < nr_lbas; i++) {
 		u64 lba = lba_list[i];
 		u64 meta_lba;
 
 		if (lba == ADDR_EMPTY)
 			continue;
 
-		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
+		meta_lba = le64_to_cpu(
+				pblk_get_meta_at(pblk, meta_lba_list, j)->lba);
 
 		if (lba != meta_lba) {
 #ifdef CONFIG_NVM_PBLK_DEBUG
@@ -235,7 +244,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 			     struct bio *orig_bio, unsigned int bio_init_idx,
 			     unsigned long *read_bitmap)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
 	struct bio *new_bio;
 	struct bio_vec src_bv, dst_bv;
 	void *ppa_ptr = NULL;
@@ -261,7 +270,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 
 	for (i = 0; i < nr_secs; i++)
-		lba_list_mem[i] = meta_list[i].lba;
+		lba_list_mem[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
 
 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -300,8 +309,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	}
 
 	for (i = 0; i < nr_secs; i++) {
-		lba_list_media[i] = meta_list[i].lba;
-		meta_list[i].lba = lba_list_mem[i];
+		lba_list_media[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
+		pblk_get_meta_at(pblk, meta_list, i)->lba = lba_list_mem[i];
 	}
 
 	/* Fill the holes in the original bio */
@@ -313,7 +322,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 
 		kref_put(&line->ref, pblk_line_put);
 
-		meta_list[hole].lba = lba_list_media[i];
+		pblk_get_meta_at(pblk, meta_list, hole)->lba =
+						lba_list_media[i];
 
 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
@@ -354,7 +364,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 			 sector_t lba, unsigned long *read_bitmap)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
 	struct ppa_addr ppa;
 
 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
@@ -366,7 +376,8 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 retry:
 	if (pblk_ppa_empty(ppa)) {
 		WARN_ON(test_and_set_bit(0, read_bitmap));
-		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
+		pblk_get_meta_at(pblk, meta_list, 0)->lba =
+						cpu_to_le64(ADDR_EMPTY);
 		return;
 	}
 
@@ -380,7 +391,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
 		}
 
 		WARN_ON(test_and_set_bit(0, read_bitmap));
-		meta_list[0].lba = cpu_to_le64(lba);
+		pblk_get_meta_at(pblk, meta_list, 0)->lba = cpu_to_le64(lba);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 		atomic_long_inc(&pblk->cache_reads);
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index b1a91cb3ca4d..0007e8011476 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -98,7 +98,7 @@ static int pblk_calc_sec_in_line(struct pblk *pblk, struct pblk_line *line)
 
 struct pblk_recov_alloc {
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct nvm_rq *rqd;
 	void *data;
 	dma_addr_t dma_ppa_list;
@@ -111,7 +111,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct nvm_rq *rqd;
 	struct bio *bio;
 	void *data;
@@ -199,7 +199,8 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
 	}
 
 	for (i = 0; i < rqd->nr_ppas; i++) {
-		u64 lba = le64_to_cpu(meta_list[i].lba);
+		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
+							meta_list, i)->lba);
 
 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
 			continue;
@@ -240,7 +241,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct pblk_pad_rq *pad_rq;
 	struct nvm_rq *rqd;
 	struct bio *bio;
@@ -332,7 +333,8 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
 
 			pblk_map_invalidate(pblk, dev_ppa);
-			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
+			lba_list[w_ptr] = pblk_get_meta_at(pblk,
+						meta_list, i)->lba = addr_empty;
 			rqd->ppa_list[i] = dev_ppa;
 		}
 	}
@@ -389,7 +391,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct nvm_rq *rqd;
 	struct bio *bio;
 	void *data;
@@ -473,7 +475,8 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
 	if (!rec_round++ && !rqd->error) {
 		rec_round = 0;
 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
-			u64 lba = le64_to_cpu(meta_list[i].lba);
+			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
+							meta_list, i)->lba);
 
 			if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
 				continue;
@@ -523,7 +526,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct nvm_rq *rqd;
 	struct bio *bio;
 	void *data;
@@ -619,7 +622,8 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	}
 
 	for (i = 0; i < rqd->nr_ppas; i++) {
-		u64 lba = le64_to_cpu(meta_list[i].lba);
+		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
+							meta_list, i)->lba);
 
 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
 			continue;
@@ -641,7 +645,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
 	struct nvm_geo *geo = &dev->geo;
 	struct nvm_rq *rqd;
 	struct ppa_addr *ppa_list;
-	struct pblk_sec_meta *meta_list;
+	void *meta_list;
 	struct pblk_recov_alloc p;
 	void *data;
 	dma_addr_t dma_ppa_list, dma_meta_list;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index c072955d72c2..f82c3a0b0de5 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1420,4 +1420,33 @@ static inline void pblk_setup_uuid(struct pblk *pblk)
 	uuid_le_gen(&uuid);
 	memcpy(pblk->instance_uuid, uuid.b, 16);
 }
+
+static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+
+	/* Pblk uses OOB meta to store LBA of given physical sector.
+	 * The LBA is eventually used in recovery mode and/or for handling
+	 * telemetry events (e.g., relocate sector).
+	 */
+
+	return (geo->sos >= sizeof(struct pblk_sec_meta));
+}
+
+static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
+						void *meta_ptr, int index)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		/* We need to have oob meta layout the same as on drive */
+		return meta_ptr + geo->sos * index;
+	else
+		/* We can create virtual oob meta layout since drive does
+		 * not have real oob metadata
+		 */
+		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
+}
 #endif /* PBLK_H_ */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
  2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
@ 2018-06-15 22:27 ` Igor Konopko
  2018-06-16 19:27   ` Matias Bjørling
  2018-06-15 22:27 ` [PATCH 3/5] lightnvm: Flexible DMA pool entry size Igor Konopko
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

Since we now support a flexible size of OOB metadata, which depends
on the drive metadata size, we can remove the unneeded reserved field
from struct pblk_sec_meta.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index f82c3a0b0de5..27658dc6fc1a 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -82,7 +82,6 @@ enum {
 };
 
 struct pblk_sec_meta {
-	u64 reserved;
 	__le64 lba;
 };
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] lightnvm: Flexible DMA pool entry size
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
  2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
  2018-06-15 22:27 ` [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta Igor Konopko
@ 2018-06-15 22:27 ` Igor Konopko
  2018-06-16 19:32   ` Matias Bjørling
  2018-06-15 22:27 ` [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk Igor Konopko
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

Currently the whole lightnvm subsystem and pblk use a single DMA pool,
for which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8b * 64, so there is only 56b * 64 of space left for OOB
meta. Since NVMe OOB meta can be bigger, such as 128b, this solution
is not robust.

This patch adds the possibility to support OOB meta above 56b by
creating a separate DMA pool for pblk with an entry size big enough
to store both the PPA list and such OOB metadata.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/core.c          | 33 ++++++++++++++++++++++++---------
 drivers/lightnvm/pblk-core.c     | 24 +++++++++++++-----------
 drivers/lightnvm/pblk-init.c     |  9 +++++++++
 drivers/lightnvm/pblk-read.c     | 40 +++++++++++++++++++++++++++-------------
 drivers/lightnvm/pblk-recovery.c | 18 ++++++++++--------
 drivers/lightnvm/pblk-write.c    |  8 ++++----
 drivers/lightnvm/pblk.h          | 11 ++++++++++-
 drivers/nvme/host/lightnvm.c     |  6 ++++--
 include/linux/lightnvm.h         |  8 +++++---
 9 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 60aa7bc5a630..bc8e6ecea083 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -642,20 +642,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
 }
 EXPORT_SYMBOL(nvm_unregister_tgt_type);
 
-void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
-							dma_addr_t *dma_handler)
+void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
+				gfp_t mem_flags, dma_addr_t *dma_handler)
 {
-	return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
-								dma_handler);
+	return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
+						mem_flags, dma_handler);
 }
 EXPORT_SYMBOL(nvm_dev_dma_alloc);
 
-void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
+void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
+				void *addr, dma_addr_t dma_handler)
 {
-	dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
+	dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
 }
 EXPORT_SYMBOL(nvm_dev_dma_free);
 
+void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
+{
+	return dev->ops->create_dma_pool(dev, name, size);
+}
+EXPORT_SYMBOL(nvm_dev_dma_create);
+
+void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
+{
+	dev->ops->destroy_dma_pool(pool);
+}
+EXPORT_SYMBOL(nvm_dev_dma_destroy);
+
 static struct nvm_dev *nvm_find_nvm_dev(const char *name)
 {
 	struct nvm_dev *dev;
@@ -683,7 +696,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
 	}
 
 	rqd->nr_ppas = nr_ppas;
-	rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
+	rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
+						&rqd->dma_ppa_list);
 	if (!rqd->ppa_list) {
 		pr_err("nvm: failed to allocate dma memory\n");
 		return -ENOMEM;
@@ -709,7 +723,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
 	if (!rqd->ppa_list)
 		return;
 
-	nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
+	nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
+				rqd->dma_ppa_list);
 }
 
 int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta *meta,
@@ -933,7 +948,7 @@ int nvm_register(struct nvm_dev *dev)
 	if (!dev->q || !dev->ops)
 		return -EINVAL;
 
-	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
 	if (!dev->dma_pool) {
 		pr_err("nvm: could not create dma pool\n");
 		return -ENOMEM;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 8a0ac466872f..c092ee93a18d 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -279,7 +279,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
 	}
 
 	if (rqd->meta_list)
-		nvm_dev_dma_free(dev->parent, rqd->meta_list,
+		nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd->meta_list,
 				rqd->dma_meta_list);
 	mempool_free(rqd, pool);
 }
@@ -652,13 +652,13 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 	} else
 		return -EINVAL;
 
-	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-							&dma_meta_list);
+	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+					GFP_KERNEL, &dma_meta_list);
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = meta_list + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
 	memset(&rqd, 0, sizeof(struct nvm_rq));
@@ -758,7 +758,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 	if (left_ppas)
 		goto next_rq;
 free_rqd_dma:
-	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
+						rqd.dma_meta_list);
 	return ret;
 }
 
@@ -803,13 +804,13 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-							&rqd.dma_meta_list);
+	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+						GFP_KERNEL, &rqd.dma_meta_list);
 	if (!rqd.meta_list)
 		return -ENOMEM;
 
-	rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
-	rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+	rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size(pblk);
+	rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size(pblk);
 
 	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
 	if (IS_ERR(bio)) {
@@ -861,7 +862,8 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
 	}
 
 free_ppa_list:
-	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
+						rqd.dma_meta_list);
 
 	return ret;
 }
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index aa2426403171..f05112230a52 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1142,6 +1142,7 @@ static void pblk_free(struct pblk *pblk)
 	pblk_l2p_free(pblk);
 	pblk_rwb_free(pblk);
 	pblk_core_free(pblk);
+	nvm_dev_dma_destroy(pblk->dev->parent, pblk->dma_pool);
 
 	kfree(pblk);
 }
@@ -1212,6 +1213,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 	pblk->disk = tdisk;
 	pblk->state = PBLK_STATE_RUNNING;
 	pblk->gc.gc_enabled = 0;
+	pblk->dma_pool = nvm_dev_dma_create(dev->parent, (pblk_dma_ppa_size +
+						pblk_dma_meta_size(pblk)),
+						tdisk->disk_name);
+	if (!pblk->dma_pool) {
+		kfree(pblk);
+		return ERR_PTR(-ENOMEM);
+	}
 
 	spin_lock_init(&pblk->resubmit_lock);
 	spin_lock_init(&pblk->trans_lock);
@@ -1312,6 +1320,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 fail_free_core:
 	pblk_core_free(pblk);
 fail:
+	nvm_dev_dma_destroy(dev->parent, pblk->dma_pool);
 	kfree(pblk);
 	return ERR_PTR(ret);
 }
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 81cf79ea2dc6..9ff4f48c4168 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -255,9 +255,13 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
 	int i, ret, hole;
 
-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
+	lba_list_mem = kcalloc(nr_secs, sizeof(__le64), GFP_KERNEL);
+	if (!lba_list_mem)
+		goto err_alloc_mem;
+
+	lba_list_media = kcalloc(nr_secs, sizeof(__le64), GFP_KERNEL);
+	if (!lba_list_media)
+		goto err_alloc_media;
 
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
@@ -349,6 +353,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->bio = NULL;
 	rqd->nr_ppas = nr_secs;
 
+	kfree(lba_list_media);
+	kfree(lba_list_mem);
 	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_DONE;
 
@@ -356,6 +362,10 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	/* Free allocated pages in new bio */
 	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
 fail_add_pages:
+	kfree(lba_list_media);
+err_alloc_media:
+	kfree(lba_list_mem);
+err_alloc_mem:
 	pr_err("pblk: failed to perform partial read\n");
 	__pblk_end_io_read(pblk, rqd, false);
 	return NVM_IO_ERR;
@@ -444,16 +454,17 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
 	 */
 	bio_init_idx = pblk_get_bi_idx(bio);
 
-	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-							&rqd->dma_meta_list);
+	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+					GFP_KERNEL, &rqd->dma_meta_list);
 	if (!rqd->meta_list) {
 		pr_err("pblk: not able to allocate ppa list\n");
 		goto fail_rqd_free;
 	}
 
 	if (nr_secs > 1) {
-		rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-		rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+		rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+		rqd->dma_ppa_list = rqd->dma_meta_list +
+					pblk_dma_meta_size(pblk);
 
 		pblk_read_ppalist_rq(pblk, rqd, bio, blba, &read_bitmap);
 	} else {
@@ -578,14 +589,15 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-							&rqd.dma_meta_list);
+	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+						GFP_KERNEL, &rqd.dma_meta_list);
 	if (!rqd.meta_list)
 		return -ENOMEM;
 
 	if (gc_rq->nr_secs > 1) {
-		rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
-		rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+		rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size(pblk);
+		rqd.dma_ppa_list = rqd.dma_meta_list +
+					pblk_dma_meta_size(pblk);
 
 		gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
 							gc_rq->lba_list,
@@ -642,12 +654,14 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 #endif
 
 out:
-	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
+						rqd.dma_meta_list);
 	return ret;
 
 err_free_bio:
 	bio_put(bio);
 err_free_dma:
-	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
+						rqd.dma_meta_list);
 	return ret;
 }
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 0007e8011476..f5853fc77a0c 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -280,14 +280,15 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 
 	rq_len = rq_ppas * geo->csecs;
 
-	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
+	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+					 GFP_KERNEL, &dma_meta_list);
 	if (!meta_list) {
 		ret = -ENOMEM;
 		goto fail_free_pad;
 	}
 
-	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
 						PBLK_VMALLOC_META, GFP_KERNEL);
@@ -373,7 +374,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 fail_free_bio:
 	bio_put(bio);
 fail_free_meta:
-	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, meta_list, dma_meta_list);
 fail_free_pad:
 	kfree(pad_rq);
 	vfree(data);
@@ -651,12 +652,13 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
 	dma_addr_t dma_ppa_list, dma_meta_list;
 	int done, ret = 0;
 
-	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
+	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+					 GFP_KERNEL, &dma_meta_list);
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 	data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
 	if (!data) {
@@ -693,7 +695,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
 out:
 	kfree(data);
 free_meta_list:
-	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
+	nvm_dev_dma_free(dev->parent, pblk->dma_pool, meta_list, dma_meta_list);
 
 	return ret;
 }
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 5f44df999aed..6552db35f916 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -306,13 +306,13 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->private = pblk;
 	rqd->end_io = end_io;
 
-	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-							&rqd->dma_meta_list);
+	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
+					GFP_KERNEL, &rqd->dma_meta_list);
 	if (!rqd->meta_list)
 		return -ENOMEM;
 
-	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
 	return 0;
 }
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 27658dc6fc1a..4c61ede5b207 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -98,7 +98,6 @@ enum {
 	PBLK_RL_LOW = 4
 };
 
-#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
 #define pblk_dma_ppa_size (sizeof(u64) * PBLK_MAX_REQ_ADDRS)
 
 /* write buffer completion context */
@@ -690,6 +689,7 @@ struct pblk {
 	struct timer_list wtimer;
 
 	struct pblk_gc gc;
+	void *dma_pool;
 };
 
 struct pblk_line_ws {
@@ -1448,4 +1448,13 @@ static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
 		 */
 		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
 }
+
+static inline int pblk_dma_meta_size(struct pblk *pblk)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct nvm_geo *geo = &dev->geo;
+
+	return max_t(int, sizeof(struct pblk_sec_meta), geo->sos)
+				* PBLK_MAX_REQ_ADDRS;
+}
 #endif /* PBLK_H_ */
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 006d09e0af74..670478abc754 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -729,11 +729,13 @@ static int nvme_nvm_submit_io_sync(struct nvm_dev *dev, struct nvm_rq *rqd)
 	return ret;
 }
 
-static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name)
+static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name,
+					int size)
 {
 	struct nvme_ns *ns = nvmdev->q->queuedata;
 
-	return dma_pool_create(name, ns->ctrl->dev, PAGE_SIZE, PAGE_SIZE, 0);
+	size = round_up(size, PAGE_SIZE);
+	return dma_pool_create(name, ns->ctrl->dev, size, PAGE_SIZE, 0);
 }
 
 static void nvme_nvm_destroy_dma_pool(void *pool)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index e9e0d1c7eaf5..72a55d71917e 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -90,7 +90,7 @@ typedef int (nvm_get_chk_meta_fn)(struct nvm_dev *, struct nvm_chk_meta *,
 								sector_t, int);
 typedef int (nvm_submit_io_fn)(struct nvm_dev *, struct nvm_rq *);
 typedef int (nvm_submit_io_sync_fn)(struct nvm_dev *, struct nvm_rq *);
-typedef void *(nvm_create_dma_pool_fn)(struct nvm_dev *, char *);
+typedef void *(nvm_create_dma_pool_fn)(struct nvm_dev *, char *, int);
 typedef void (nvm_destroy_dma_pool_fn)(void *);
 typedef void *(nvm_dev_dma_alloc_fn)(struct nvm_dev *, void *, gfp_t,
 								dma_addr_t *);
@@ -517,8 +517,10 @@ struct nvm_tgt_type {
 extern int nvm_register_tgt_type(struct nvm_tgt_type *);
 extern void nvm_unregister_tgt_type(struct nvm_tgt_type *);
 
-extern void *nvm_dev_dma_alloc(struct nvm_dev *, gfp_t, dma_addr_t *);
-extern void nvm_dev_dma_free(struct nvm_dev *, void *, dma_addr_t);
+extern void *nvm_dev_dma_alloc(struct nvm_dev *, void *, gfp_t, dma_addr_t *);
+extern void nvm_dev_dma_free(struct nvm_dev *, void *, void *, dma_addr_t);
+extern void *nvm_dev_dma_create(struct nvm_dev *, int, char *);
+extern void nvm_dev_dma_destroy(struct nvm_dev *, void *);
 
 extern struct nvm_dev *nvm_alloc_dev(int);
 extern int nvm_register(struct nvm_dev *);
-- 
2.14.3
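The sizing logic in the patch above can be condensed into a small user-space sketch. The struct layout, `PAGE_SIZE` value, and the helper names `dma_meta_size()`/`pool_entry_size()` below are illustrative assumptions standing in for `pblk_dma_meta_size()` and the `round_up()` call added to `nvme_nvm_create_dma_pool()`; this is not the driver code itself.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096			/* illustrative page size */
#define PBLK_MAX_REQ_ADDRS 64

/* Hypothetical stand-in for struct pblk_sec_meta (16 bytes in pblk). */
struct pblk_sec_meta {
	unsigned long long reserved;
	unsigned long long lba;
};

/* Per-request metadata buffer size: per sector, the larger of the pblk
 * metadata struct and the drive's OOB area size (geo->sos). */
static int dma_meta_size(int sos)
{
	int per_sec = (int)sizeof(struct pblk_sec_meta);

	if (sos > per_sec)
		per_sec = sos;
	return per_sec * PBLK_MAX_REQ_ADDRS;
}

/* DMA pool entries are padded up to a whole number of pages, mirroring
 * the round_up(size, PAGE_SIZE) added in nvme_nvm_create_dma_pool(). */
static int pool_entry_size(int size)
{
	return (size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```

With a 16-byte OOB area the metadata buffer is 1 KB, but a 64-byte OOB area already needs 4 KB per request, which is why a fixed `PAGE_SIZE` pool entry is no longer enough once the PPA list is appended after it.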

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
                   ` (2 preceding siblings ...)
  2018-06-15 22:27 ` [PATCH 3/5] lightnvm: Flexible DMA pool entry size Igor Konopko
@ 2018-06-15 22:27 ` Igor Konopko
  2018-06-16 19:45   ` Matias Bjørling
  2018-06-19 11:08   ` Javier Gonzalez
  2018-06-15 22:27 ` [PATCH 5/5] lightnvm: pblk: Disable interleaved " Igor Konopko
  2018-06-19 10:28 ` [PATCH 0/5] lightnvm: More flexible approach to metadata Javier Gonzalez
  5 siblings, 2 replies; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

In the current pblk implementation, the l2p mapping for lines that
are not yet closed is stored only in OOB metadata and recovered
from it.

Such a solution does not provide data integrity when the drive does
not have an OOB metadata area.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
 drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
 drivers/lightnvm/pblk-rb.c       |  3 +++
 drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
 drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
 drivers/lightnvm/pblk-write.c    | 14 +++++++----
 drivers/lightnvm/pblk.h          |  5 +++-
 7 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index c092ee93a18d..375c6430612e 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
 	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-	if (secs_avail >= pblk->min_write_pgs)
+	if (secs_avail >= pblk->min_write_pgs_data)
 		pblk_write_kick(pblk);
 }
 
@@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list = NULL;
-	int vsc = le32_to_cpu(*line->vsc);
+	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
+	int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
 	lockdep_assert_held(&line->lock);
 
@@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-		   unsigned long secs_to_flush)
+		   unsigned long secs_to_flush, bool skip_meta)
 {
 	int max = pblk->sec_per_write;
 	int min = pblk->min_write_pgs;
 	int secs_to_sync = 0;
 
+	if (skip_meta)
+		min = max = pblk->min_write_pgs_data;
+
 	if (secs_avail >= max)
 		secs_to_sync = max;
 	else if (secs_avail >= min)
@@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	rq_len = rq_ppas * geo->csecs;
 
 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 	}
 	spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	/* We need to zero out metadata corresponding to packed meta page */
+	pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill the last page of the request (packed metadata)
+	 * with data from the oob meta buffer.
+	 */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+			pblk_get_meta_at(pblk, meta_list, i),
+			sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill the oob meta buffer with data from packed metadata */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(pblk_get_meta_at(pblk, meta_list, i),
+			page + (i * sizeof(struct pblk_sec_meta)),
+			sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index f05112230a52..5eb641da46ed 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
 	pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
+	pblk->min_write_pgs_data = pblk->min_write_pgs;
 	pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
+	if (!pblk_is_oob_meta_supported(pblk)) {
+		/* For drives which do not have the OOB metadata feature,
+		 * in order to support recovery we need to use so-called
+		 * packed metadata. Packed metadata stores the same
+		 * information as OOB metadata (the l2p table mapping),
+		 * but in the form of a single page at the end of
+		 * every write request.
+		 */
+		if (pblk->min_write_pgs
+			* sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
+			/* We want to keep all the packed metadata on a
+			 * single page per write request, so we need to
+			 * ensure that it will fit.
+			 *
+			 * This is more of a sanity check, since there is
+			 * no device with such a big minimal write size
+			 * (above 1 megabyte).
+			 */
+			pr_err("pblk: Not supported min write size\n");
+			return -EINVAL;
+		}
+		/* For the packed metadata approach we simplify things:
+		 * on the read path we always issue requests whose size
+		 * is equal to max_write_pgs, with all pages filled with
+		 * user payload except the last one, which is filled
+		 * with packed metadata.
+		 */
+		pblk->max_write_pgs = pblk->min_write_pgs;
+		pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
+	}
+
 	if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
 		pr_err("pblk: vector list too big(%u > %u)\n",
 				pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
@@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct nvm_geo *geo = &dev->geo;
 	sector_t provisioned;
-	int sec_meta, blk_meta;
+	int sec_meta, blk_meta, clba;
 
 	if (geo->op == NVM_TARGET_DEFAULT_OP)
 		pblk->op = PBLK_DEFAULT_OP;
@@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
 	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
 
-	pblk->capacity = (provisioned - blk_meta) * geo->clba;
+	clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
+	pblk->capacity = (provisioned - blk_meta) * clba;
 
 	atomic_set(&pblk->rl.free_blocks, nr_free_blks);
 	atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index a81a97e8ea6d..081e73e7978f 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
 		to_read = count;
 	}
 
+	/* Add space for packed metadata if in use */
+	pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
+
 	c_ctx->sentry = pos;
 	c_ctx->nr_valid = to_read;
 	c_ctx->nr_padded = pad;
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index f5853fc77a0c..0fab18fe30d9 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
 next_read_rq:
 	memset(rqd, 0, pblk_g_rq_size);
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (!rq_ppas)
 		rq_ppas = pblk->min_write_pgs;
 	rq_len = rq_ppas * geo->csecs;
@@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
 		return -EINTR;
 	}
 
+	pblk_get_packed_meta(pblk, rqd);
 	for (i = 0; i < rqd->nr_ppas; i++) {
 		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
 							meta_list, i)->lba);
@@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
 	kref_init(&pad_rq->ref);
 
 next_pad_rq:
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (rq_ppas < pblk->min_write_pgs) {
 		pr_err("pblk: corrupted pad line %d\n", line->id);
 		goto fail_free_pad;
@@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(rqd, 0, pblk_g_rq_size);
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (!rq_ppas)
 		rq_ppas = pblk->min_write_pgs;
 	rq_len = rq_ppas * geo->csecs;
@@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
 	 */
 	if (!rec_round++ && !rqd->error) {
 		rec_round = 0;
+		pblk_get_packed_meta(pblk, rqd);
 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
 			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
 							meta_list, i)->lba);
@@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
 		int ret;
 
 		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
+		if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
+			/* This case should not happen, since here we always
+			 * read in the same unit as the writer thread wrote.
+			 */
+			pr_err("pblk: Inconsistent packed metadata read\n");
+		}
 		nr_error_bits = rqd->nr_ppas - bit;
 
 		/* Roll back failed sectors */
@@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(rqd, 0, pblk_g_rq_size);
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	if (!rq_ppas)
 		rq_ppas = pblk->min_write_pgs;
 	rq_len = rq_ppas * geo->csecs;
@@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 		int nr_error_bits, bit;
 
 		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
+		if (!pblk_is_oob_meta_supported(pblk)) {
+			/* For packed metadata we do not handle partially
+			 * written requests here, since the metadata is
+			 * always in the last page of the request.
+			 */
+			bit = 0;
+			*done = 0;
+		}
 		nr_error_bits = rqd->nr_ppas - bit;
 
 		/* Roll back failed sectors */
@@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 			*done = 0;
 	}
 
+	pblk_get_packed_meta(pblk, rqd);
 	for (i = 0; i < rqd->nr_ppas; i++) {
 		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
 							meta_list, i)->lba);
diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
index b0e5e93a9d5f..aa7b4164ce9e 100644
--- a/drivers/lightnvm/pblk-sysfs.c
+++ b/drivers/lightnvm/pblk-sysfs.c
@@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
 	if (kstrtouint(page, 0, &sec_per_write))
 		return -EINVAL;
 
+	if (!pblk_is_oob_meta_supported(pblk)) {
+		/* In the packed metadata case it is
+		 * not allowed to change sec_per_write.
+		 */
+		return -EINVAL;
+	}
+
 	if (sec_per_write < pblk->min_write_pgs
 				|| sec_per_write > pblk->max_write_pgs
 				|| sec_per_write % pblk->min_write_pgs != 0)
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 6552db35f916..bb45c7e6c375 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
 {
 	int secs_to_sync;
 
-	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
+	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	if ((!secs_to_sync && secs_to_flush)
@@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
 		return NVM_IO_ERR;
 	}
 
+	/* This is the first place where write requests are mapped,
+	 * so we can fill the packed metadata with l2p mappings.
+	 */
+	pblk_set_packed_meta(pblk, rqd);
+
 	meta_line = pblk_should_submit_meta_io(pblk, rqd);
 
 	/* Submit data write for current data line */
@@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
 	struct bio *bio;
 	struct nvm_rq *rqd;
 	unsigned int secs_avail, secs_to_sync, secs_to_com;
-	unsigned int secs_to_flush;
+	unsigned int secs_to_flush, packed_meta_pgs;
 	unsigned long pos;
 	unsigned int resubmit;
 
@@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
 			return 1;
 
 		secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
-		if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
+		if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
 			return 1;
 
 		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
@@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
 		pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
 	}
 
-	bio = bio_alloc(GFP_KERNEL, secs_to_sync);
+	packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
+	bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 4c61ede5b207..c95ecd8bcf79 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -605,6 +605,7 @@ struct pblk {
 	int state;			/* pblk line state */
 
 	int min_write_pgs; /* Minimum amount of pages required by controller */
+	int min_write_pgs_data; /* Minimum amount of payload pages */
 	int max_write_pgs; /* Maximum amount of pages supported by controller */
 
 	sector_t capacity; /* Device capacity when bad blocks are subtracted */
@@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
 u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
 u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-		   unsigned long secs_to_flush);
+		   unsigned long secs_to_flush, bool skip_meta);
 void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
 void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
 		  unsigned long *lun_bitmap);
@@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 			  u64 *lba_list, int nr_secs);
 void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
 			 sector_t blba, int nr_secs);
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
 
 /*
  * pblk user I/O write path
-- 
2.14.3
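The space accounting the patch above introduces can be sketched in a few lines of user-space C. The geometry values in the assertions are illustrative assumptions, and `data_pgs_per_write()`/`usable_clba()` are hypothetical helpers mirroring the `min_write_pgs_data` setup in `pblk_core_init()` and the capacity adjustment in `pblk_set_provision()`:

```c
#include <assert.h>

/* With packed metadata, one page of every minimal write unit carries
 * the l2p entries of the others, so only min_write_pgs - 1 pages of
 * the unit hold user data. With OOB metadata nothing is lost. */
static int data_pgs_per_write(int min_write_pgs, int oob_meta_supported)
{
	return oob_meta_supported ? min_write_pgs : min_write_pgs - 1;
}

/* Mirrors the capacity adjustment in pblk_set_provision(): usable
 * sectors per chunk shrink by the packed-metadata fraction. */
static int usable_clba(int clba, int min_write_pgs, int min_write_pgs_data)
{
	return clba / min_write_pgs * min_write_pgs_data;
}
```

For example, with a minimal write unit of 8 pages, a drive without OOB metadata gives up one page in eight, so a 4096-sector chunk exposes only 3584 usable sectors.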


* [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
                   ` (3 preceding siblings ...)
  2018-06-15 22:27 ` [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk Igor Konopko
@ 2018-06-15 22:27 ` Igor Konopko
  2018-06-16 19:38   ` Matias Bjørling
  2018-06-19 10:28 ` [PATCH 0/5] lightnvm: More flexible approach to metadata Javier Gonzalez
  5 siblings, 1 reply; 25+ messages in thread
From: Igor Konopko @ 2018-06-15 22:27 UTC (permalink / raw)
  To: mb, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski, igor.j.konopko

Currently pblk and lightnvm only check the size of
OOB metadata and do not care whether this metadata is
located in a separate buffer or is interleaved with
the data in a single buffer.

In reality only the first scenario is supported; the
second mode will break pblk functionality during any
IO operation.

The goal of this patch is to block creation of pblk
devices in case of interleaved metadata.

Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
---
 drivers/lightnvm/pblk-init.c | 6 ++++++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 5eb641da46ed..483a6d479e7d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (geo->ext) {
+		pr_err("pblk: extended (interleaved) metadata in data buffer"
+			" not supported\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
 	if (!pblk)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 670478abc754..872ab854ccf5 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
+	geo->ext = ns->ext;
 }
 
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 72a55d71917e..b13e64e2112f 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -350,6 +350,7 @@ struct nvm_geo {
 	u32	clba;		/* sectors per chunk */
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
+	u16	ext;		/* metadata in extended data buffer */
 
 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
-- 
2.14.3
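The distinction the patch above guards against can be sketched as follows. The sector and OOB sizes are illustrative, and `data_xfer_len()` is a hypothetical helper, not driver code; it only shows why the extended layout is incompatible with pblk's buffer handling:

```c
#include <assert.h>

/* Data-buffer bytes transferred for nr_secs sectors. In the separate
 * layout the OOB bytes travel in their own metadata buffer, so the
 * data transfer length is unchanged; in the extended ("ext") layout
 * the OOB bytes are interleaved after every sector, changing the data
 * transfer length itself, which pblk does not account for. */
static int data_xfer_len(int nr_secs, int csecs, int sos, int ext)
{
	return ext ? nr_secs * (csecs + sos) : nr_secs * csecs;
}
```

For a 4-sector request with 4096-byte sectors and a 16-byte OOB area, the separate layout moves 16384 data bytes while the extended layout moves 16448, so every bio sizing computation in pblk would be off by `nr_secs * sos` bytes.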


* Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
  2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
@ 2018-06-16 19:24   ` Matias Bjørling
  2018-06-18 14:23   ` Javier Gonzalez
  1 sibling, 0 replies; 25+ messages in thread
From: Matias Bjørling @ 2018-06-16 19:24 UTC (permalink / raw)
  To: igor.j.konopko, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski

On 06/16/2018 12:27 AM, Igor Konopko wrote:
> Currently pblk assumes that the size of OOB metadata on the drive is
> always equal to the size of the pblk_sec_meta struct. This commit adds
> helpers which allow handling different sizes of OOB metadata on the drive.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-core.c     | 10 +++++----
>   drivers/lightnvm/pblk-map.c      | 21 ++++++++++++-------
>   drivers/lightnvm/pblk-read.c     | 45 +++++++++++++++++++++++++---------------
>   drivers/lightnvm/pblk-recovery.c | 24 ++++++++++++---------
>   drivers/lightnvm/pblk.h          | 29 ++++++++++++++++++++++++++
>   5 files changed, 91 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 66ab1036f2fb..8a0ac466872f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>   	rqd.nr_ppas = rq_ppas;
>   
>   	if (dir == PBLK_WRITE) {
> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
> +		void *meta_list = rqd.meta_list;
>   
>   		rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
>   		for (i = 0; i < rqd.nr_ppas; ) {
> @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>   			paddr = __pblk_alloc_page(pblk, line, min);
>   			spin_unlock(&line->lock);
>   			for (j = 0; j < min; j++, i++, paddr++) {
> -				meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(ADDR_EMPTY);
>   				rqd.ppa_list[i] =
>   					addr_to_gen_ppa(pblk, paddr, id);
>   			}
> @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
>   	rqd.nr_ppas = lm->smeta_sec;
>   
>   	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
> +		void *meta_list = rqd.meta_list;
>   
>   		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>   
>   		if (dir == PBLK_WRITE) {
>   			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>   
> -			meta_list[i].lba = lba_list[paddr] = addr_empty;
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +				lba_list[paddr] = addr_empty;
>   		}
>   	}
>   
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index 953ca31dda68..92c40b546c4e 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -21,7 +21,7 @@
>   static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>   			      struct ppa_addr *ppa_list,
>   			      unsigned long *lun_bitmap,
> -			      struct pblk_sec_meta *meta_list,
> +			      void *meta_list,
>   			      unsigned int valid_secs)
>   {
>   	struct pblk_line *line = pblk_line_get_data(pblk);
> @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>   			kref_get(&line->ref);
>   			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>   			w_ctx->ppa = ppa_list[i];
> -			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +							cpu_to_le64(w_ctx->lba);
>   			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>   			if (lba_list[paddr] != addr_empty)
>   				line->nr_valid_lbas++;
>   			else
>   				atomic64_inc(&pblk->pad_wa);
>   		} else {
> -			lba_list[paddr] = meta_list[i].lba = addr_empty;
> +			lba_list[paddr] =
> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					addr_empty;
>   			__pblk_map_invalidate(pblk, line, paddr);
>   		}
>   	}
> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>   		 unsigned long *lun_bitmap, unsigned int valid_secs,
>   		 unsigned int off)
>   {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
>   	unsigned int map_secs;
>   	int min = pblk->min_write_pgs;
>   	int i;
> @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>   	for (i = off; i < rqd->nr_ppas; i += min) {
>   		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>   		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> -					lun_bitmap, &meta_list[i], map_secs)) {
> +					lun_bitmap,
> +					pblk_get_meta_at(pblk, meta_list, i),
> +					map_secs)) {
>   			bio_put(rqd->bio);
>   			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>   			pblk_pipeline_stop(pblk);
> @@ -111,7 +116,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
>   	struct pblk_line_meta *lm = &pblk->lm;
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
>   	struct pblk_line *e_line, *d_line;
>   	unsigned int map_secs;
>   	int min = pblk->min_write_pgs;
> @@ -120,7 +125,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   	for (i = 0; i < rqd->nr_ppas; i += min) {
>   		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>   		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> -					lun_bitmap, &meta_list[i], map_secs)) {
> +					lun_bitmap,
> +					pblk_get_meta_at(pblk, meta_list, i),
> +					map_secs)) {
>   			bio_put(rqd->bio);
>   			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>   			pblk_pipeline_stop(pblk);
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 6e93c489ce57..81cf79ea2dc6 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -42,7 +42,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   				 struct bio *bio, sector_t blba,
>   				 unsigned long *read_bitmap)
>   {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
>   	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
>   	int nr_secs = rqd->nr_ppas;
>   	bool advanced_bio = false;
> @@ -57,7 +57,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   retry:
>   		if (pblk_ppa_empty(p)) {
>   			WARN_ON(test_and_set_bit(i, read_bitmap));
> -			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(ADDR_EMPTY);
>   
>   			if (unlikely(!advanced_bio)) {
>   				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
> @@ -77,7 +78,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   				goto retry;
>   			}
>   			WARN_ON(test_and_set_bit(i, read_bitmap));
> -			meta_list[i].lba = cpu_to_le64(lba);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(lba);
>   			advanced_bio = true;
>   #ifdef CONFIG_NVM_PBLK_DEBUG
>   			atomic_long_inc(&pblk->cache_reads);
> @@ -106,13 +108,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>   				sector_t blba)
>   {
> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> +	void *meta_lba_list = rqd->meta_list;
>   	int nr_lbas = rqd->nr_ppas;
>   	int i;
>   
> -	for (i = 0; i < nr_lbas; i++) {
> -		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
> +	if (!pblk_is_oob_meta_supported(pblk))
> +		return;
>   
> +	for (i = 0; i < nr_lbas; i++) {
> +		u64 lba = le64_to_cpu(
> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
>   		if (lba == ADDR_EMPTY)
>   			continue;
>   
> @@ -136,17 +141,21 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>   static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>   				 u64 *lba_list, int nr_lbas)
>   {
> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> +	void *meta_lba_list = rqd->meta_list;
>   	int i, j;
>   
> -	for (i = 0, j = 0; i < nr_lbas; i++) {
> +	if (!pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	for (i = 0, j = 0; i < nr_lbas; i++) {
>   		u64 lba = lba_list[i];
>   		u64 meta_lba;
>   
>   		if (lba == ADDR_EMPTY)
>   			continue;
>   
> -		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
> +		meta_lba = le64_to_cpu(
> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
>   
>   		if (lba != meta_lba) {
>   #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -235,7 +244,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   			     struct bio *orig_bio, unsigned int bio_init_idx,
>   			     unsigned long *read_bitmap)
>   {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
>   	struct bio *new_bio;
>   	struct bio_vec src_bv, dst_bv;
>   	void *ppa_ptr = NULL;
> @@ -261,7 +270,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   	}
>   
>   	for (i = 0; i < nr_secs; i++)
> -		lba_list_mem[i] = meta_list[i].lba;
> +		lba_list_mem[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
>   
>   	new_bio->bi_iter.bi_sector = 0; /* internal bio */
>   	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> @@ -300,8 +309,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   	}
>   
>   	for (i = 0; i < nr_secs; i++) {
> -		lba_list_media[i] = meta_list[i].lba;
> -		meta_list[i].lba = lba_list_mem[i];
> +		lba_list_media[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
> +		pblk_get_meta_at(pblk, meta_list, i)->lba = lba_list_mem[i];
>   	}
>   
>   	/* Fill the holes in the original bio */
> @@ -313,7 +322,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   
>   		kref_put(&line->ref, pblk_line_put);
>   
> -		meta_list[hole].lba = lba_list_media[i];
> +		pblk_get_meta_at(pblk, meta_list, hole)->lba =
> +						lba_list_media[i];
>   
>   		src_bv = new_bio->bi_io_vec[i++];
>   		dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
> @@ -354,7 +364,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>   			 sector_t lba, unsigned long *read_bitmap)
>   {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
>   	struct ppa_addr ppa;
>   
>   	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
> @@ -366,7 +376,8 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>   retry:
>   	if (pblk_ppa_empty(ppa)) {
>   		WARN_ON(test_and_set_bit(0, read_bitmap));
> -		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
> +		pblk_get_meta_at(pblk, meta_list, 0)->lba =
> +						cpu_to_le64(ADDR_EMPTY);
>   		return;
>   	}
>   
> @@ -380,7 +391,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>   		}
>   
>   		WARN_ON(test_and_set_bit(0, read_bitmap));
> -		meta_list[0].lba = cpu_to_le64(lba);
> +		pblk_get_meta_at(pblk, meta_list, 0)->lba = cpu_to_le64(lba);
>   
>   #ifdef CONFIG_NVM_PBLK_DEBUG
>   		atomic_long_inc(&pblk->cache_reads);
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index b1a91cb3ca4d..0007e8011476 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -98,7 +98,7 @@ static int pblk_calc_sec_in_line(struct pblk *pblk, struct pblk_line *line)
>   
>   struct pblk_recov_alloc {
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct nvm_rq *rqd;
>   	void *data;
>   	dma_addr_t dma_ppa_list;
> @@ -111,7 +111,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct nvm_rq *rqd;
>   	struct bio *bio;
>   	void *data;
> @@ -199,7 +199,8 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>   	}
>   
>   	for (i = 0; i < rqd->nr_ppas; i++) {
> -		u64 lba = le64_to_cpu(meta_list[i].lba);
> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
>   
>   		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>   			continue;
> @@ -240,7 +241,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct pblk_pad_rq *pad_rq;
>   	struct nvm_rq *rqd;
>   	struct bio *bio;
> @@ -332,7 +333,8 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>   			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>   
>   			pblk_map_invalidate(pblk, dev_ppa);
> -			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
> +			lba_list[w_ptr] = pblk_get_meta_at(pblk,
> +						meta_list, i)->lba = addr_empty;
>   			rqd->ppa_list[i] = dev_ppa;
>   		}
>   	}
> @@ -389,7 +391,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct nvm_rq *rqd;
>   	struct bio *bio;
>   	void *data;
> @@ -473,7 +475,8 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>   	if (!rec_round++ && !rqd->error) {
>   		rec_round = 0;
>   		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
> -			u64 lba = le64_to_cpu(meta_list[i].lba);
> +			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
>   
>   			if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>   				continue;
> @@ -523,7 +526,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   	struct nvm_tgt_dev *dev = pblk->dev;
>   	struct nvm_geo *geo = &dev->geo;
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct nvm_rq *rqd;
>   	struct bio *bio;
>   	void *data;
> @@ -619,7 +622,8 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   	}
>   
>   	for (i = 0; i < rqd->nr_ppas; i++) {
> -		u64 lba = le64_to_cpu(meta_list[i].lba);
> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
>   
>   		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>   			continue;
> @@ -641,7 +645,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>   	struct nvm_geo *geo = &dev->geo;
>   	struct nvm_rq *rqd;
>   	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
>   	struct pblk_recov_alloc p;
>   	void *data;
>   	dma_addr_t dma_ppa_list, dma_meta_list;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index c072955d72c2..f82c3a0b0de5 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -1420,4 +1420,33 @@ static inline void pblk_setup_uuid(struct pblk *pblk)
>   	uuid_le_gen(&uuid);
>   	memcpy(pblk->instance_uuid, uuid.b, 16);
>   }
> +
> +static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +
> +	/* Pblk uses OOB meta to store LBA of given physical sector.
> +	 * The LBA is eventually used in recovery mode and/or for handling
> +	 * telemetry events (e.g., relocate sector).
> +	 */
> +
> +	return (geo->sos >= sizeof(struct pblk_sec_meta));
> +}
> +
> +static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
> +						void *meta_ptr, int index)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		/* We need to have oob meta layout the same as on drive */
> +		return meta_ptr + geo->sos * index;
> +	else
> +		/* We can create virtual oob meta layout since drive does
> +		 * not have real oob metadata
> +		 */
> +		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
> +}
>   #endif /* PBLK_H_ */
> 

Thanks. Looks good to me.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
  2018-06-15 22:27 ` [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta Igor Konopko
@ 2018-06-16 19:27   ` Matias Bjørling
  2018-06-18 14:25     ` Javier Gonzalez
  0 siblings, 1 reply; 25+ messages in thread
From: Matias Bjørling @ 2018-06-16 19:27 UTC (permalink / raw)
  To: igor.j.konopko, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski

On 06/16/2018 12:27 AM, Igor Konopko wrote:
> Since pblk_sec_meta now has a flexible size that depends on the
> drive's metadata size, the unneeded reserved field can be removed
> from that structure.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk.h | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index f82c3a0b0de5..27658dc6fc1a 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -82,7 +82,6 @@ enum {
>   };
>   
>   struct pblk_sec_meta {
> -	u64 reserved;
>   	__le64 lba;
>   };
>   
> 

Looks good to me. Javier may have some comments on this, since it is not 
completely obvious from the code why that reserved attribute is there. I 
would like the change to go in, as the field needlessly extends the 
requirement from 8 to 16 bytes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size
  2018-06-15 22:27 ` [PATCH 3/5] lightnvm: Flexible DMA pool entry size Igor Konopko
@ 2018-06-16 19:32   ` Matias Bjørling
  0 siblings, 0 replies; 25+ messages in thread
From: Matias Bjørling @ 2018-06-16 19:32 UTC (permalink / raw)
  To: igor.j.konopko, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski

On 06/16/2018 12:27 AM, Igor Konopko wrote:
> Currently the whole of lightnvm and pblk uses a single DMA pool,
> whose entry size is always equal to PAGE_SIZE. The PPA list always
> needs 8b*64, so there is only 56b*64 space left for OOB meta. Since
> NVMe OOB meta can be bigger, such as 128b, this solution is not
> robust.
> 
> This patch adds the possibility to support OOB meta above 56b by
> creating a separate DMA pool for pblk with an entry size big enough
> to store both the PPA list and such OOB metadata.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/core.c          | 33 ++++++++++++++++++++++++---------
>   drivers/lightnvm/pblk-core.c     | 24 +++++++++++++-----------
>   drivers/lightnvm/pblk-init.c     |  9 +++++++++
>   drivers/lightnvm/pblk-read.c     | 40 +++++++++++++++++++++++++++-------------
>   drivers/lightnvm/pblk-recovery.c | 18 ++++++++++--------
>   drivers/lightnvm/pblk-write.c    |  8 ++++----
>   drivers/lightnvm/pblk.h          | 11 ++++++++++-
>   drivers/nvme/host/lightnvm.c     |  6 ++++--
>   include/linux/lightnvm.h         |  8 +++++---
>   9 files changed, 106 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index 60aa7bc5a630..bc8e6ecea083 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -642,20 +642,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
>   }
>   EXPORT_SYMBOL(nvm_unregister_tgt_type);
>   
> -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
> -							dma_addr_t *dma_handler)
> +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
> +				gfp_t mem_flags, dma_addr_t *dma_handler)
>   {
> -	return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
> -								dma_handler);
> +	return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
> +						mem_flags, dma_handler);

Nitpick. Let's pass in dev->dma_pool in that case, so we don't need 
the ?: check here.

>   }
>   EXPORT_SYMBOL(nvm_dev_dma_alloc);
>   
> -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
> +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
> +				void *addr, dma_addr_t dma_handler)
>   {
> -	dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
> +	dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
>   }
>   EXPORT_SYMBOL(nvm_dev_dma_free);

Same here.

>   
> +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
> +{
> +	return dev->ops->create_dma_pool(dev, name, size);
> +}
> +EXPORT_SYMBOL(nvm_dev_dma_create);
> +
> +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
> +{
> +	dev->ops->destroy_dma_pool(pool);
> +}
> +EXPORT_SYMBOL(nvm_dev_dma_destroy);
> +

Let's make these _GPL.

>   static struct nvm_dev *nvm_find_nvm_dev(const char *name)
>   {
>   	struct nvm_dev *dev;
> @@ -683,7 +696,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
>   	}
>   
>   	rqd->nr_ppas = nr_ppas;
> -	rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
> +	rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
> +						&rqd->dma_ppa_list);
>   	if (!rqd->ppa_list) {
>   		pr_err("nvm: failed to allocate dma memory\n");
>   		return -ENOMEM;
> @@ -709,7 +723,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
>   	if (!rqd->ppa_list)
>   		return;
>   
> -	nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
> +	nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
> +				rqd->dma_ppa_list);
>   }
>   
>   int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta *meta,
> @@ -933,7 +948,7 @@ int nvm_register(struct nvm_dev *dev)
>   	if (!dev->q || !dev->ops)
>   		return -EINVAL;
>   
> -	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
> +	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
>   	if (!dev->dma_pool) {
>   		pr_err("nvm: could not create dma pool\n");
>   		return -ENOMEM;
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 8a0ac466872f..c092ee93a18d 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -279,7 +279,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
>   	}
>   
>   	if (rqd->meta_list)
> -		nvm_dev_dma_free(dev->parent, rqd->meta_list,
> +		nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd->meta_list,
>   				rqd->dma_meta_list);
>   	mempool_free(rqd, pool);
>   }
> @@ -652,13 +652,13 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>   	} else
>   		return -EINVAL;
>   
> -	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> -							&dma_meta_list);
> +	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +					GFP_KERNEL, &dma_meta_list);
>   	if (!meta_list)
>   		return -ENOMEM;
>   
> -	ppa_list = meta_list + pblk_dma_meta_size;
> -	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> +	ppa_list = meta_list + pblk_dma_meta_size(pblk);
> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>   
>   next_rq:
>   	memset(&rqd, 0, sizeof(struct nvm_rq));
> @@ -758,7 +758,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>   	if (left_ppas)
>   		goto next_rq;
>   free_rqd_dma:
> -	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
> +						rqd.dma_meta_list);
>   	return ret;
>   }
>   
> @@ -803,13 +804,13 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
>   
>   	memset(&rqd, 0, sizeof(struct nvm_rq));
>   
> -	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> -							&rqd.dma_meta_list);
> +	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +						GFP_KERNEL, &rqd.dma_meta_list);
>   	if (!rqd.meta_list)
>   		return -ENOMEM;
>   
> -	rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> -	rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> +	rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size(pblk);
> +	rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size(pblk);
>   
>   	bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
>   	if (IS_ERR(bio)) {
> @@ -861,7 +862,8 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
>   	}
>   
>   free_ppa_list:
> -	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
> +						rqd.dma_meta_list);
>   
>   	return ret;
>   }
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index aa2426403171..f05112230a52 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -1142,6 +1142,7 @@ static void pblk_free(struct pblk *pblk)
>   	pblk_l2p_free(pblk);
>   	pblk_rwb_free(pblk);
>   	pblk_core_free(pblk);
> +	nvm_dev_dma_destroy(pblk->dev->parent, pblk->dma_pool);
>   
>   	kfree(pblk);
>   }
> @@ -1212,6 +1213,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>   	pblk->disk = tdisk;
>   	pblk->state = PBLK_STATE_RUNNING;
>   	pblk->gc.gc_enabled = 0;
> +	pblk->dma_pool = nvm_dev_dma_create(dev->parent, (pblk_dma_ppa_size +
> +						pblk_dma_meta_size(pblk)),
> +						tdisk->disk_name);
> +	if (!pblk->dma_pool) {
> +		kfree(pblk);
> +		return ERR_PTR(-ENOMEM);
> +	}
>   
>   	spin_lock_init(&pblk->resubmit_lock);
>   	spin_lock_init(&pblk->trans_lock);
> @@ -1312,6 +1320,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>   fail_free_core:
>   	pblk_core_free(pblk);
>   fail:
> +	nvm_dev_dma_destroy(dev->parent, pblk->dma_pool);
>   	kfree(pblk);
>   	return ERR_PTR(ret);
>   }
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 81cf79ea2dc6..9ff4f48c4168 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -255,9 +255,13 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>   	int i, ret, hole;
>   
> -	/* Re-use allocated memory for intermediate lbas */
> -	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
> -	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
> +	lba_list_mem = kcalloc(nr_secs, sizeof(__le64), GFP_KERNEL);
> +	if (!lba_list_mem)
> +		goto err_alloc_mem;
> +
> +	lba_list_media = kcalloc(nr_secs, sizeof(__le64), GFP_KERNEL);
> +	if (!lba_list_media)
> +		goto err_alloc_media;
>   
>   	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>   
> @@ -349,6 +353,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   	rqd->bio = NULL;
>   	rqd->nr_ppas = nr_secs;
>   
> +	kfree(lba_list_media);
> +	kfree(lba_list_mem);
>   	__pblk_end_io_read(pblk, rqd, false);
>   	return NVM_IO_DONE;
>   
> @@ -356,6 +362,10 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>   	/* Free allocated pages in new bio */
>   	pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
>   fail_add_pages:
> +	kfree(lba_list_media);
> +err_alloc_media:
> +	kfree(lba_list_mem);
> +err_alloc_mem:
>   	pr_err("pblk: failed to perform partial read\n");
>   	__pblk_end_io_read(pblk, rqd, false);
>   	return NVM_IO_ERR;
> @@ -444,16 +454,17 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
>   	 */
>   	bio_init_idx = pblk_get_bi_idx(bio);
>   
> -	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> -							&rqd->dma_meta_list);
> +	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +					GFP_KERNEL, &rqd->dma_meta_list);
>   	if (!rqd->meta_list) {
>   		pr_err("pblk: not able to allocate ppa list\n");
>   		goto fail_rqd_free;
>   	}
>   
>   	if (nr_secs > 1) {
> -		rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> -		rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
> +		rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
> +		rqd->dma_ppa_list = rqd->dma_meta_list +
> +					pblk_dma_meta_size(pblk);
>   
>   		pblk_read_ppalist_rq(pblk, rqd, bio, blba, &read_bitmap);
>   	} else {
> @@ -578,14 +589,15 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
>   
>   	memset(&rqd, 0, sizeof(struct nvm_rq));
>   
> -	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> -							&rqd.dma_meta_list);
> +	rqd.meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +						GFP_KERNEL, &rqd.dma_meta_list);
>   	if (!rqd.meta_list)
>   		return -ENOMEM;
>   
>   	if (gc_rq->nr_secs > 1) {
> -		rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> -		rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> +		rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size(pblk);
> +		rqd.dma_ppa_list = rqd.dma_meta_list +
> +					pblk_dma_meta_size(pblk);
>   
>   		gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
>   							gc_rq->lba_list,
> @@ -642,12 +654,14 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
>   #endif
>   
>   out:
> -	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
> +						rqd.dma_meta_list);
>   	return ret;
>   
>   err_free_bio:
>   	bio_put(bio);
>   err_free_dma:
> -	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd.meta_list,
> +						rqd.dma_meta_list);
>   	return ret;
>   }
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 0007e8011476..f5853fc77a0c 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -280,14 +280,15 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>   
>   	rq_len = rq_ppas * geo->csecs;
>   
> -	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> +	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +					 GFP_KERNEL, &dma_meta_list);
>   	if (!meta_list) {
>   		ret = -ENOMEM;
>   		goto fail_free_pad;
>   	}
>   
> -	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
> -	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> +	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>   
>   	bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
>   						PBLK_VMALLOC_META, GFP_KERNEL);
> @@ -373,7 +374,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>   fail_free_bio:
>   	bio_put(bio);
>   fail_free_meta:
> -	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, meta_list, dma_meta_list);
>   fail_free_pad:
>   	kfree(pad_rq);
>   	vfree(data);
> @@ -651,12 +652,13 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>   	dma_addr_t dma_ppa_list, dma_meta_list;
>   	int done, ret = 0;
>   
> -	meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> +	meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +					 GFP_KERNEL, &dma_meta_list);
>   	if (!meta_list)
>   		return -ENOMEM;
>   
> -	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
> -	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> +	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
> +	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
>   
>   	data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>   	if (!data) {
> @@ -693,7 +695,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>   out:
>   	kfree(data);
>   free_meta_list:
> -	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> +	nvm_dev_dma_free(dev->parent, pblk->dma_pool, meta_list, dma_meta_list);
>   
>   	return ret;
>   }
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 5f44df999aed..6552db35f916 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -306,13 +306,13 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
>   	rqd->private = pblk;
>   	rqd->end_io = end_io;
>   
> -	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> -							&rqd->dma_meta_list);
> +	rqd->meta_list = nvm_dev_dma_alloc(dev->parent, pblk->dma_pool,
> +					GFP_KERNEL, &rqd->dma_meta_list);
>   	if (!rqd->meta_list)
>   		return -ENOMEM;
>   
> -	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> -	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
> +	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
> +	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
>   
>   	return 0;
>   }
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 27658dc6fc1a..4c61ede5b207 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -98,7 +98,6 @@ enum {
>   	PBLK_RL_LOW = 4
>   };
>   
> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
>   #define pblk_dma_ppa_size (sizeof(u64) * PBLK_MAX_REQ_ADDRS)
>   
>   /* write buffer completion context */
> @@ -690,6 +689,7 @@ struct pblk {
>   	struct timer_list wtimer;
>   
>   	struct pblk_gc gc;
> +	void *dma_pool;
>   };
>   
>   struct pblk_line_ws {
> @@ -1448,4 +1448,13 @@ static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
>   		 */
>   		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
>   }
> +
> +static inline int pblk_dma_meta_size(struct pblk *pblk)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +
> +	return max(((int)sizeof(struct pblk_sec_meta)), ((int)geo->sos))
> +				* PBLK_MAX_REQ_ADDRS;
> +}
>   #endif /* PBLK_H_ */
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 006d09e0af74..670478abc754 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -729,11 +729,13 @@ static int nvme_nvm_submit_io_sync(struct nvm_dev *dev, struct nvm_rq *rqd)
>   	return ret;
>   }
>   
> -static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name)
> +static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name,
> +					int size)
>   {
>   	struct nvme_ns *ns = nvmdev->q->queuedata;
>   
> -	return dma_pool_create(name, ns->ctrl->dev, PAGE_SIZE, PAGE_SIZE, 0);
> +	size = round_up(size, PAGE_SIZE);
> +	return dma_pool_create(name, ns->ctrl->dev, size, PAGE_SIZE, 0);
>   }
>   
>   static void nvme_nvm_destroy_dma_pool(void *pool)
> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> index e9e0d1c7eaf5..72a55d71917e 100644
> --- a/include/linux/lightnvm.h
> +++ b/include/linux/lightnvm.h
> @@ -90,7 +90,7 @@ typedef int (nvm_get_chk_meta_fn)(struct nvm_dev *, struct nvm_chk_meta *,
>   								sector_t, int);
>   typedef int (nvm_submit_io_fn)(struct nvm_dev *, struct nvm_rq *);
>   typedef int (nvm_submit_io_sync_fn)(struct nvm_dev *, struct nvm_rq *);
> -typedef void *(nvm_create_dma_pool_fn)(struct nvm_dev *, char *);
> +typedef void *(nvm_create_dma_pool_fn)(struct nvm_dev *, char *, int);
>   typedef void (nvm_destroy_dma_pool_fn)(void *);
>   typedef void *(nvm_dev_dma_alloc_fn)(struct nvm_dev *, void *, gfp_t,
>   								dma_addr_t *);
> @@ -517,8 +517,10 @@ struct nvm_tgt_type {
>   extern int nvm_register_tgt_type(struct nvm_tgt_type *);
>   extern void nvm_unregister_tgt_type(struct nvm_tgt_type *);
>   
> -extern void *nvm_dev_dma_alloc(struct nvm_dev *, gfp_t, dma_addr_t *);
> -extern void nvm_dev_dma_free(struct nvm_dev *, void *, dma_addr_t);
> +extern void *nvm_dev_dma_alloc(struct nvm_dev *, void *, gfp_t, dma_addr_t *);
> +extern void nvm_dev_dma_free(struct nvm_dev *, void *, void *, dma_addr_t);
> +extern void *nvm_dev_dma_create(struct nvm_dev *, int, char *);
> +extern void nvm_dev_dma_destroy(struct nvm_dev *, void *);
>   
>   extern struct nvm_dev *nvm_alloc_dev(int);
>   extern int nvm_register(struct nvm_dev *);
> 

Looks good to me.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
  2018-06-15 22:27 ` [PATCH 5/5] lightnvm: pblk: Disable interleaved " Igor Konopko
@ 2018-06-16 19:38   ` Matias Bjørling
  2018-06-18 14:29     ` Javier Gonzalez
  0 siblings, 1 reply; 25+ messages in thread
From: Matias Bjørling @ 2018-06-16 19:38 UTC (permalink / raw)
  To: igor.j.konopko, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski

On 06/16/2018 12:27 AM, Igor Konopko wrote:
> Currently pblk and lightnvm only check the size of the OOB metadata
> and do not care whether this meta is located in a separate buffer or
> interleaved with the data in a single buffer.
> 
> In reality only the first scenario is supported; the second mode
> breaks pblk functionality during any IO operation.
> 
> The goal of this patch is to block the creation of pblk devices in
> case of interleaved metadata.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-init.c | 6 ++++++
>   drivers/nvme/host/lightnvm.c | 1 +
>   include/linux/lightnvm.h     | 1 +
>   3 files changed, 8 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 5eb641da46ed..483a6d479e7d 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>   		return ERR_PTR(-EINVAL);
>   	}
>   
> +	if (geo->ext) {
> +		pr_err("pblk: extended (interleaved) metadata in data buffer"
> +			" not supported\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
>   	pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
>   	if (!pblk)
>   		return ERR_PTR(-ENOMEM);
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 670478abc754..872ab854ccf5 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>   
>   	geo->csecs = 1 << ns->lba_shift;
>   	geo->sos = ns->ms;
> +	geo->ext = ns->ext;
>   }
>   
>   int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> index 72a55d71917e..b13e64e2112f 100644
> --- a/include/linux/lightnvm.h
> +++ b/include/linux/lightnvm.h
> @@ -350,6 +350,7 @@ struct nvm_geo {
>   	u32	clba;		/* sectors per chunk */
>   	u16	csecs;		/* sector size */
>   	u16	sos;		/* out-of-band area size */
> +	u16	ext;		/* metadata in extended data buffer */
>   
>   	/* device write constrains */
>   	u32	ws_min;		/* minimum write size */
> 

I think a bool type would be better here. Can it be placed a bit 
further down, just above the 1.2 stuff?

Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-15 22:27 ` [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk Igor Konopko
@ 2018-06-16 19:45   ` Matias Bjørling
  2018-06-19 11:08   ` Javier Gonzalez
  1 sibling, 0 replies; 25+ messages in thread
From: Matias Bjørling @ 2018-06-16 19:45 UTC (permalink / raw)
  To: igor.j.konopko, javier; +Cc: linux-block, michal.sorn, marcin.dziegielewski

On 06/16/2018 12:27 AM, Igor Konopko wrote:
> In the current pblk implementation, the l2p mapping for lines that
> are not yet closed is always stored only in OOB metadata and
> recovered from it.
> 
> Such a solution does not provide data integrity when the drive does
> not have OOB metadata space.
> 
> The goal of this patch is to add support for so-called packed
> metadata, which stores the l2p mapping for open lines in the last
> sector of every write unit.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
>   drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
>   drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
>   drivers/lightnvm/pblk-rb.c       |  3 +++
>   drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
>   drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
>   drivers/lightnvm/pblk-write.c    | 14 +++++++----
>   drivers/lightnvm/pblk.h          |  5 +++-
>   7 files changed, 128 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index c092ee93a18d..375c6430612e 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
>   {
>   	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>   
> -	if (secs_avail >= pblk->min_write_pgs)
> +	if (secs_avail >= pblk->min_write_pgs_data)
>   		pblk_write_kick(pblk);
>   }
>   
> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
>   	struct pblk_line_meta *lm = &pblk->lm;
>   	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>   	struct list_head *move_list = NULL;
> -	int vsc = le32_to_cpu(*line->vsc);
> +	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
> +			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +	int vsc = le32_to_cpu(*line->vsc) + packed_meta;
>   
>   	lockdep_assert_held(&line->lock);
>   
> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>   }
>   
>   int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
> -		   unsigned long secs_to_flush)
> +		   unsigned long secs_to_flush, bool skip_meta)
>   {
>   	int max = pblk->sec_per_write;
>   	int min = pblk->min_write_pgs;
>   	int secs_to_sync = 0;
>   
> +	if (skip_meta)
> +		min = max = pblk->min_write_pgs_data;
> +
>   	if (secs_avail >= max)
>   		secs_to_sync = max;
>   	else if (secs_avail >= min)
> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>   next_rq:
>   	memset(&rqd, 0, sizeof(struct nvm_rq));
>   
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	rq_len = rq_ppas * geo->csecs;
>   
>   	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>   	}
>   	spin_unlock(&pblk->trans_lock);
>   }
> +
> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> +	void *meta_list = rqd->meta_list;
> +	void *page;
> +	int i = 0;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	/* We need to zero out metadata corresponding to packed meta page */
> +	pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
> +
> +	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
> +	/* We need to fill the last page of the request (packed
> +	 * metadata) with data from the OOB meta buffer.
> +	 */
> +	for (; i < rqd->nr_ppas; i++)
> +		memcpy(page + (i * sizeof(struct pblk_sec_meta)),
> +			pblk_get_meta_at(pblk, meta_list, i),
> +			sizeof(struct pblk_sec_meta));
> +}
> +
> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> +	void *meta_list = rqd->meta_list;
> +	void *page;
> +	int i = 0;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
> +	/* We need to fill oob meta buffer with data from packed metadata */
> +	for (; i < rqd->nr_ppas; i++)

Initialize i in the loop itself: for (i = 0; i < rqd->nr_ppas; i++).

> +		memcpy(pblk_get_meta_at(pblk, meta_list, i),
> +			page + (i * sizeof(struct pblk_sec_meta)),
> +			sizeof(struct pblk_sec_meta));
> +}
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index f05112230a52..5eb641da46ed 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
>   	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>   	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>   	pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
> +	pblk->min_write_pgs_data = pblk->min_write_pgs;
>   	pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
>   
> +	if (!pblk_is_oob_meta_supported(pblk)) {
> +		/* For drives which do not have the OOB metadata feature,
> +		 * in order to support recovery we need to use so-called
> +		 * packed metadata. Packed metadata stores the same
> +		 * information as OOB metadata (the L2P table mapping),
> +		 * but in the form of a single page at the end of
> +		 * every write request.
> +		 */
> +		if (pblk->min_write_pgs
> +			* sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
> +			/* We want to keep all the packed metadata on single
> +			 * page per write requests. So we need to ensure that
> +			 * it will fit.
> +			 *
> +			 * This is more of a sanity check, since there is
> +			 * no device with such a big minimal write size
> +			 * (above 1 megabyte).
> +			pr_err("pblk: Not supported min write size\n");
> +			return -EINVAL;
> +		}
> +		/* For the packed meta approach we make a simplification:
> +		 * on the read path we always issue requests whose size
> +		 * is equal to max_write_pgs, with all pages filled with
> +		 * user payload except the last one, which is filled
> +		 * with packed metadata.
> +		 */
> +		pblk->max_write_pgs = pblk->min_write_pgs;
> +		pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
> +	}
> +
>   	if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
>   		pr_err("pblk: vector list too big(%u > %u)\n",
>   				pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>   	struct pblk_line_meta *lm = &pblk->lm;
>   	struct nvm_geo *geo = &dev->geo;
>   	sector_t provisioned;
> -	int sec_meta, blk_meta;
> +	int sec_meta, blk_meta, clba;
>   
>   	if (geo->op == NVM_TARGET_DEFAULT_OP)
>   		pblk->op = PBLK_DEFAULT_OP;
> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>   	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>   	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
>   
> -	pblk->capacity = (provisioned - blk_meta) * geo->clba;
> +	clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
> +	pblk->capacity = (provisioned - blk_meta) * clba;
>   
>   	atomic_set(&pblk->rl.free_blocks, nr_free_blks);
>   	atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index a81a97e8ea6d..081e73e7978f 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>   		to_read = count;
>   	}
>   
> +	/* Add space for packed metadata if in use */
> +	pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +
>   	c_ctx->sentry = pos;
>   	c_ctx->nr_valid = to_read;
>   	c_ctx->nr_padded = pad;
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index f5853fc77a0c..0fab18fe30d9 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>   next_read_rq:
>   	memset(rqd, 0, pblk_g_rq_size);
>   
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	if (!rq_ppas)
>   		rq_ppas = pblk->min_write_pgs;
>   	rq_len = rq_ppas * geo->csecs;
> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>   		return -EINTR;
>   	}
>   
> +	pblk_get_packed_meta(pblk, rqd);
>   	for (i = 0; i < rqd->nr_ppas; i++) {
>   		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>   							meta_list, i)->lba);
> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>   	kref_init(&pad_rq->ref);
>   
>   next_pad_rq:
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	if (rq_ppas < pblk->min_write_pgs) {
>   		pr_err("pblk: corrupted pad line %d\n", line->id);
>   		goto fail_free_pad;
> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>   next_rq:
>   	memset(rqd, 0, pblk_g_rq_size);
>   
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	if (!rq_ppas)
>   		rq_ppas = pblk->min_write_pgs;
>   	rq_len = rq_ppas * geo->csecs;
> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>   	 */
>   	if (!rec_round++ && !rqd->error) {
>   		rec_round = 0;
> +		pblk_get_packed_meta(pblk, rqd);
>   		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>   			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>   							meta_list, i)->lba);
> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>   		int ret;
>   
>   		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
> +		if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
> +			/* This case should not happen since we always read in
> +			 * the same unit here as we wrote in the writer thread.
> +			 */
> +			pr_err("pblk: Inconsistent packed metadata read\n");
> +		}
>   		nr_error_bits = rqd->nr_ppas - bit;
>   
>   		/* Roll back failed sectors */
> @@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   next_rq:
>   	memset(rqd, 0, pblk_g_rq_size);
>   
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>   	if (!rq_ppas)
>   		rq_ppas = pblk->min_write_pgs;
>   	rq_len = rq_ppas * geo->csecs;
> @@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   		int nr_error_bits, bit;
>   
>   		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
> +		if (!pblk_is_oob_meta_supported(pblk)) {
> +			/* For packed metadata we do not handle partially
> +			 * written requests here, since the metadata is always
> +			 * in the last page of the request.
> +			 */
> +			bit = 0;
> +			*done = 0;
> +		}
>   		nr_error_bits = rqd->nr_ppas - bit;
>   
>   		/* Roll back failed sectors */
> @@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>   			*done = 0;
>   	}
>   
> +	pblk_get_packed_meta(pblk, rqd);
>   	for (i = 0; i < rqd->nr_ppas; i++) {
>   		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>   							meta_list, i)->lba);
> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
> index b0e5e93a9d5f..aa7b4164ce9e 100644
> --- a/drivers/lightnvm/pblk-sysfs.c
> +++ b/drivers/lightnvm/pblk-sysfs.c
> @@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
>   	if (kstrtouint(page, 0, &sec_per_write))
>   		return -EINVAL;
>   
> +	if (!pblk_is_oob_meta_supported(pblk)) {
> +		/* In the packed metadata case it is
> +		 * not allowed to change sec_per_write.
> +		 */
> +		return -EINVAL;
> +	}
> +
>   	if (sec_per_write < pblk->min_write_pgs
>   				|| sec_per_write > pblk->max_write_pgs
>   				|| sec_per_write % pblk->min_write_pgs != 0)
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 6552db35f916..bb45c7e6c375 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
>   {
>   	int secs_to_sync;
>   
> -	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
> +	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
>   
>   #ifdef CONFIG_NVM_PBLK_DEBUG
>   	if ((!secs_to_sync && secs_to_flush)
> @@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>   		return NVM_IO_ERR;
>   	}
>   
> +	/* This is the first place where we have the write request mapped
> +	 * and we can fill the packed metadata with L2P mappings.
> +	 */
> +	pblk_set_packed_meta(pblk, rqd);
> +
>   	meta_line = pblk_should_submit_meta_io(pblk, rqd);
>   
>   	/* Submit data write for current data line */
> @@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
>   	struct bio *bio;
>   	struct nvm_rq *rqd;
>   	unsigned int secs_avail, secs_to_sync, secs_to_com;
> -	unsigned int secs_to_flush;
> +	unsigned int secs_to_flush, packed_meta_pgs;
>   	unsigned long pos;
>   	unsigned int resubmit;
>   
> @@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
>   			return 1;
>   
>   		secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
> -		if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
> +		if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
>   			return 1;
>   
>   		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
> @@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
>   		pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
>   	}
>   
> -	bio = bio_alloc(GFP_KERNEL, secs_to_sync);
> +	packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +	bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
>   
>   	bio->bi_iter.bi_sector = 0; /* internal bio */
>   	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 4c61ede5b207..c95ecd8bcf79 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -605,6 +605,7 @@ struct pblk {
>   	int state;			/* pblk line state */
>   
>   	int min_write_pgs; /* Minimum amount of pages required by controller */
> +	int min_write_pgs_data; /* Minimum amount of payload pages */
>   	int max_write_pgs; /* Maximum amount of pages supported by controller */
>   
>   	sector_t capacity; /* Device capacity when bad blocks are subtracted */
> @@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>   u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>   u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>   int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
> -		   unsigned long secs_to_flush);
> +		   unsigned long secs_to_flush, bool skip_meta);
>   void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
>   void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
>   		  unsigned long *lun_bitmap);
> @@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>   			  u64 *lba_list, int nr_secs);
>   void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>   			 sector_t blba, int nr_secs);
> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>   
>   /*
>    * pblk user I/O write path
> 

Great work. Looks good to me.
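For reference, the copy-in/copy-out that pblk_set_packed_meta() and
pblk_get_packed_meta() implement can be sketched in userspace roughly as
follows. This is a simplified illustration, not the kernel code: the
types, names, and fixed entry size are assumptions, and the le64/cpu
byte-order conversions are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct pblk_sec_meta (just the LBA). */
struct sec_meta {
	uint64_t lba;
};

#define ADDR_EMPTY ((uint64_t)-1)

/*
 * Pack the per-sector OOB metadata for sectors [0, nr_ppas) into the
 * last page of the write unit, mirroring pblk_set_packed_meta().
 */
static void set_packed_meta(struct sec_meta *meta_list, int nr_ppas,
			    void *last_page)
{
	int i;

	/* The packed-meta sector itself carries no user LBA. */
	meta_list[nr_ppas - 1].lba = ADDR_EMPTY;

	for (i = 0; i < nr_ppas; i++)
		memcpy((char *)last_page + i * sizeof(struct sec_meta),
		       &meta_list[i], sizeof(struct sec_meta));
}

/* Reverse direction, mirroring pblk_get_packed_meta(). */
static void get_packed_meta(struct sec_meta *meta_list, int nr_ppas,
			    const void *last_page)
{
	int i;

	for (i = 0; i < nr_ppas; i++)
		memcpy(&meta_list[i],
		       (const char *)last_page + i * sizeof(struct sec_meta),
		       sizeof(struct sec_meta));
}
```

The recovery path relies on reading in exactly the same unit as the
writer wrote, so the packed entries always land at fixed offsets in the
final page.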

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
  2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
  2018-06-16 19:24   ` Matias Bjørling
@ 2018-06-18 14:23   ` Javier Gonzalez
  2018-06-18 20:53     ` Igor Konopko
  1 sibling, 1 reply; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-18 14:23 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski



> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> Currently pblk assumes that the size of the OOB metadata on the drive is
> always equal to the size of the pblk_sec_meta struct. This commit adds
> helpers which allow handling different sizes of OOB metadata on the drive.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 10 +++++----
> drivers/lightnvm/pblk-map.c      | 21 ++++++++++++-------
> drivers/lightnvm/pblk-read.c     | 45 +++++++++++++++++++++++++---------------
> drivers/lightnvm/pblk-recovery.c | 24 ++++++++++++---------
> drivers/lightnvm/pblk.h          | 29 ++++++++++++++++++++++++++
> 5 files changed, 91 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 66ab1036f2fb..8a0ac466872f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> 	rqd.nr_ppas = rq_ppas;
> 
> 	if (dir == PBLK_WRITE) {
> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
> +		void *meta_list = rqd.meta_list;
> 
> 		rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
> 		for (i = 0; i < rqd.nr_ppas; ) {
> @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> 			paddr = __pblk_alloc_page(pblk, line, min);
> 			spin_unlock(&line->lock);
> 			for (j = 0; j < min; j++, i++, paddr++) {
> -				meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(ADDR_EMPTY);
> 				rqd.ppa_list[i] =
> 					addr_to_gen_ppa(pblk, paddr, id);
> 			}
> @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
> 	rqd.nr_ppas = lm->smeta_sec;
> 
> 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
> +		void *meta_list = rqd.meta_list;
> 
> 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> 
> 		if (dir == PBLK_WRITE) {
> 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
> 
> -			meta_list[i].lba = lba_list[paddr] = addr_empty;
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +				lba_list[paddr] = addr_empty;
> 		}
> 	}
> 
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index 953ca31dda68..92c40b546c4e 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -21,7 +21,7 @@
> static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> 			      struct ppa_addr *ppa_list,
> 			      unsigned long *lun_bitmap,
> -			      struct pblk_sec_meta *meta_list,
> +			      void *meta_list,
> 			      unsigned int valid_secs)
> {
> 	struct pblk_line *line = pblk_line_get_data(pblk);
> @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> 			kref_get(&line->ref);
> 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
> 			w_ctx->ppa = ppa_list[i];
> -			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +							cpu_to_le64(w_ctx->lba);
> 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
> 			if (lba_list[paddr] != addr_empty)
> 				line->nr_valid_lbas++;
> 			else
> 				atomic64_inc(&pblk->pad_wa);
> 		} else {
> -			lba_list[paddr] = meta_list[i].lba = addr_empty;
> +			lba_list[paddr] =
> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					addr_empty;
> 			__pblk_map_invalidate(pblk, line, paddr);
> 		}
> 	}
> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
> 		 unsigned long *lun_bitmap, unsigned int valid_secs,
> 		 unsigned int off)
> {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
> 	unsigned int map_secs;
> 	int min = pblk->min_write_pgs;
> 	int i;
> @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
> 	for (i = off; i < rqd->nr_ppas; i += min) {
> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> -					lun_bitmap, &meta_list[i], map_secs)) {
> +					lun_bitmap,
> +					pblk_get_meta_at(pblk, meta_list, i),
> +					map_secs)) {
> 			bio_put(rqd->bio);
> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> 			pblk_pipeline_stop(pblk);
> @@ -111,7 +116,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> 	struct pblk_line_meta *lm = &pblk->lm;
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
> 	struct pblk_line *e_line, *d_line;
> 	unsigned int map_secs;
> 	int min = pblk->min_write_pgs;
> @@ -120,7 +125,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 	for (i = 0; i < rqd->nr_ppas; i += min) {
> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
> -					lun_bitmap, &meta_list[i], map_secs)) {
> +					lun_bitmap,
> +					pblk_get_meta_at(pblk, meta_list, i),
> +					map_secs)) {
> 			bio_put(rqd->bio);
> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> 			pblk_pipeline_stop(pblk);
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 6e93c489ce57..81cf79ea2dc6 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -42,7 +42,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 				 struct bio *bio, sector_t blba,
> 				 unsigned long *read_bitmap)
> {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
> 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
> 	int nr_secs = rqd->nr_ppas;
> 	bool advanced_bio = false;
> @@ -57,7 +57,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> retry:
> 		if (pblk_ppa_empty(p)) {
> 			WARN_ON(test_and_set_bit(i, read_bitmap));
> -			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(ADDR_EMPTY);
> 
> 			if (unlikely(!advanced_bio)) {
> 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
> @@ -77,7 +78,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> 				goto retry;
> 			}
> 			WARN_ON(test_and_set_bit(i, read_bitmap));
> -			meta_list[i].lba = cpu_to_le64(lba);
> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
> +					cpu_to_le64(lba);
> 			advanced_bio = true;
> #ifdef CONFIG_NVM_PBLK_DEBUG
> 			atomic_long_inc(&pblk->cache_reads);
> @@ -106,13 +108,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
> 				sector_t blba)
> {
> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> +	void *meta_lba_list = rqd->meta_list;
> 	int nr_lbas = rqd->nr_ppas;
> 	int i;
> 
> -	for (i = 0; i < nr_lbas; i++) {
> -		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
> +	if (!pblk_is_oob_meta_supported(pblk))
> +		return;
> 
> +	for (i = 0; i < nr_lbas; i++) {
> +		u64 lba = le64_to_cpu(
> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
> 		if (lba == ADDR_EMPTY)
> 			continue;
> 
> @@ -136,17 +141,21 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
> 				 u64 *lba_list, int nr_lbas)
> {
> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> +	void *meta_lba_list = rqd->meta_list;
> 	int i, j;
> 
> -	for (i = 0, j = 0; i < nr_lbas; i++) {
> +	if (!pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	for (i = 0, j = 0; i < nr_lbas; i++) {
> 		u64 lba = lba_list[i];
> 		u64 meta_lba;
> 
> 		if (lba == ADDR_EMPTY)
> 			continue;
> 
> -		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
> +		meta_lba = le64_to_cpu(
> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
> 
> 		if (lba != meta_lba) {
> #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -235,7 +244,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> 			     struct bio *orig_bio, unsigned int bio_init_idx,
> 			     unsigned long *read_bitmap)
> {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
> 	struct bio *new_bio;
> 	struct bio_vec src_bv, dst_bv;
> 	void *ppa_ptr = NULL;
> @@ -261,7 +270,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> 	}
> 
> 	for (i = 0; i < nr_secs; i++)
> -		lba_list_mem[i] = meta_list[i].lba;
> +		lba_list_mem[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
> 
> 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
> 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> @@ -300,8 +309,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> 	}
> 
> 	for (i = 0; i < nr_secs; i++) {
> -		lba_list_media[i] = meta_list[i].lba;
> -		meta_list[i].lba = lba_list_mem[i];
> +		lba_list_media[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
> +		pblk_get_meta_at(pblk, meta_list, i)->lba = lba_list_mem[i];
> 	}
> 
> 	/* Fill the holes in the original bio */
> @@ -313,7 +322,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> 
> 		kref_put(&line->ref, pblk_line_put);
> 
> -		meta_list[hole].lba = lba_list_media[i];
> +		pblk_get_meta_at(pblk, meta_list, hole)->lba =
> +						lba_list_media[i];
> 
> 		src_bv = new_bio->bi_io_vec[i++];
> 		dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
> @@ -354,7 +364,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> 			 sector_t lba, unsigned long *read_bitmap)
> {
> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
> +	void *meta_list = rqd->meta_list;
> 	struct ppa_addr ppa;
> 
> 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
> @@ -366,7 +376,8 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> retry:
> 	if (pblk_ppa_empty(ppa)) {
> 		WARN_ON(test_and_set_bit(0, read_bitmap));
> -		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
> +		pblk_get_meta_at(pblk, meta_list, 0)->lba =
> +						cpu_to_le64(ADDR_EMPTY);
> 		return;
> 	}
> 
> @@ -380,7 +391,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> 		}
> 
> 		WARN_ON(test_and_set_bit(0, read_bitmap));
> -		meta_list[0].lba = cpu_to_le64(lba);
> +		pblk_get_meta_at(pblk, meta_list, 0)->lba = cpu_to_le64(lba);
> 
> #ifdef CONFIG_NVM_PBLK_DEBUG
> 		atomic_long_inc(&pblk->cache_reads);
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index b1a91cb3ca4d..0007e8011476 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -98,7 +98,7 @@ static int pblk_calc_sec_in_line(struct pblk *pblk, struct pblk_line *line)
> 
> struct pblk_recov_alloc {
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct nvm_rq *rqd;
> 	void *data;
> 	dma_addr_t dma_ppa_list;
> @@ -111,7 +111,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct nvm_rq *rqd;
> 	struct bio *bio;
> 	void *data;
> @@ -199,7 +199,8 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
> 	}
> 
> 	for (i = 0; i < rqd->nr_ppas; i++) {
> -		u64 lba = le64_to_cpu(meta_list[i].lba);
> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
> 
> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> 			continue;
> @@ -240,7 +241,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct pblk_pad_rq *pad_rq;
> 	struct nvm_rq *rqd;
> 	struct bio *bio;
> @@ -332,7 +333,8 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> 			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
> 
> 			pblk_map_invalidate(pblk, dev_ppa);
> -			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
> +			lba_list[w_ptr] = pblk_get_meta_at(pblk,
> +						meta_list, i)->lba = addr_empty;
> 			rqd->ppa_list[i] = dev_ppa;
> 		}
> 	}
> @@ -389,7 +391,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct nvm_rq *rqd;
> 	struct bio *bio;
> 	void *data;
> @@ -473,7 +475,8 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
> 	if (!rec_round++ && !rqd->error) {
> 		rec_round = 0;
> 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
> -			u64 lba = le64_to_cpu(meta_list[i].lba);
> +			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
> 
> 			if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> 				continue;
> @@ -523,7 +526,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 	struct nvm_tgt_dev *dev = pblk->dev;
> 	struct nvm_geo *geo = &dev->geo;
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct nvm_rq *rqd;
> 	struct bio *bio;
> 	void *data;
> @@ -619,7 +622,8 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 	}
> 
> 	for (i = 0; i < rqd->nr_ppas; i++) {
> -		u64 lba = le64_to_cpu(meta_list[i].lba);
> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> +							meta_list, i)->lba);
> 
> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
> 			continue;
> @@ -641,7 +645,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
> 	struct nvm_geo *geo = &dev->geo;
> 	struct nvm_rq *rqd;
> 	struct ppa_addr *ppa_list;
> -	struct pblk_sec_meta *meta_list;
> +	void *meta_list;
> 	struct pblk_recov_alloc p;
> 	void *data;
> 	dma_addr_t dma_ppa_list, dma_meta_list;
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index c072955d72c2..f82c3a0b0de5 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -1420,4 +1420,33 @@ static inline void pblk_setup_uuid(struct pblk *pblk)
> 	uuid_le_gen(&uuid);
> 	memcpy(pblk->instance_uuid, uuid.b, 16);
> }
> +
> +static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +
> +	/* Pblk uses OOB meta to store the LBA of a given physical sector.
> +	 * The LBA is eventually used in recovery mode and/or for handling
> +	 * telemetry events (e.g., relocate sector).
> +	 */
> +
> +	return (geo->sos >= sizeof(struct pblk_sec_meta));
> +}
> +

In principle, we need 8 bytes for storing the LBA in pblk. If the OOB
area is not big enough, I'm OK with making this optional, but if so,
there should be a comment that some of the power-fail recovery will not
be supported.

If you are not facing this problem, then I would suggest failing pblk
creation in case the metadata size is not big enough to store the LBA.

> +static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
> +						void *meta_ptr, int index)
> +{
> +	struct nvm_tgt_dev *dev = pblk->dev;
> +	struct nvm_geo *geo = &dev->geo;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		/* We need to keep the OOB meta layout the same as on the drive */
> +		return meta_ptr + geo->sos * index;
> +	else
> +		/* We can create a virtual OOB meta layout since the drive
> +		 * does not have real OOB metadata
> +		 */
> +		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
> +}

The dereference that this helper forces is quite ugly. Could you use a
helper to set the value and another to get it instead?
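Something along these lines would avoid open-coded dereferences at the
call sites. This is a hypothetical sketch of the suggestion, not code
from the patch: the helper names, the fixed `meta_entry_size` stand-in
for geo->sos, and the omission of le64/cpu conversions are all
simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the per-device OOB entry size (geo->sos in pblk). */
static size_t meta_entry_size = 16;

/* Hypothetical getter: read the LBA stored for sector @index. */
static uint64_t get_meta_lba(const void *meta_list, int index)
{
	uint64_t lba;

	/* memcpy avoids alignment assumptions about the raw buffer. */
	memcpy(&lba, (const char *)meta_list + meta_entry_size * index,
	       sizeof(lba));
	return lba;
}

/* Hypothetical setter: store @lba for sector @index. */
static void set_meta_lba(void *meta_list, int index, uint64_t lba)
{
	memcpy((char *)meta_list + meta_entry_size * index, &lba,
	       sizeof(lba));
}
```

With accessors like these, the drive-specific entry stride stays in one
place and callers never touch struct pblk_sec_meta directly.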

> #endif /* PBLK_H_ */
> --
> 2.14.3



* Re: [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
  2018-06-16 19:27   ` Matias Bjørling
@ 2018-06-18 14:25     ` Javier Gonzalez
  2018-06-18 20:50       ` Igor Konopko
  0 siblings, 1 reply; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-18 14:25 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Konopko, Igor J, linux-block, michal.sorn, marcin.dziegielewski


> On 16 Jun 2018, at 21.27, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 06/16/2018 12:27 AM, Igor Konopko wrote:
>> Since we have a flexible size of pblk_sec_meta,
>> which depends on the drive metadata size, we can
>> remove the unneeded reserved field from that
>> structure.
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>  drivers/lightnvm/pblk.h | 1 -
>>  1 file changed, 1 deletion(-)
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index f82c3a0b0de5..27658dc6fc1a 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -82,7 +82,6 @@ enum {
>>  };
>>    struct pblk_sec_meta {
>> -	u64 reserved;
>>  	__le64 lba;
>>  };
>> 
> 
> Looks good to me. Javier may have some comments on this, since it is
> not completely obvious from the code why that reserved attribute is
> there. I would like the change to go in, as the reserved field
> needlessly extends the requirement from 8 to 16 bytes.

Looks good to me. Maybe merge this patch with 1/5? It was actually a
comment I added to it.



* Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
  2018-06-16 19:38   ` Matias Bjørling
@ 2018-06-18 14:29     ` Javier Gonzalez
  2018-06-18 20:51       ` Igor Konopko
  2018-06-19  8:24       ` Matias Bjørling
  0 siblings, 2 replies; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-18 14:29 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Konopko, Igor J, linux-block, michal.sorn, marcin.dziegielewski


> On 16 Jun 2018, at 21.38, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On 06/16/2018 12:27 AM, Igor Konopko wrote:
>> Currently pblk and lightnvm only check the size of the OOB metadata
>> and do not care whether this metadata is located in a separate
>> buffer or is interleaved with the data in a single buffer.
>> In reality only the first scenario is supported; the second mode
>> will break pblk functionality during any IO operation.
>> The goal of this patch is to block creation of pblk devices in
>> case of interleaved metadata.
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>>  drivers/lightnvm/pblk-init.c | 6 ++++++
>>  drivers/nvme/host/lightnvm.c | 1 +
>>  include/linux/lightnvm.h     | 1 +
>>  3 files changed, 8 insertions(+)
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index 5eb641da46ed..483a6d479e7d 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>  		return ERR_PTR(-EINVAL);
>>  	}
>>  +	if (geo->ext) {
>> +		pr_err("pblk: extended (interleaved) metadata in data buffer"
>> +			" not supported\n");
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +
>>  	pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
>>  	if (!pblk)
>>  		return ERR_PTR(-ENOMEM);
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index 670478abc754..872ab854ccf5 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>>    	geo->csecs = 1 << ns->lba_shift;
>>  	geo->sos = ns->ms;
>> +	geo->ext = ns->ext;
>>  }
>>    int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>> index 72a55d71917e..b13e64e2112f 100644
>> --- a/include/linux/lightnvm.h
>> +++ b/include/linux/lightnvm.h
>> @@ -350,6 +350,7 @@ struct nvm_geo {
>>  	u32	clba;		/* sectors per chunk */
>>  	u16	csecs;		/* sector size */
>>  	u16	sos;		/* out-of-band area size */
>> +	u16	ext;		/* metadata in extended data buffer */
>>    	/* device write constrains */
>>  	u32	ws_min;		/* minimum write size */
> 
> I think a bool type would be better here. Can it be placed a bit further down, just above the 1.2 stuff?
> 
> Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5.

Apart from Matias' comments, it looks good to me.

Traditionally, we have separated subsystem and target patches to make
sure there is no coupling between pblk and lightnvm, but if Matias is
OK with starting to have patches that cover both at once, then that is
good for me too.

Javier
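For readers unfamiliar with the two OOB layouts being distinguished here, the difference can be sketched in userspace (the helper names are illustrative, not kernel API; `csecs` is the sector size and `sos` the OOB area size, as in struct nvm_geo): with separate metadata, sector i's OOB bytes sit at a fixed stride in a dedicated buffer, while the extended (interleaved) format inlines them after each sector's data, which is the layout pblk cannot address and the geo->ext check rejects.

```c
#include <stddef.h>

/* Separate metadata buffer: sector i's OOB bytes start at i * sos
 * in a buffer that holds nothing but metadata. */
size_t sep_meta_offset(size_t sos, size_t i)
{
	return i * sos;
}

/* Extended (interleaved) format: each sector is followed inline by
 * its sos OOB bytes, so the data stride grows to csecs + sos... */
size_t ext_data_offset(size_t csecs, size_t sos, size_t i)
{
	return i * (csecs + sos);
}

/* ...and sector i's metadata sits right after its own data. */
size_t ext_meta_offset(size_t csecs, size_t sos, size_t i)
{
	return i * (csecs + sos) + csecs;
}
```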




* Re: [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
  2018-06-18 14:25     ` Javier Gonzalez
@ 2018-06-18 20:50       ` Igor Konopko
  0 siblings, 0 replies; 25+ messages in thread
From: Igor Konopko @ 2018-06-18 20:50 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjørling
  Cc: linux-block, michal.sorn, marcin.dziegielewski



On 18.06.2018 07:25, Javier Gonzalez wrote:
>> On 16 Jun 2018, at 21.27, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 06/16/2018 12:27 AM, Igor Konopko wrote:
>>> Since the size of pblk_sec_meta is now flexible and depends on the
>>> drive's metadata size, we can remove the unneeded reserved field
>>> from that structure.
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>>   drivers/lightnvm/pblk.h | 1 -
>>>   1 file changed, 1 deletion(-)
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index f82c3a0b0de5..27658dc6fc1a 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -82,7 +82,6 @@ enum {
>>>   };
>>>     struct pblk_sec_meta {
>>> -	u64 reserved;
>>>   	__le64 lba;
>>>   };
>>>
>>
>> Looks good to me. Javier may have some comment on this, since it is
>> not completely obvious from the code why that reserved attribute is
>> there. I would like the change to go in, as the reserved field
>> needlessly extends the requirement from 8 to 16 bytes.
> 
> Looks good to me. Maybe merge this patch with 1/5? It was actually a
> comment I added to it.
> 

Sure, can merge it.

Igor


* Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
  2018-06-18 14:29     ` Javier Gonzalez
@ 2018-06-18 20:51       ` Igor Konopko
  2018-06-19  8:24       ` Matias Bjørling
  1 sibling, 0 replies; 25+ messages in thread
From: Igor Konopko @ 2018-06-18 20:51 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjørling
  Cc: linux-block, michal.sorn, marcin.dziegielewski



On 18.06.2018 07:29, Javier Gonzalez wrote:
>> On 16 Jun 2018, at 21.38, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 06/16/2018 12:27 AM, Igor Konopko wrote:
>>> Currently pblk and lightnvm only check the size of the OOB metadata
>>> and do not care whether this metadata is located in a separate
>>> buffer or is interleaved with the data in a single buffer.
>>> In reality only the first scenario is supported; the second mode
>>> will break pblk functionality during any IO operation.
>>> The goal of this patch is to block creation of pblk devices in
>>> case of interleaved metadata.
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>>   drivers/lightnvm/pblk-init.c | 6 ++++++
>>>   drivers/nvme/host/lightnvm.c | 1 +
>>>   include/linux/lightnvm.h     | 1 +
>>>   3 files changed, 8 insertions(+)
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index 5eb641da46ed..483a6d479e7d 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>>   		return ERR_PTR(-EINVAL);
>>>   	}
>>>   +	if (geo->ext) {
>>> +		pr_err("pblk: extended (interleaved) metadata in data buffer"
>>> +			" not supported\n");
>>> +		return ERR_PTR(-EINVAL);
>>> +	}
>>> +
>>>   	pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
>>>   	if (!pblk)
>>>   		return ERR_PTR(-ENOMEM);
>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>> index 670478abc754..872ab854ccf5 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>>>     	geo->csecs = 1 << ns->lba_shift;
>>>   	geo->sos = ns->ms;
>>> +	geo->ext = ns->ext;
>>>   }
>>>     int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>> index 72a55d71917e..b13e64e2112f 100644
>>> --- a/include/linux/lightnvm.h
>>> +++ b/include/linux/lightnvm.h
>>> @@ -350,6 +350,7 @@ struct nvm_geo {
>>>   	u32	clba;		/* sectors per chunk */
>>>   	u16	csecs;		/* sector size */
>>>   	u16	sos;		/* out-of-band area size */
>>> +	u16	ext;		/* metadata in extended data buffer */
>>>     	/* device write constrains */
>>>   	u32	ws_min;		/* minimum write size */
>>
>> I think a bool type would be better here. Can it be placed a bit further down, just above the 1.2 stuff?
>>
>> Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5.
> 
> Apart from Matias' comments, it looks good to me.
> 
> Traditionally, we have separated subsystem and target patches to make
> sure there is no coupling between pblk and lightnvm, but if Matias is
> OK with starting to have patches that cover both at once, then that is
> good for me too.
> 
> Javier
> 

Will fix above comments and resend.

Igor


* Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
  2018-06-18 14:23   ` Javier Gonzalez
@ 2018-06-18 20:53     ` Igor Konopko
  2018-06-19  7:44       ` Javier Gonzalez
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Konopko @ 2018-06-18 20:53 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski



On 18.06.2018 07:23, Javier Gonzalez wrote:
> 
>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> Currently pblk assumes that the size of the OOB metadata on the drive
>> is always equal to the size of the pblk_sec_meta struct. This commit
>> adds helpers which allow handling different sizes of OOB metadata on
>> the drive.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-core.c     | 10 +++++----
>> drivers/lightnvm/pblk-map.c      | 21 ++++++++++++-------
>> drivers/lightnvm/pblk-read.c     | 45 +++++++++++++++++++++++++---------------
>> drivers/lightnvm/pblk-recovery.c | 24 ++++++++++++---------
>> drivers/lightnvm/pblk.h          | 29 ++++++++++++++++++++++++++
>> 5 files changed, 91 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 66ab1036f2fb..8a0ac466872f 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>> 	rqd.nr_ppas = rq_ppas;
>>
>> 	if (dir == PBLK_WRITE) {
>> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
>> +		void *meta_list = rqd.meta_list;
>>
>> 		rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
>> 		for (i = 0; i < rqd.nr_ppas; ) {
>> @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>> 			paddr = __pblk_alloc_page(pblk, line, min);
>> 			spin_unlock(&line->lock);
>> 			for (j = 0; j < min; j++, i++, paddr++) {
>> -				meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +					cpu_to_le64(ADDR_EMPTY);
>> 				rqd.ppa_list[i] =
>> 					addr_to_gen_ppa(pblk, paddr, id);
>> 			}
>> @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
>> 	rqd.nr_ppas = lm->smeta_sec;
>>
>> 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
>> +		void *meta_list = rqd.meta_list;
>>
>> 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>
>> 		if (dir == PBLK_WRITE) {
>> 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>>
>> -			meta_list[i].lba = lba_list[paddr] = addr_empty;
>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +				lba_list[paddr] = addr_empty;
>> 		}
>> 	}
>>
>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>> index 953ca31dda68..92c40b546c4e 100644
>> --- a/drivers/lightnvm/pblk-map.c
>> +++ b/drivers/lightnvm/pblk-map.c
>> @@ -21,7 +21,7 @@
>> static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> 			      struct ppa_addr *ppa_list,
>> 			      unsigned long *lun_bitmap,
>> -			      struct pblk_sec_meta *meta_list,
>> +			      void *meta_list,
>> 			      unsigned int valid_secs)
>> {
>> 	struct pblk_line *line = pblk_line_get_data(pblk);
>> @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> 			kref_get(&line->ref);
>> 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>> 			w_ctx->ppa = ppa_list[i];
>> -			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +							cpu_to_le64(w_ctx->lba);
>> 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>> 			if (lba_list[paddr] != addr_empty)
>> 				line->nr_valid_lbas++;
>> 			else
>> 				atomic64_inc(&pblk->pad_wa);
>> 		} else {
>> -			lba_list[paddr] = meta_list[i].lba = addr_empty;
>> +			lba_list[paddr] =
>> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +					addr_empty;
>> 			__pblk_map_invalidate(pblk, line, paddr);
>> 		}
>> 	}
>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>> 		 unsigned long *lun_bitmap, unsigned int valid_secs,
>> 		 unsigned int off)
>> {
>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>> +	void *meta_list = rqd->meta_list;
>> 	unsigned int map_secs;
>> 	int min = pblk->min_write_pgs;
>> 	int i;
>> @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>> 	for (i = off; i < rqd->nr_ppas; i += min) {
>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
>> -					lun_bitmap, &meta_list[i], map_secs)) {
>> +					lun_bitmap,
>> +					pblk_get_meta_at(pblk, meta_list, i),
>> +					map_secs)) {
>> 			bio_put(rqd->bio);
>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>> 			pblk_pipeline_stop(pblk);
>> @@ -111,7 +116,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct pblk_line_meta *lm = &pblk->lm;
>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>> +	void *meta_list = rqd->meta_list;
>> 	struct pblk_line *e_line, *d_line;
>> 	unsigned int map_secs;
>> 	int min = pblk->min_write_pgs;
>> @@ -120,7 +125,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> 	for (i = 0; i < rqd->nr_ppas; i += min) {
>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
>> -					lun_bitmap, &meta_list[i], map_secs)) {
>> +					lun_bitmap,
>> +					pblk_get_meta_at(pblk, meta_list, i),
>> +					map_secs)) {
>> 			bio_put(rqd->bio);
>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>> 			pblk_pipeline_stop(pblk);
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 6e93c489ce57..81cf79ea2dc6 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -42,7 +42,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> 				 struct bio *bio, sector_t blba,
>> 				 unsigned long *read_bitmap)
>> {
>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>> +	void *meta_list = rqd->meta_list;
>> 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
>> 	int nr_secs = rqd->nr_ppas;
>> 	bool advanced_bio = false;
>> @@ -57,7 +57,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> retry:
>> 		if (pblk_ppa_empty(p)) {
>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>> -			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +					cpu_to_le64(ADDR_EMPTY);
>>
>> 			if (unlikely(!advanced_bio)) {
>> 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>> @@ -77,7 +78,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> 				goto retry;
>> 			}
>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>> -			meta_list[i].lba = cpu_to_le64(lba);
>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>> +					cpu_to_le64(lba);
>> 			advanced_bio = true;
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> 			atomic_long_inc(&pblk->cache_reads);
>> @@ -106,13 +108,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>> 				sector_t blba)
>> {
>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>> +	void *meta_lba_list = rqd->meta_list;
>> 	int nr_lbas = rqd->nr_ppas;
>> 	int i;
>>
>> -	for (i = 0; i < nr_lbas; i++) {
>> -		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>> +	if (!pblk_is_oob_meta_supported(pblk))
>> +		return;
>>
>> +	for (i = 0; i < nr_lbas; i++) {
>> +		u64 lba = le64_to_cpu(
>> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
>> 		if (lba == ADDR_EMPTY)
>> 			continue;
>>
>> @@ -136,17 +141,21 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>> 				 u64 *lba_list, int nr_lbas)
>> {
>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>> +	void *meta_lba_list = rqd->meta_list;
>> 	int i, j;
>>
>> -	for (i = 0, j = 0; i < nr_lbas; i++) {
>> +	if (!pblk_is_oob_meta_supported(pblk))
>> +		return;
>> +
>> +	for (i = 0; j = 0, i < nr_lbas; i++) {
>> 		u64 lba = lba_list[i];
>> 		u64 meta_lba;
>>
>> 		if (lba == ADDR_EMPTY)
>> 			continue;
>>
>> -		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>> +		meta_lba = le64_to_cpu(
>> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
>>
>> 		if (lba != meta_lba) {
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> @@ -235,7 +244,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> 			     struct bio *orig_bio, unsigned int bio_init_idx,
>> 			     unsigned long *read_bitmap)
>> {
>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>> +	void *meta_list = rqd->meta_list;
>> 	struct bio *new_bio;
>> 	struct bio_vec src_bv, dst_bv;
>> 	void *ppa_ptr = NULL;
>> @@ -261,7 +270,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> 	}
>>
>> 	for (i = 0; i < nr_secs; i++)
>> -		lba_list_mem[i] = meta_list[i].lba;
>> +		lba_list_mem[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
>>
>> 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
>> 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>> @@ -300,8 +309,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> 	}
>>
>> 	for (i = 0; i < nr_secs; i++) {
>> -		lba_list_media[i] = meta_list[i].lba;
>> -		meta_list[i].lba = lba_list_mem[i];
>> +		lba_list_media[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
>> +		pblk_get_meta_at(pblk, meta_list, i)->lba = lba_list_mem[i];
>> 	}
>>
>> 	/* Fill the holes in the original bio */
>> @@ -313,7 +322,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>
>> 		kref_put(&line->ref, pblk_line_put);
>>
>> -		meta_list[hole].lba = lba_list_media[i];
>> +		pblk_get_meta_at(pblk, meta_list, hole)->lba =
>> +						lba_list_media[i];
>>
>> 		src_bv = new_bio->bi_io_vec[i++];
>> 		dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
>> @@ -354,7 +364,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> 			 sector_t lba, unsigned long *read_bitmap)
>> {
>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>> +	void *meta_list = rqd->meta_list;
>> 	struct ppa_addr ppa;
>>
>> 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>> @@ -366,7 +376,8 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> retry:
>> 	if (pblk_ppa_empty(ppa)) {
>> 		WARN_ON(test_and_set_bit(0, read_bitmap));
>> -		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>> +		pblk_get_meta_at(pblk, meta_list, 0)->lba =
>> +						cpu_to_le64(ADDR_EMPTY);
>> 		return;
>> 	}
>>
>> @@ -380,7 +391,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> 		}
>>
>> 		WARN_ON(test_and_set_bit(0, read_bitmap));
>> -		meta_list[0].lba = cpu_to_le64(lba);
>> +		pblk_get_meta_at(pblk, meta_list, 0)->lba = cpu_to_le64(lba);
>>
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> 		atomic_long_inc(&pblk->cache_reads);
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index b1a91cb3ca4d..0007e8011476 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -98,7 +98,7 @@ static int pblk_calc_sec_in_line(struct pblk *pblk, struct pblk_line *line)
>>
>> struct pblk_recov_alloc {
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct nvm_rq *rqd;
>> 	void *data;
>> 	dma_addr_t dma_ppa_list;
>> @@ -111,7 +111,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct nvm_rq *rqd;
>> 	struct bio *bio;
>> 	void *data;
>> @@ -199,7 +199,8 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>> 	}
>>
>> 	for (i = 0; i < rqd->nr_ppas; i++) {
>> -		u64 lba = le64_to_cpu(meta_list[i].lba);
>> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>> +							meta_list, i)->lba);
>>
>> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>> 			continue;
>> @@ -240,7 +241,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct pblk_pad_rq *pad_rq;
>> 	struct nvm_rq *rqd;
>> 	struct bio *bio;
>> @@ -332,7 +333,8 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>> 			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>
>> 			pblk_map_invalidate(pblk, dev_ppa);
>> -			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>> +			lba_list[w_ptr] = pblk_get_meta_at(pblk,
>> +						meta_list, i)->lba = addr_empty;
>> 			rqd->ppa_list[i] = dev_ppa;
>> 		}
>> 	}
>> @@ -389,7 +391,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct nvm_rq *rqd;
>> 	struct bio *bio;
>> 	void *data;
>> @@ -473,7 +475,8 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>> 	if (!rec_round++ && !rqd->error) {
>> 		rec_round = 0;
>> 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>> -			u64 lba = le64_to_cpu(meta_list[i].lba);
>> +			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>> +							meta_list, i)->lba);
>>
>> 			if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>> 				continue;
>> @@ -523,7 +526,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>> 	struct nvm_tgt_dev *dev = pblk->dev;
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct nvm_rq *rqd;
>> 	struct bio *bio;
>> 	void *data;
>> @@ -619,7 +622,8 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>> 	}
>>
>> 	for (i = 0; i < rqd->nr_ppas; i++) {
>> -		u64 lba = le64_to_cpu(meta_list[i].lba);
>> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>> +							meta_list, i)->lba);
>>
>> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>> 			continue;
>> @@ -641,7 +645,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>> 	struct nvm_geo *geo = &dev->geo;
>> 	struct nvm_rq *rqd;
>> 	struct ppa_addr *ppa_list;
>> -	struct pblk_sec_meta *meta_list;
>> +	void *meta_list;
>> 	struct pblk_recov_alloc p;
>> 	void *data;
>> 	dma_addr_t dma_ppa_list, dma_meta_list;
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index c072955d72c2..f82c3a0b0de5 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -1420,4 +1420,33 @@ static inline void pblk_setup_uuid(struct pblk *pblk)
>> 	uuid_le_gen(&uuid);
>> 	memcpy(pblk->instance_uuid, uuid.b, 16);
>> }
>> +
>> +static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
>> +{
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +
>> +	/* Pblk uses OOB meta to store LBA of given physical sector.
>> +	 * The LBA is eventually used in recovery mode and/or for handling
>> +	 * telemetry events (e.g., relocate sector).
>> +	 */
>> +
>> +	return (geo->sos >= sizeof(struct pblk_sec_meta));
>> +}
>> +
> 
> In principle, we need 8 bytes to store the lba in pblk. If the OOB
> area is not big enough, I'm OK with making this optional, but if so,
> there should be a comment that some of the pfail recovery will not be
> supported.
> 
> If you are not facing this problem, then I would suggest failing pblk
> creation in case the metadata size is not big enough to store the lba.
> 

I believe that all the potential problems should be solved by the
functionality added in patch 4 (except for verifying the LBA on read
completion); the rest, like dirty shutdown recovery, etc., should have
the same level of confidence. If I'm missing something else, let me know.

>> +static inline struct pblk_sec_meta *pblk_get_meta_at(struct pblk *pblk,
>> +						void *meta_ptr, int index)
>> +{
>> +	struct nvm_tgt_dev *dev = pblk->dev;
>> +	struct nvm_geo *geo = &dev->geo;
>> +
>> +	if (pblk_is_oob_meta_supported(pblk))
>> +		/* We need to have oob meta layout the same as on drive */
>> +		return meta_ptr + geo->sos * index;
>> +	else
>> +		/* We can create virtual oob meta layout since drive does
>> +		 * not have real oob metadata
>> +		 */
>> +		return meta_ptr + sizeof(struct pblk_sec_meta) * index;
>> +}
> 
> The dereference that this helper forces is quite ugly. Could you use a
> helper to set the value and another to get it instead?

Sure, makes sense to split that.

Igor
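One possible shape for that get/set split, sketched in userspace (the `pblk_get_meta_lba`/`pblk_set_meta_lba` names and the explicit `sos` parameter are illustrative assumptions, not the eventual kernel API): the stride follows the drive's OOB size when real metadata exists and falls back to sizeof(struct pblk_sec_meta) for the virtual layout, and the accessors hide the pointer dereference from callers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory copy of the on-drive OOB layout; only the LBA is used. */
struct pblk_sec_meta {
	uint64_t lba;		/* __le64 in the kernel proper */
};

/* Sector i's metadata slot: the drive exposes sos bytes per sector,
 * so use that stride when OOB metadata is real, and a packed virtual
 * stride when the drive has no OOB area at all. */
void *pblk_meta_slot(void *meta_list, size_t sos, size_t i)
{
	size_t stride = sos >= sizeof(struct pblk_sec_meta) ?
			sos : sizeof(struct pblk_sec_meta);

	return (char *)meta_list + stride * i;
}

/* Accessors hide the dereference that the review found ugly. */
uint64_t pblk_get_meta_lba(void *meta_list, size_t sos, size_t i)
{
	uint64_t lba;

	memcpy(&lba, pblk_meta_slot(meta_list, sos, i), sizeof(lba));
	return lba;
}

void pblk_set_meta_lba(void *meta_list, size_t sos, size_t i, uint64_t lba)
{
	memcpy(pblk_meta_slot(meta_list, sos, i), &lba, sizeof(lba));
}
```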


* Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
  2018-06-18 20:53     ` Igor Konopko
@ 2018-06-19  7:44       ` Javier Gonzalez
  0 siblings, 0 replies; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-19  7:44 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski


> 
> On 18 Jun 2018, at 22.53, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> 
> 
> On 18.06.2018 07:23, Javier Gonzalez wrote:
>>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> Currently pblk assumes that the size of the OOB metadata on the drive
>>> is always equal to the size of the pblk_sec_meta struct. This commit
>>> adds helpers which allow handling different sizes of OOB metadata on
>>> the drive.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c     | 10 +++++----
>>> drivers/lightnvm/pblk-map.c      | 21 ++++++++++++-------
>>> drivers/lightnvm/pblk-read.c     | 45 +++++++++++++++++++++++++---------------
>>> drivers/lightnvm/pblk-recovery.c | 24 ++++++++++++---------
>>> drivers/lightnvm/pblk.h          | 29 ++++++++++++++++++++++++++
>>> 5 files changed, 91 insertions(+), 38 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 66ab1036f2fb..8a0ac466872f 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>>> 	rqd.nr_ppas = rq_ppas;
>>> 
>>> 	if (dir == PBLK_WRITE) {
>>> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
>>> +		void *meta_list = rqd.meta_list;
>>> 
>>> 		rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
>>> 		for (i = 0; i < rqd.nr_ppas; ) {
>>> @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>>> 			paddr = __pblk_alloc_page(pblk, line, min);
>>> 			spin_unlock(&line->lock);
>>> 			for (j = 0; j < min; j++, i++, paddr++) {
>>> -				meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +					cpu_to_le64(ADDR_EMPTY);
>>> 				rqd.ppa_list[i] =
>>> 					addr_to_gen_ppa(pblk, paddr, id);
>>> 			}
>>> @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
>>> 	rqd.nr_ppas = lm->smeta_sec;
>>> 
>>> 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
>>> +		void *meta_list = rqd.meta_list;
>>> 
>>> 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>> 
>>> 		if (dir == PBLK_WRITE) {
>>> 			__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
>>> 
>>> -			meta_list[i].lba = lba_list[paddr] = addr_empty;
>>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +				lba_list[paddr] = addr_empty;
>>> 		}
>>> 	}
>>> 
>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>> index 953ca31dda68..92c40b546c4e 100644
>>> --- a/drivers/lightnvm/pblk-map.c
>>> +++ b/drivers/lightnvm/pblk-map.c
>>> @@ -21,7 +21,7 @@
>>> static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>> 			      struct ppa_addr *ppa_list,
>>> 			      unsigned long *lun_bitmap,
>>> -			      struct pblk_sec_meta *meta_list,
>>> +			      void *meta_list,
>>> 			      unsigned int valid_secs)
>>> {
>>> 	struct pblk_line *line = pblk_line_get_data(pblk);
>>> @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>> 			kref_get(&line->ref);
>>> 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>> 			w_ctx->ppa = ppa_list[i];
>>> -			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +							cpu_to_le64(w_ctx->lba);
>>> 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>> 			if (lba_list[paddr] != addr_empty)
>>> 				line->nr_valid_lbas++;
>>> 			else
>>> 				atomic64_inc(&pblk->pad_wa);
>>> 		} else {
>>> -			lba_list[paddr] = meta_list[i].lba = addr_empty;
>>> +			lba_list[paddr] =
>>> +				pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +					addr_empty;
>>> 			__pblk_map_invalidate(pblk, line, paddr);
>>> 		}
>>> 	}
>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>> 		 unsigned long *lun_bitmap, unsigned int valid_secs,
>>> 		 unsigned int off)
>>> {
>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> +	void *meta_list = rqd->meta_list;
>>> 	unsigned int map_secs;
>>> 	int min = pblk->min_write_pgs;
>>> 	int i;
>>> @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>> 	for (i = off; i < rqd->nr_ppas; i += min) {
>>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
>>> -					lun_bitmap, &meta_list[i], map_secs)) {
>>> +					lun_bitmap,
>>> +					pblk_get_meta_at(pblk, meta_list, i),
>>> +					map_secs)) {
>>> 			bio_put(rqd->bio);
>>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>> 			pblk_pipeline_stop(pblk);
>>> @@ -111,7 +116,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct pblk_line_meta *lm = &pblk->lm;
>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> +	void *meta_list = rqd->meta_list;
>>> 	struct pblk_line *e_line, *d_line;
>>> 	unsigned int map_secs;
>>> 	int min = pblk->min_write_pgs;
>>> @@ -120,7 +125,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 	for (i = 0; i < rqd->nr_ppas; i += min) {
>>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>> 		if (pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i],
>>> -					lun_bitmap, &meta_list[i], map_secs)) {
>>> +					lun_bitmap,
>>> +					pblk_get_meta_at(pblk, meta_list, i),
>>> +					map_secs)) {
>>> 			bio_put(rqd->bio);
>>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>> 			pblk_pipeline_stop(pblk);
>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>> index 6e93c489ce57..81cf79ea2dc6 100644
>>> --- a/drivers/lightnvm/pblk-read.c
>>> +++ b/drivers/lightnvm/pblk-read.c
>>> @@ -42,7 +42,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 				 struct bio *bio, sector_t blba,
>>> 				 unsigned long *read_bitmap)
>>> {
>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> +	void *meta_list = rqd->meta_list;
>>> 	struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
>>> 	int nr_secs = rqd->nr_ppas;
>>> 	bool advanced_bio = false;
>>> @@ -57,7 +57,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> retry:
>>> 		if (pblk_ppa_empty(p)) {
>>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>>> -			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +					cpu_to_le64(ADDR_EMPTY);
>>> 
>>> 			if (unlikely(!advanced_bio)) {
>>> 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>> @@ -77,7 +78,8 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 				goto retry;
>>> 			}
>>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>>> -			meta_list[i].lba = cpu_to_le64(lba);
>>> +			pblk_get_meta_at(pblk, meta_list, i)->lba =
>>> +					cpu_to_le64(lba);
>>> 			advanced_bio = true;
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>> 			atomic_long_inc(&pblk->cache_reads);
>>> @@ -106,13 +108,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>> 				sector_t blba)
>>> {
>>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>> +	void *meta_lba_list = rqd->meta_list;
>>> 	int nr_lbas = rqd->nr_ppas;
>>> 	int i;
>>> 
>>> -	for (i = 0; i < nr_lbas; i++) {
>>> -		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>> +	if (!pblk_is_oob_meta_supported(pblk))
>>> +		return;
>>> 
>>> +	for (i = 0; i < nr_lbas; i++) {
>>> +		u64 lba = le64_to_cpu(
>>> +				pblk_get_meta_at(pblk, meta_lba_list, i)->lba);
>>> 		if (lba == ADDR_EMPTY)
>>> 			continue;
>>> 
>>> @@ -136,17 +141,21 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>> 				 u64 *lba_list, int nr_lbas)
>>> {
>>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>> +	void *meta_lba_list = rqd->meta_list;
>>> 	int i, j;
>>> 
>>> -	for (i = 0, j = 0; i < nr_lbas; i++) {
>>> +	if (!pblk_is_oob_meta_supported(pblk))
>>> +		return;
>>> +
>>> +	for (i = 0, j = 0; i < nr_lbas; i++) {
>>> 		u64 lba = lba_list[i];
>>> 		u64 meta_lba;
>>> 
>>> 		if (lba == ADDR_EMPTY)
>>> 			continue;
>>> 
>>> -		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>> +		meta_lba = le64_to_cpu(
>>> +				pblk_get_meta_at(pblk, meta_lba_list, j)->lba);
>>> 
>>> 		if (lba != meta_lba) {
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>> @@ -235,7 +244,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> 			     struct bio *orig_bio, unsigned int bio_init_idx,
>>> 			     unsigned long *read_bitmap)
>>> {
>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> +	void *meta_list = rqd->meta_list;
>>> 	struct bio *new_bio;
>>> 	struct bio_vec src_bv, dst_bv;
>>> 	void *ppa_ptr = NULL;
>>> @@ -261,7 +270,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> 	}
>>> 
>>> 	for (i = 0; i < nr_secs; i++)
>>> -		lba_list_mem[i] = meta_list[i].lba;
>>> +		lba_list_mem[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
>>> 
>>> 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>> 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>> @@ -300,8 +309,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> 	}
>>> 
>>> 	for (i = 0; i < nr_secs; i++) {
>>> -		lba_list_media[i] = meta_list[i].lba;
>>> -		meta_list[i].lba = lba_list_mem[i];
>>> +		lba_list_media[i] = pblk_get_meta_at(pblk, meta_list, i)->lba;
>>> +		pblk_get_meta_at(pblk, meta_list, i)->lba = lba_list_mem[i];
>>> 	}
>>> 
>>> 	/* Fill the holes in the original bio */
>>> @@ -313,7 +322,8 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> 
>>> 		kref_put(&line->ref, pblk_line_put);
>>> 
>>> -		meta_list[hole].lba = lba_list_media[i];
>>> +		pblk_get_meta_at(pblk, meta_list, hole)->lba =
>>> +						lba_list_media[i];
>>> 
>>> 		src_bv = new_bio->bi_io_vec[i++];
>>> 		dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
>>> @@ -354,7 +364,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> 			 sector_t lba, unsigned long *read_bitmap)
>>> {
>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> +	void *meta_list = rqd->meta_list;
>>> 	struct ppa_addr ppa;
>>> 
>>> 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>> @@ -366,7 +376,8 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> retry:
>>> 	if (pblk_ppa_empty(ppa)) {
>>> 		WARN_ON(test_and_set_bit(0, read_bitmap));
>>> -		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>> +		pblk_get_meta_at(pblk, meta_list, 0)->lba =
>>> +						cpu_to_le64(ADDR_EMPTY);
>>> 		return;
>>> 	}
>>> 
>>> @@ -380,7 +391,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> 		}
>>> 
>>> 		WARN_ON(test_and_set_bit(0, read_bitmap));
>>> -		meta_list[0].lba = cpu_to_le64(lba);
>>> +		pblk_get_meta_at(pblk, meta_list, 0)->lba = cpu_to_le64(lba);
>>> 
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>> 		atomic_long_inc(&pblk->cache_reads);
>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>> index b1a91cb3ca4d..0007e8011476 100644
>>> --- a/drivers/lightnvm/pblk-recovery.c
>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>> @@ -98,7 +98,7 @@ static int pblk_calc_sec_in_line(struct pblk *pblk, struct pblk_line *line)
>>> 
>>> struct pblk_recov_alloc {
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct nvm_rq *rqd;
>>> 	void *data;
>>> 	dma_addr_t dma_ppa_list;
>>> @@ -111,7 +111,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct nvm_rq *rqd;
>>> 	struct bio *bio;
>>> 	void *data;
>>> @@ -199,7 +199,8 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	}
>>> 
>>> 	for (i = 0; i < rqd->nr_ppas; i++) {
>>> -		u64 lba = le64_to_cpu(meta_list[i].lba);
>>> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>> +							meta_list, i)->lba);
>>> 
>>> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>>> 			continue;
>>> @@ -240,7 +241,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct pblk_pad_rq *pad_rq;
>>> 	struct nvm_rq *rqd;
>>> 	struct bio *bio;
>>> @@ -332,7 +333,8 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>> 			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>> 
>>> 			pblk_map_invalidate(pblk, dev_ppa);
>>> -			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>> +			lba_list[w_ptr] = pblk_get_meta_at(pblk,
>>> +						meta_list, i)->lba = addr_empty;
>>> 			rqd->ppa_list[i] = dev_ppa;
>>> 		}
>>> 	}
>>> @@ -389,7 +391,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct nvm_rq *rqd;
>>> 	struct bio *bio;
>>> 	void *data;
>>> @@ -473,7 +475,8 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	if (!rec_round++ && !rqd->error) {
>>> 		rec_round = 0;
>>> 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>>> -			u64 lba = le64_to_cpu(meta_list[i].lba);
>>> +			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>> +							meta_list, i)->lba);
>>> 
>>> 			if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>>> 				continue;
>>> @@ -523,7 +526,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct nvm_rq *rqd;
>>> 	struct bio *bio;
>>> 	void *data;
>>> @@ -619,7 +622,8 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>> 	}
>>> 
>>> 	for (i = 0; i < rqd->nr_ppas; i++) {
>>> -		u64 lba = le64_to_cpu(meta_list[i].lba);
>>> +		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>> +							meta_list, i)->lba);
>>> 
>>> 		if (lba == ADDR_EMPTY || lba > pblk->rl.nr_secs)
>>> 			continue;
>>> @@ -641,7 +645,7 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>> 	struct nvm_geo *geo = &dev->geo;
>>> 	struct nvm_rq *rqd;
>>> 	struct ppa_addr *ppa_list;
>>> -	struct pblk_sec_meta *meta_list;
>>> +	void *meta_list;
>>> 	struct pblk_recov_alloc p;
>>> 	void *data;
>>> 	dma_addr_t dma_ppa_list, dma_meta_list;
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index c072955d72c2..f82c3a0b0de5 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -1420,4 +1420,33 @@ static inline void pblk_setup_uuid(struct pblk *pblk)
>>> 	uuid_le_gen(&uuid);
>>> 	memcpy(pblk->instance_uuid, uuid.b, 16);
>>> }
>>> +
>>> +static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
>>> +{
>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>> +	struct nvm_geo *geo = &dev->geo;
>>> +
>>> +	/* Pblk uses OOB meta to store the LBA of a given physical sector.
>>> +	 * The LBA is eventually used in recovery mode and/or for handling
>>> +	 * telemetry events (e.g., relocate sector).
>>> +	 */
>>> +
>>> +	return (geo->sos >= sizeof(struct pblk_sec_meta));
>>> +}
>>> +
>> In principle, we need 8 bytes for storing the lba in pblk. If the OOB
>> area is not big enough, I'm ok with making this optional, but if so,
>> there should be a comment that some of the pfail recovery will not be
>> supported.
>> If you are not facing this problem, then I would suggest failing pblk
>> creation in case the metadata size is not big enough to store the lba.
> 
> I believe that all the potential problems should be solved by the
> functionality added in patch 4 (except for verifying the LBA on read
> completion) - the rest, like dirty shutdown recovery, etc., should have
> the same level of confidence - if I'm missing something else, let me know.
> 

I'll look at the other patches now. The case I'm thinking of is when the
device reports a metadata size of less than 8 bytes.

Regarding the lba check, it is not mandatory, but it is a usual check
that can save debug time in case the upper layers see data
corruption.




* Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
  2018-06-18 14:29     ` Javier Gonzalez
  2018-06-18 20:51       ` Igor Konopko
@ 2018-06-19  8:24       ` Matias Bjørling
  1 sibling, 0 replies; 25+ messages in thread
From: Matias Bjørling @ 2018-06-19  8:24 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Konopko, Igor J, linux-block, michal.sorn, marcin.dziegielewski

On Mon, Jun 18, 2018 at 4:29 PM, Javier Gonzalez <javier@cnexlabs.com> wrote:
>> On 16 Jun 2018, at 21.38, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On 06/16/2018 12:27 AM, Igor Konopko wrote:
>>> Currently pblk and lightnvm only check the size
>>> of OOB metadata and do not care whether this meta
>>> is located in a separate buffer or is interleaved with
>>> data in a single buffer.
>>> In reality only the first scenario is supported; the
>>> second mode will break pblk functionality during any
>>> IO operation.
>>> The goal of this patch is to block creation of pblk
>>> devices in case of interleaved metadata.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>>  drivers/lightnvm/pblk-init.c | 6 ++++++
>>>  drivers/nvme/host/lightnvm.c | 1 +
>>>  include/linux/lightnvm.h     | 1 +
>>>  3 files changed, 8 insertions(+)
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.=
c
>>> index 5eb641da46ed..483a6d479e7d 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>>              return ERR_PTR(-EINVAL);
>>>      }
>>>  +   if (geo->ext) {
>>> +            pr_err("pblk: extended (interleaved) metadata in data buffer"
>>> +                    " not supported\n");
>>> +            return ERR_PTR(-EINVAL);
>>> +    }
>>> +
>>>      pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
>>>      if (!pblk)
>>>              return ERR_PTR(-ENOMEM);
>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>> index 670478abc754..872ab854ccf5 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>>>      geo->csecs = 1 << ns->lba_shift;
>>>      geo->sos = ns->ms;
>>> +    geo->ext = ns->ext;
>>>  }
>>>    int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
>>> index 72a55d71917e..b13e64e2112f 100644
>>> --- a/include/linux/lightnvm.h
>>> +++ b/include/linux/lightnvm.h
>>> @@ -350,6 +350,7 @@ struct nvm_geo {
>>>      u32     clba;           /* sectors per chunk */
>>>      u16     csecs;          /* sector size */
>>>      u16     sos;            /* out-of-band area size */
>>> +    u16     ext;            /* metadata in extended data buffer */
>>>      /* device write constrains */
>>>      u32     ws_min;         /* minimum write size */
>>
>> I think bool type would be better here. Can it be placed a bit down,
>> just over the 1.2 stuff?
>>
>> Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5.
>
> Apart from Matias' comments, it looks good to me.
>
> Traditionally, we have separated subsystem and target patches to make
> sure there is no coupling between pblk and lightnvm, but if Matias is ok
> with starting to have patches covering both at once, then it is good for me too.
>

I often object when a patch can logically be split into two and should
be two distinct parts. In this case, it fits together.


* Re: [PATCH 0/5] lightnvm: More flexible approach to metadata
  2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
                   ` (4 preceding siblings ...)
  2018-06-15 22:27 ` [PATCH 5/5] lightnvm: pblk: Disable interleaved " Igor Konopko
@ 2018-06-19 10:28 ` Javier Gonzalez
  5 siblings, 0 replies; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-19 10:28 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski



> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> This series of patches introduces some more flexibility in pblk
> related to OOB meta:
> -ability to use different sizes of metadata (previously fixed 16b)
> -ability to use pblk on drives without metadata
> -ensuring that extended (interleaved) metadata is not in use
> 
> I believe that most of these patches, maybe except number 4 (Support
> for packed metadata), are rather simple, so I am waiting for comments,
> especially about this one.
> 
> Igor Konopko (5):
>  lightnvm: pblk: Helpers for OOB metadata
>  lightnvm: pblk: Remove resv field for sec meta
>  lightnvm: Flexible DMA pool entry size
>  lightnvm: pblk: Support for packed metadata in pblk.
>  lightnvm: pblk: Disable interleaved metadata in pblk
> 
> drivers/lightnvm/core.c          | 33 ++++++++++-----
> drivers/lightnvm/pblk-core.c     | 86 +++++++++++++++++++++++++++++++---------
> drivers/lightnvm/pblk-init.c     | 52 +++++++++++++++++++++++-
> drivers/lightnvm/pblk-map.c      | 21 ++++++----
> drivers/lightnvm/pblk-rb.c       |  3 ++
> drivers/lightnvm/pblk-read.c     | 85 +++++++++++++++++++++++++--------------
> drivers/lightnvm/pblk-recovery.c | 67 +++++++++++++++++++++----------
> drivers/lightnvm/pblk-sysfs.c    |  7 ++++
> drivers/lightnvm/pblk-write.c    | 22 ++++++----
> drivers/lightnvm/pblk.h          | 46 +++++++++++++++++++--
> drivers/nvme/host/lightnvm.c     |  7 +++-
> include/linux/lightnvm.h         |  9 +++--
> 12 files changed, 333 insertions(+), 105 deletions(-)
> 
> --
> 2.14.3

I get a number of errors when running the series. A simple bisect points
to the first patch as the one introducing the (first) regression.
Here is the trace. I could easily reproduce it by mounting
ext4 and running RocksDB's db_bench.

[   80.302731] Workqueue: pblk-read-end-wq pblk_line_put_ws
[   80.302733] RIP: 0010:__pblk_line_put+0xc3/0xd0
[   80.302733] Code: 89 55 70 48 89 4b 20 48 89 43 28 48 89 10 83 45 64 01 c6 07 00 0f 1f 40 00 48 89 de 4c 89 ef 5b 5d 41 5c 41 5d e9 5d a5 00 00 <0f> 0b e9 60 ff ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 89
[   80.302755] RSP: 0018:ffffb17102733e40 EFLAGS: 00010293
[   80.302756] RAX: 0000000000000000 RBX: ffff8d34529003e8 RCX: ffff8d3467068020
[   80.302757] RDX: 0000000000000001 RSI: ffff8d34529003e8 RDI: ffff8d34529004a8
[   80.302758] RBP: ffff8d34622ae800 R08: 0000000000000000 R09: 8080808080808080
[   80.302758] R10: 0000000000000018 R11: fefefefefefefeff R12: ffff8d34529004a8
[   80.302759] R13: 0000000000000000 R14: ffff8d346532c600 R15: 0ffff8d345f59990
[   80.302760] FS:  0000000000000000(0000) GS:ffff8d34778c0000(0000) knlGS:0000000000000000
[   80.302761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.302762] CR2: 00007f4d37e00000 CR3: 000000020780a002 CR4: 00000000003606e0
[   80.302763] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.302764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   80.302764] Call Trace:
[   80.302768]  pblk_line_put_ws+0x1a/0x30
[   80.302771]  process_one_work+0x15e/0x3d0
[   80.302773]  worker_thread+0x4c/0x440
[   80.302774]  kthread+0xf8/0x130
[   80.302776]  ? rescuer_thread+0x350/0x350
[   80.302777]  ? kthread_associate_blkcg+0x90/0x90
[   80.302779]  ret_from_fork+0x35/0x40
[   80.302781] ---[ end trace c4ab4ef1527265f6 ]---
[   81.551907] WARNING: CPU: 6 PID: 5045 at drivers/lightnvm/pblk-core.c:162 __pblk_map_invalidate+0x10b/0x130
[   81.551908] Modules linked in:
[   81.551910] CPU: 6 PID: 5045 Comm: rocksdb:bg0 Tainted: G        W         4.17.0--00884b2fb689 #2569
[   81.551911] Hardware name: Supermicro Super Server/X11SSH-F, BIOS 2.1 12/11/2017
[   81.551912] RIP: 0010:__pblk_map_invalidate+0x10b/0x130
[   81.551912] Code: 48 89 de 4c 89 e7 e8 f4 fd ff ff 49 89 c5 e9 62 ff ff ff 48 c7 c7 ec 8e 5e a4 c6 05 50 65 10 01 01 e8 29 eb 88 ff 0f 0b eb c5 <0f> 0b e9 1b ff ff ff c6 07 00 0f 1f 40 00 4c 89 e7 c6 07 00 0f 1f
[   81.551927] RSP: 0018:ffffb17106d23808 EFLAGS: 00010246
[   81.551928] RAX: 0000000000000000 RBX: ffff8d34529003e8 RCX: 0000000000000300
[   81.551929] RDX: 0000000000000001 RSI: ffff8d34529003e8 RDI: ffff8d34529004a8
[   81.551929] RBP: ffff8d34529004a8 R08: 0000000000000018 R09: ffff8d345d6e96d0
[   81.551930] R10: ffffb17106d237c8 R11: 0000000000000040 R12: ffff8d34622ae800
[   81.551930] R13: 0000000000000b40 R14: ffff8d34622aec10 R15: ffffb1710615d7c0
[   81.551931] FS:  00007f4d6707e700(0000) GS:ffff8d3477980000(0000) knlGS:0000000000000000
[   81.551932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.551932] CR2: 00007f4d57e00000 CR3: 000000040c204006 CR4: 00000000003606e0
[   81.551933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   81.551933] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   81.551934] Call Trace:
[   81.551937]  pblk_update_map_dev+0x69/0x370
[   81.551938]  __pblk_rb_update_l2p+0x52/0x160
[   81.551939]  __pblk_rb_may_write+0x40/0x50
[   81.551941]  pblk_rb_may_write_user+0x6c/0xe0
[   81.551942]  pblk_write_to_cache+0xa1/0x270
[   81.551944]  ? pblk_make_rq+0x6d/0x110
[   81.551944]  pblk_make_rq+0x6d/0x110
[   81.551947]  generic_make_request+0x1e8/0x410
[   81.551948]  ? submit_bio+0x6c/0x140
[   81.551950]  ? iov_iter_get_pages+0xbd/0x340
[   81.551951]  submit_bio+0x6c/0x140
[   81.551952]  ? bio_add_page+0x1b/0x50
[   81.551954]  do_blockdev_direct_IO+0x2011/0x29f0
[   81.551957]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
[   81.551958]  ? ext4_direct_IO+0x288/0x740
[   81.551959]  ext4_direct_IO+0x288/0x740
[   81.551961]  generic_file_direct_write+0xc4/0x160
[   81.551962]  __generic_file_write_iter+0xb6/0x1e0
[   81.551964]  ? __switch_to_asm+0x40/0x70
[   81.551965]  ? __switch_to_asm+0x34/0x70
[   81.551966]  ext4_file_write_iter+0xc7/0x400
[   81.551967]  ? __switch_to_asm+0x34/0x70
[   81.551968]  ? __switch_to_asm+0x40/0x70
[   81.551968]  ? __switch_to_asm+0x34/0x70
[   81.551969]  ? __switch_to_asm+0x40/0x70
[   81.551970]  ? __switch_to_asm+0x34/0x70
[   81.551971]  __vfs_write+0x112/0x1a0
[   81.551973]  vfs_write+0xb3/0x1a0
[   81.551974]  ksys_pwrite64+0x71/0x90
[   81.551976]  ? exit_to_usermode_loop+0x5c/0xb0
[   81.551977]  do_syscall_64+0x55/0x100
[   81.551978]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   81.551980] RIP: 0033:0x7f4dd04a2da3
[   81.551980] Code: 49 89 ca b8 12 00 00 00 0f 05 48 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 cb f3 ff ff 48 89 04 24 49 89 ca b8 12 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 11 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[   81.551995] RSP: 002b:00007f4d6707a8f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[   81.551996] RAX: ffffffffffffffda RBX: 0000000006a05000 RCX: 00007f4dd04a2da3
[   81.551997] RDX: 0000000006a05000 RSI: 00007f4d4ffff000 RDI: 000000000000000f
[   81.551997] RBP: 00007f4d6707a950 R08: 00007f4d60000d70 R09: 00007f4d6707aad0
[   81.551998] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000008000000
[   81.551998] R13: 00007f4d4ffff000 R14: 00007f4d60000bc0 R15: 00007f4d6707a9a0
[   81.551999] ---[ end trace c4ab4ef1527265f7 ]---
[   81.555052] WARNING: CPU: 6 PID: 5045 at drivers/lightnvm/pblk-core.c:162 __pblk_map_invalidate+0x10b/0x130
[   81.555053] Modules linked in:
[   81.555055] CPU: 6 PID: 5045 Comm: rocksdb:bg0 Tainted: G        W         4.17.0--00884b2fb689 #2569
[   81.555056] Hardware name: Supermicro Super Server/X11SSH-F, BIOS 2.1 12/11/2017
[   81.555057] RIP: 0010:__pblk_map_invalidate+0x10b/0x130
[   81.555057] Code: 48 89 de 4c 89 e7 e8 f4 fd ff ff 49 89 c5 e9 62 ff ff ff 48 c7 c7 ec 8e 5e a4 c6 05 50 65 10 01 01 e8 29 eb 88 ff 0f 0b eb c5 <0f> 0b e9 1b ff ff ff c6 07 00 0f 1f 40 00 4c 89 e7 c6 07 00 0f 1f
[   81.555072] RSP: 0018:ffffb17106d23808 EFLAGS: 00010246
[   81.555073] RAX: 0000000000000000 RBX: ffff8d34529003e8 RCX: 0000000000000300
[   81.555073] RDX: 0000000000000001 RSI: ffff8d34529003e8 RDI: ffff8d34529004a8
[   81.555074] RBP: ffff8d34529004a8 R08: 0000000000000018 R09: ffff8d345d6e96d0
[   81.555074] R10: ffffb17106d237c8 R11: 0000000000000040 R12: ffff8d34622ae800
[   81.555075] R13: 0000000000000d41 R14: ffff8d34622aec10 R15: ffffb17106166808
[   81.555076] FS:  00007f4d6707e700(0000) GS:ffff8d3477980000(0000) knlGS:0000000000000000
[   81.555076] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.555077] CR2: 00007f4d57e00000 CR3: 000000040c204006 CR4: 00000000003606e0
[   81.555078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   81.555078] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   81.555078] Call Trace:
[   81.555081]  pblk_update_map_dev+0x69/0x370
[   81.555083]  __pblk_rb_update_l2p+0x52/0x160
[   81.555084]  __pblk_rb_may_write+0x40/0x50
[   81.555085]  pblk_rb_may_write_user+0x6c/0xe0
[   81.555087]  pblk_write_to_cache+0xa1/0x270
[   81.555088]  ? pblk_make_rq+0x6d/0x110
[   81.555089]  pblk_make_rq+0x6d/0x110
[   81.555092]  generic_make_request+0x1e8/0x410
[   81.555093]  ? submit_bio+0x6c/0x140
[   81.555095]  ? iov_iter_get_pages+0xbd/0x340
[   81.555096]  submit_bio+0x6c/0x140
[   81.555098]  ? bio_add_page+0x1b/0x50
[   81.555100]  do_blockdev_direct_IO+0x2011/0x29f0
[   81.555102]  ? ext4_dio_get_block_unwritten_sync+0x50/0x50
[   81.555104]  ? ext4_direct_IO+0x288/0x740
[   81.555104]  ext4_direct_IO+0x288/0x740
[   81.555107]  generic_file_direct_write+0xc4/0x160
[   81.555108]  __generic_file_write_iter+0xb6/0x1e0
[   81.555110]  ? __switch_to_asm+0x40/0x70
[   81.555111]  ? __switch_to_asm+0x34/0x70
[   81.555112]  ext4_file_write_iter+0xc7/0x400
[   81.555113]  ? __switch_to_asm+0x34/0x70
[   81.555114]  ? __switch_to_asm+0x40/0x70
[   81.555114]  ? __switch_to_asm+0x34/0x70
[   81.555115]  ? __switch_to_asm+0x40/0x70
[   81.555115]  ? __switch_to_asm+0x34/0x70
[   81.555117]  __vfs_write+0x112/0x1a0
[   81.555119]  vfs_write+0xb3/0x1a0
[   81.555121]  ksys_pwrite64+0x71/0x90
[   81.555123]  ? exit_to_usermode_loop+0x5c/0xb0
[   81.555124]  do_syscall_64+0x55/0x100
[   81.555125]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   81.555127] RIP: 0033:0x7f4dd04a2da3



* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-15 22:27 ` [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk Igor Konopko
  2018-06-16 19:45   ` Matias Bjørling
@ 2018-06-19 11:08   ` Javier Gonzalez
  2018-06-19 12:42     ` Matias Bjørling
  1 sibling, 1 reply; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-19 11:08 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski

> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> In the current pblk implementation, the l2p mapping for not-yet-closed
> lines is always stored only in OOB metadata and recovered from it.
> 
> Such a solution does not provide data integrity when the drive does
> not have such an OOB metadata space.
> 
> The goal of this patch is to add support for so-called packed
> metadata, which stores the l2p mapping for open lines in the last
> sector of every write unit.
> 
> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
> ---
> drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
> drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
> drivers/lightnvm/pblk-rb.c       |  3 +++
> drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
> drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
> drivers/lightnvm/pblk-write.c    | 14 +++++++----
> drivers/lightnvm/pblk.h          |  5 +++-
> 7 files changed, 128 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index c092ee93a18d..375c6430612e 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
> {
> 	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
> 
> -	if (secs_avail >= pblk->min_write_pgs)
> +	if (secs_avail >= pblk->min_write_pgs_data)
> 		pblk_write_kick(pblk);
> }
> 
> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> 	struct list_head *move_list = NULL;
> -	int vsc = le32_to_cpu(*line->vsc);
> +	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
> +			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +	int vsc = le32_to_cpu(*line->vsc) + packed_meta;
> 
> 	lockdep_assert_held(&line->lock);
> 
> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
> }
> 
> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
> -		   unsigned long secs_to_flush)
> +		   unsigned long secs_to_flush, bool skip_meta)
> {
> 	int max = pblk->sec_per_write;
> 	int min = pblk->min_write_pgs;
> 	int secs_to_sync = 0;
> 
> +	if (skip_meta)
> +		min = max = pblk->min_write_pgs_data;
> +
> 	if (secs_avail >= max)
> 		secs_to_sync = max;
> 	else if (secs_avail >= min)
> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> next_rq:
> 	memset(&rqd, 0, sizeof(struct nvm_rq));
> 
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
> 	rq_len = rq_ppas * geo->csecs;
> 
> 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
> 	}
> 	spin_unlock(&pblk->trans_lock);
> }
> +
> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> +	void *meta_list = rqd->meta_list;
> +	void *page;
> +	int i = 0;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	/* We need to zero out metadata corresponding to packed meta page */
> +	pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
> +
> +	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
> +	/* We need to fill last page of request (packed metadata)
> +	 * with data from oob meta buffer.
> +	 */
> +	for (; i < rqd->nr_ppas; i++)
> +		memcpy(page + (i * sizeof(struct pblk_sec_meta)),
> +			pblk_get_meta_at(pblk, meta_list, i),
> +			sizeof(struct pblk_sec_meta));
> +}
> +
> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> +	void *meta_list = rqd->meta_list;
> +	void *page;
> +	int i = 0;
> +
> +	if (pblk_is_oob_meta_supported(pblk))
> +		return;
> +
> +	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
> +	/* We need to fill oob meta buffer with data from packed metadata */
> +	for (; i < rqd->nr_ppas; i++)
> +		memcpy(pblk_get_meta_at(pblk, meta_list, i),
> +			page + (i * sizeof(struct pblk_sec_meta)),
> +			sizeof(struct pblk_sec_meta));
> +}
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index f05112230a52..5eb641da46ed 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
> 	pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
> +	pblk->min_write_pgs_data = pblk->min_write_pgs;
> 	pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
> 
> +	if (!pblk_is_oob_meta_supported(pblk)) {
> +		/* For drives which do not have the OOB metadata feature,
> +		 * in order to support the recovery feature we need to use
> +		 * so-called packed metadata. Packed metadata will store
> +		 * the same information as OOB metadata (l2p table mapping),
> +		 * but in the form of a single page at the end of
> +		 * every write request.
> +		 */
> +		if (pblk->min_write_pgs
> +			* sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
> +			/* We want to keep all the packed metadata on a single
> +			 * page per write request. So we need to ensure that
> +			 * it will fit.
> +			 *
> +			 * This is more of a sanity check, since there is
> +			 * no device with such a big minimal write size
> +			 * (above 1 megabyte).
> +			 */
> +			pr_err("pblk: Not supported min write size\n");
> +			return -EINVAL;
> +		}
> +		/* For the packed meta approach we do some simplification.
> +		 * On the read path we always issue requests whose size
> +		 * equals max_write_pgs, with all pages filled with
> +		 * user payload except the last page, which will be
> +		 * filled with packed metadata.
> +		 */
> +		pblk->max_write_pgs = pblk->min_write_pgs;
> +		pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
> +	}
> +
> 	if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
> 		pr_err("pblk: vector list too big(%u > %u)\n",
> 				pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
> 	struct pblk_line_meta *lm = &pblk->lm;
> 	struct nvm_geo *geo = &dev->geo;
> 	sector_t provisioned;
> -	int sec_meta, blk_meta;
> +	int sec_meta, blk_meta, clba;
> 
> 	if (geo->op == NVM_TARGET_DEFAULT_OP)
> 		pblk->op = PBLK_DEFAULT_OP;
> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
> 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
> 	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
> 
> -	pblk->capacity = (provisioned - blk_meta) * geo->clba;
> +	clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
> +	pblk->capacity = (provisioned - blk_meta) * clba;
> 
> 	atomic_set(&pblk->rl.free_blocks, nr_free_blks);
> 	atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index a81a97e8ea6d..081e73e7978f 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
> 		to_read = count;
> 	}
> 
> +	/* Add space for packed metadata if in use */
> +	pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +
> 	c_ctx->sentry = pos;
> 	c_ctx->nr_valid = to_read;
> 	c_ctx->nr_padded = pad;
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index f5853fc77a0c..0fab18fe30d9 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
> next_read_rq:
> 	memset(rqd, 0, pblk_g_rq_size);
> 
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
> 	if (!rq_ppas)
> 		rq_ppas = pblk->min_write_pgs;
> 	rq_len = rq_ppas * geo->csecs;
> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
> 		return -EINTR;
> 	}
> 
> +	pblk_get_packed_meta(pblk, rqd);
> 	for (i = 0; i < rqd->nr_ppas; i++) {
> 		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> 							meta_list, i)->lba);
> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> 	kref_init(&pad_rq->ref);
> 
> next_pad_rq:
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
> 	if (rq_ppas < pblk->min_write_pgs) {
> 		pr_err("pblk: corrupted pad line %d\n", line->id);
> 		goto fail_free_pad;
> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
> next_rq:
> 	memset(rqd, 0, pblk_g_rq_size);
> 
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
> 	if (!rq_ppas)
> 		rq_ppas = pblk->min_write_pgs;
> 	rq_len = rq_ppas * geo->csecs;
> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
> 	 */
> 	if (!rec_round++ && !rqd->error) {
> 		rec_round = 0;
> +		pblk_get_packed_meta(pblk, rqd);
> 		for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
> 			u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> 							meta_list, i)->lba);
> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
> 		int ret;
> 
> 		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
> +		if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
> +			/* This case should not happen, since we always read in
> +			 * the same unit here as we wrote in the writer thread.
> +			 */
> +			pr_err("pblk: Inconsistent packed metadata read\n");
> +		}
> 		nr_error_bits = rqd->nr_ppas - bit;
> 
> 		/* Roll back failed sectors */
> @@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> next_rq:
> 	memset(rqd, 0, pblk_g_rq_size);
> 
> -	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
> +	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
> 	if (!rq_ppas)
> 		rq_ppas = pblk->min_write_pgs;
> 	rq_len = rq_ppas * geo->csecs;
> @@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 		int nr_error_bits, bit;
> 
> 		bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
> +		if (!pblk_is_oob_meta_supported(pblk)) {
> +			/* For packed metadata we do not handle partially
> +			 * written requests here, since the metadata is always
> +			 * in the last page of the request.
> +			 */
> +			bit = 0;
> +			*done = 0;
> +		}
> 		nr_error_bits = rqd->nr_ppas - bit;
> 
> 		/* Roll back failed sectors */
> @@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> 			*done = 0;
> 	}
> 
> +	pblk_get_packed_meta(pblk, rqd);
> 	for (i = 0; i < rqd->nr_ppas; i++) {
> 		u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
> 							meta_list, i)->lba);
> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
> index b0e5e93a9d5f..aa7b4164ce9e 100644
> --- a/drivers/lightnvm/pblk-sysfs.c
> +++ b/drivers/lightnvm/pblk-sysfs.c
> @@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
> 	if (kstrtouint(page, 0, &sec_per_write))
> 		return -EINVAL;
> 
> +	if (!pblk_is_oob_meta_supported(pblk)) {
> +		/* For the packed metadata case it is
> +		 * not allowed to change sec_per_write.
> +		 */
> +		return -EINVAL;
> +	}
> +
> 	if (sec_per_write < pblk->min_write_pgs
> 				|| sec_per_write > pblk->max_write_pgs
> 				|| sec_per_write % pblk->min_write_pgs != 0)
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 6552db35f916..bb45c7e6c375 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
> {
> 	int secs_to_sync;
> 
> -	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
> +	secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
> 
> #ifdef CONFIG_NVM_PBLK_DEBUG
> 	if ((!secs_to_sync && secs_to_flush)
> @@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
> 		return NVM_IO_ERR;
> 	}
> 
> +	/* This is the first place where we have write requests mapped
> +	 * and we can fill the packed metadata with l2p mappings.
> +	 */
> +	pblk_set_packed_meta(pblk, rqd);
> +
> 	meta_line = pblk_should_submit_meta_io(pblk, rqd);
> 
> 	/* Submit data write for current data line */
> @@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
> 	struct bio *bio;
> 	struct nvm_rq *rqd;
> 	unsigned int secs_avail, secs_to_sync, secs_to_com;
> -	unsigned int secs_to_flush;
> +	unsigned int secs_to_flush, packed_meta_pgs;
> 	unsigned long pos;
> 	unsigned int resubmit;
> 
> @@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
> 			return 1;
> 
> 		secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
> -		if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
> +		if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
> 			return 1;
> 
> 		secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
> @@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
> 		pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
> 	}
> 
> -	bio = bio_alloc(GFP_KERNEL, secs_to_sync);
> +	packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
> +	bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
> 
> 	bio->bi_iter.bi_sector = 0; /* internal bio */
> 	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 4c61ede5b207..c95ecd8bcf79 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -605,6 +605,7 @@ struct pblk {
> 	int state;			/* pblk line state */
> 
> 	int min_write_pgs; /* Minimum amount of pages required by controller */
> +	int min_write_pgs_data; /* Minimum amount of payload pages */
> 	int max_write_pgs; /* Maximum amount of pages supported by controller */
> 
> 	sector_t capacity; /* Device capacity when bad blocks are subtracted */
> @@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
> u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
> u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
> -		   unsigned long secs_to_flush);
> +		   unsigned long secs_to_flush, bool skip_meta);
> void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
> void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
> 		  unsigned long *lun_bitmap);
> @@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
> 			  u64 *lba_list, int nr_secs);
> void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
> 			 sector_t blba, int nr_secs);
> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
> 
> /*
>  * pblk user I/O write path
> --
> 2.14.3

The functionality is good. A couple of comments though:
  - Have you considered the case when the device reports ws_min = ws_opt
    = 4096? Maybe checks preventing this case would be a good idea.
  - Have you checked for any conflicts in the write recovery path? You
    would need to deal with more corner cases (e.g., a write failure in
    the metadata page would always require re-sending all other sectors).
  - There are also corner cases on scan recovery: what if the metadata
    page cannot be read? In this case, data loss (at a drive level) will
    happen, but you will need to guarantee that reading garbage will not
    corrupt other data. In the OOB area case this is not a problem
    because data and metadata are always good/not good simultaneously.
  - I think it would be good to update the WA counters, since there is a
    significant write and space amplification in using (1 / ws_opt)
    pages for metadata.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-19 11:08   ` Javier Gonzalez
@ 2018-06-19 12:42     ` Matias Bjørling
  2018-06-19 12:47       ` Javier Gonzalez
  0 siblings, 1 reply; 25+ messages in thread
From: Matias Bjørling @ 2018-06-19 12:42 UTC (permalink / raw)
  To: Javier Gonzalez
  Cc: Konopko, Igor J, linux-block, michal.sorn, marcin.dziegielewski

On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez <javier@cnexlabs.com> wrote:
>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>
>> In the current pblk implementation, the l2p mapping for not-yet-closed
>> lines is always stored only in OOB metadata and recovered from it.
>>
>> Such a solution does not provide data integrity when drives do
>> not have such an OOB metadata space.
>>
>> The goal of this patch is to add support for so-called packed
>> metadata, which stores the l2p mapping for open lines in the last
>> sector of every write unit.
>>
>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>> ---
>> drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
>> drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
>> drivers/lightnvm/pblk-rb.c       |  3 +++
>> drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
>> drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
>> drivers/lightnvm/pblk-write.c    | 14 +++++++----
>> drivers/lightnvm/pblk.h          |  5 +++-
>> 7 files changed, 128 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index c092ee93a18d..375c6430612e 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
>> {
>>       unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>>
>> -     if (secs_avail >= pblk->min_write_pgs)
>> +     if (secs_avail >= pblk->min_write_pgs_data)
>>               pblk_write_kick(pblk);
>> }
>>
>> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
>>       struct pblk_line_meta *lm = &pblk->lm;
>>       struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>       struct list_head *move_list = NULL;
>> -     int vsc = le32_to_cpu(*line->vsc);
>> +     int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
>> +                     * (pblk->min_write_pgs - pblk->min_write_pgs_data);
>> +     int vsc = le32_to_cpu(*line->vsc) + packed_meta;
>>
>>       lockdep_assert_held(&line->lock);
>>
>> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>> }
>>
>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>> -                unsigned long secs_to_flush)
>> +                unsigned long secs_to_flush, bool skip_meta)
>> {
>>       int max = pblk->sec_per_write;
>>       int min = pblk->min_write_pgs;
>>       int secs_to_sync = 0;
>>
>> +     if (skip_meta)
>> +             min = max = pblk->min_write_pgs_data;
>> +
>>       if (secs_avail >= max)
>>               secs_to_sync = max;
>>       else if (secs_avail >= min)
>> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>> next_rq:
>>       memset(&rqd, 0, sizeof(struct nvm_rq));
>>
>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       rq_len = rq_ppas * geo->csecs;
>>
>>       bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
>> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>       }
>>       spin_unlock(&pblk->trans_lock);
>> }
>> +
>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>> +{
>> +     void *meta_list = rqd->meta_list;
>> +     void *page;
>> +     int i = 0;
>> +
>> +     if (pblk_is_oob_meta_supported(pblk))
>> +             return;
>> +
>> +     /* We need to zero out metadata corresponding to packed meta page */
>> +     pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
>> +
>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>> +     /* We need to fill the last page of the request (packed metadata)
>> +      * with data from the oob meta buffer.
>> +      */
>> +     for (; i < rqd->nr_ppas; i++)
>> +             memcpy(page + (i * sizeof(struct pblk_sec_meta)),
>> +                     pblk_get_meta_at(pblk, meta_list, i),
>> +                     sizeof(struct pblk_sec_meta));
>> +}
>> +
>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>> +{
>> +     void *meta_list = rqd->meta_list;
>> +     void *page;
>> +     int i = 0;
>> +
>> +     if (pblk_is_oob_meta_supported(pblk))
>> +             return;
>> +
>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>> +     /* We need to fill the oob meta buffer with data from packed metadata */
>> +     for (; i < rqd->nr_ppas; i++)
>> +             memcpy(pblk_get_meta_at(pblk, meta_list, i),
>> +                     page + (i * sizeof(struct pblk_sec_meta)),
>> +                     sizeof(struct pblk_sec_meta));
>> +}
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index f05112230a52..5eb641da46ed 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
>>       pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>       max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>>       pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
>> +     pblk->min_write_pgs_data = pblk->min_write_pgs;
>>       pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
>>
>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>> +             /* For drives which do not have the OOB metadata feature,
>> +              * in order to support recovery we need to use
>> +              * so called packed metadata. Packed metadata stores
>> +              * the same information as OOB metadata (the l2p table
>> +              * mapping), but in the form of a single page at the end
>> +              * of every write request.
>> +              */
>> +             if (pblk->min_write_pgs
>> +                     * sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
>> +                     /* We want to keep all the packed metadata on a
>> +                      * single page per write request, so we need to
>> +                      * ensure that it will fit.
>> +                      *
>> +                      * This is more of a sanity check, since there is
>> +                      * no device with such a big minimal write size
>> +                      * (above 1 megabyte).
>> +                      */
>> +                     pr_err("pblk: Not supported min write size\n");
>> +                     return -EINVAL;
>> +             }
>> +             /* For the packed meta approach we do some simplification.
>> +              * On the read path we always issue requests whose size
>> +              * equals max_write_pgs, with all pages filled with
>> +              * user payload except the last page, which is
>> +              * filled with packed metadata.
>> +              */
>> +             pblk->max_write_pgs = pblk->min_write_pgs;
>> +             pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
>> +     }
>> +
>>       if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
>>               pr_err("pblk: vector list too big(%u > %u)\n",
>>                               pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
>> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>       struct pblk_line_meta *lm = &pblk->lm;
>>       struct nvm_geo *geo = &dev->geo;
>>       sector_t provisioned;
>> -     int sec_meta, blk_meta;
>> +     int sec_meta, blk_meta, clba;
>>
>>       if (geo->op == NVM_TARGET_DEFAULT_OP)
>>               pblk->op = PBLK_DEFAULT_OP;
>> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>       sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>>       blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
>>
>> -     pblk->capacity = (provisioned - blk_meta) * geo->clba;
>> +     clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
>> +     pblk->capacity = (provisioned - blk_meta) * clba;
>>
>>       atomic_set(&pblk->rl.free_blocks, nr_free_blks);
>>       atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>> index a81a97e8ea6d..081e73e7978f 100644
>> --- a/drivers/lightnvm/pblk-rb.c
>> +++ b/drivers/lightnvm/pblk-rb.c
>> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>               to_read = count;
>>       }
>>
>> +     /* Add space for packed metadata if in use */
>> +     pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
>> +
>>       c_ctx->sentry = pos;
>>       c_ctx->nr_valid = to_read;
>>       c_ctx->nr_padded = pad;
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index f5853fc77a0c..0fab18fe30d9 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>> next_read_rq:
>>       memset(rqd, 0, pblk_g_rq_size);
>>
>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       if (!rq_ppas)
>>               rq_ppas = pblk->min_write_pgs;
>>       rq_len = rq_ppas * geo->csecs;
>> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>               return -EINTR;
>>       }
>>
>> +     pblk_get_packed_meta(pblk, rqd);
>>       for (i = 0; i < rqd->nr_ppas; i++) {
>>               u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>                                                       meta_list, i)->lba);
>> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>       kref_init(&pad_rq->ref);
>>
>> next_pad_rq:
>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       if (rq_ppas < pblk->min_write_pgs) {
>>               pr_err("pblk: corrupted pad line %d\n", line->id);
>>               goto fail_free_pad;
>> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>> next_rq:
>>       memset(rqd, 0, pblk_g_rq_size);
>>
>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       if (!rq_ppas)
>>               rq_ppas = pblk->min_write_pgs;
>>       rq_len = rq_ppas * geo->csecs;
>> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>        */
>>       if (!rec_round++ && !rqd->error) {
>>               rec_round = 0;
>> +             pblk_get_packed_meta(pblk, rqd);
>>               for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>>                       u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>                                                       meta_list, i)->lba);
>> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>               int ret;
>>
>>               bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>> +             if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
>> +                     /* This case should not happen, since we always read in
>> +                      * the same unit here as we wrote in the writer thread.
>> +                      */
>> +                     pr_err("pblk: Inconsistent packed metadata read\n");
>> +             }
>>               nr_error_bits = rqd->nr_ppas - bit;
>>
>>               /* Roll back failed sectors */
>> @@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>> next_rq:
>>       memset(rqd, 0, pblk_g_rq_size);
>>
>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>       if (!rq_ppas)
>>               rq_ppas = pblk->min_write_pgs;
>>       rq_len = rq_ppas * geo->csecs;
>> @@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>               int nr_error_bits, bit;
>>
>>               bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>> +             if (!pblk_is_oob_meta_supported(pblk)) {
>> +                     /* For packed metadata we do not handle partially
>> +                      * written requests here, since the metadata is always
>> +                      * in the last page of the request.
>> +                      */
>> +                     bit = 0;
>> +                     *done = 0;
>> +             }
>>               nr_error_bits = rqd->nr_ppas - bit;
>>
>>               /* Roll back failed sectors */
>> @@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>                       *done = 0;
>>       }
>>
>> +     pblk_get_packed_meta(pblk, rqd);
>>       for (i = 0; i < rqd->nr_ppas; i++) {
>>               u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>                                                       meta_list, i)->lba);
>> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
>> index b0e5e93a9d5f..aa7b4164ce9e 100644
>> --- a/drivers/lightnvm/pblk-sysfs.c
>> +++ b/drivers/lightnvm/pblk-sysfs.c
>> @@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
>>       if (kstrtouint(page, 0, &sec_per_write))
>>               return -EINVAL;
>>
>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>> +             /* For the packed metadata case it is
>> +              * not allowed to change sec_per_write.
>> +              */
>> +             return -EINVAL;
>> +     }
>> +
>>       if (sec_per_write < pblk->min_write_pgs
>>                               || sec_per_write > pblk->max_write_pgs
>>                               || sec_per_write % pblk->min_write_pgs != 0)
>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>> index 6552db35f916..bb45c7e6c375 100644
>> --- a/drivers/lightnvm/pblk-write.c
>> +++ b/drivers/lightnvm/pblk-write.c
>> @@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
>> {
>>       int secs_to_sync;
>>
>> -     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
>> +     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
>>
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>       if ((!secs_to_sync && secs_to_flush)
>> @@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>               return NVM_IO_ERR;
>>       }
>>
>> +     /* This is the first place where we have write requests mapped
>> +      * and we can fill the packed metadata with l2p mappings.
>> +      */
>> +     pblk_set_packed_meta(pblk, rqd);
>> +
>>       meta_line = pblk_should_submit_meta_io(pblk, rqd);
>>
>>       /* Submit data write for current data line */
>> @@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>       struct bio *bio;
>>       struct nvm_rq *rqd;
>>       unsigned int secs_avail, secs_to_sync, secs_to_com;
>> -     unsigned int secs_to_flush;
>> +     unsigned int secs_to_flush, packed_meta_pgs;
>>       unsigned long pos;
>>       unsigned int resubmit;
>>
>> @@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>                       return 1;
>>
>>               secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
>> -             if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
>> +             if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
>>                       return 1;
>>
>>               secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
>> @@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
>>               pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
>>       }
>>
>> -     bio = bio_alloc(GFP_KERNEL, secs_to_sync);
>> +     packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
>> +     bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
>>
>>       bio->bi_iter.bi_sector = 0; /* internal bio */
>>       bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 4c61ede5b207..c95ecd8bcf79 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -605,6 +605,7 @@ struct pblk {
>>       int state;                      /* pblk line state */
>>
>>       int min_write_pgs; /* Minimum amount of pages required by controller */
>> +     int min_write_pgs_data; /* Minimum amount of payload pages */
>>       int max_write_pgs; /* Maximum amount of pages supported by controller */
>>
>>       sector_t capacity; /* Device capacity when bad blocks are subtracted */
>> @@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>> u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>> u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>> -                unsigned long secs_to_flush);
>> +                unsigned long secs_to_flush, bool skip_meta);
>> void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
>> void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
>>                 unsigned long *lun_bitmap);
>> @@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>                         u64 *lba_list, int nr_secs);
>> void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>>                        sector_t blba, int nr_secs);
>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>
>> /*
>>  * pblk user I/O write path
>> --
>> 2.14.3
>
> The functionality is good. A couple of comments though:
>   - Have you considered the case when the device reports ws_min = ws_opt
>     = 4096? Maybe checks preventing this case would be a good idea.
>   - Have you checked for any conflicts in the write recovery path? You
>     would need to deal with more corner cases (e.g., a write failure in
>     the metadata page would always require re-sending all other sectors).
>   - There are also corner cases on scan recovery: what if the metadata
>     page cannot be read? In this case, data loss (at a drive level) will
>     happen, but you will need to guarantee that reading garbage will not
>     corrupt other data. In the OOB area case this is not a problem
>     because data and metadata are always good/not good simultaneously.

What I understood from Igor is that, for example, for 256 KB writes (as
WS_MIN), the last 4 KB would be metadata. In that case, the whole write
unit either succeeds or fails. This may also address your second point.

>   - I think it would be good to update the WA counters, since there is a
>     significant write and space amplification in using (1 / ws_opt)
>     pages for metadata.
>
> Javier

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-19 12:42     ` Matias Bjørling
@ 2018-06-19 12:47       ` Javier Gonzalez
  2018-06-19 22:20         ` Igor Konopko
  0 siblings, 1 reply; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-19 12:47 UTC (permalink / raw)
  To: Matias Bjørling
  Cc: Konopko, Igor J, linux-block, michal.sorn, marcin.dziegielewski

[-- Attachment #1: Type: text/plain, Size: 20279 bytes --]

> On 19 Jun 2018, at 14.42, Matias Bjørling <mb@lightnvm.io> wrote:
> 
> On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez <javier@cnexlabs.com> wrote:
>>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>> 
>>> In the current pblk implementation, the l2p mapping for not-yet-closed
>>> lines is always stored only in OOB metadata and recovered from it.
>>>
>>> Such a solution does not provide data integrity when drives do
>>> not have such an OOB metadata space.
>>>
>>> The goal of this patch is to add support for so-called packed
>>> metadata, which stores the l2p mapping for open lines in the last
>>> sector of every write unit.
>>> 
>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>> ---
>>> drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
>>> drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
>>> drivers/lightnvm/pblk-rb.c       |  3 +++
>>> drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
>>> drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
>>> drivers/lightnvm/pblk-write.c    | 14 +++++++----
>>> drivers/lightnvm/pblk.h          |  5 +++-
>>> 7 files changed, 128 insertions(+), 15 deletions(-)
>>> 
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index c092ee93a18d..375c6430612e 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
>>> {
>>>      unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>>> 
>>> -     if (secs_avail >= pblk->min_write_pgs)
>>> +     if (secs_avail >= pblk->min_write_pgs_data)
>>>              pblk_write_kick(pblk);
>>> }
>>> 
>>> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
>>>      struct pblk_line_meta *lm = &pblk->lm;
>>>      struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>      struct list_head *move_list = NULL;
>>> -     int vsc = le32_to_cpu(*line->vsc);
>>> +     int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
>>> +                     * (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>> +     int vsc = le32_to_cpu(*line->vsc) + packed_meta;
>>> 
>>>      lockdep_assert_held(&line->lock);
>>> 
>>> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>>> }
>>> 
>>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>>> -                unsigned long secs_to_flush)
>>> +                unsigned long secs_to_flush, bool skip_meta)
>>> {
>>>      int max = pblk->sec_per_write;
>>>      int min = pblk->min_write_pgs;
>>>      int secs_to_sync = 0;
>>> 
>>> +     if (skip_meta)
>>> +             min = max = pblk->min_write_pgs_data;
>>> +
>>>      if (secs_avail >= max)
>>>              secs_to_sync = max;
>>>      else if (secs_avail >= min)
>>> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>>> next_rq:
>>>      memset(&rqd, 0, sizeof(struct nvm_rq));
>>> 
>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>      rq_len = rq_ppas * geo->csecs;
>>> 
>>>      bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
>>> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>>      }
>>>      spin_unlock(&pblk->trans_lock);
>>> }
>>> +
>>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>> +{
>>> +     void *meta_list = rqd->meta_list;
>>> +     void *page;
>>> +     int i = 0;
>>> +
>>> +     if (pblk_is_oob_meta_supported(pblk))
>>> +             return;
>>> +
>>> +     /* We need to zero out metadata corresponding to packed meta page */
>>> +     pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
>>> +
>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>> +     /* We need to fill the last page of the request (packed
>>> +      * metadata) with data from the oob meta buffer.
>>> +      */
>>> +     for (; i < rqd->nr_ppas; i++)
>>> +             memcpy(page + (i * sizeof(struct pblk_sec_meta)),
>>> +                     pblk_get_meta_at(pblk, meta_list, i),
>>> +                     sizeof(struct pblk_sec_meta));
>>> +}
>>> +
>>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>> +{
>>> +     void *meta_list = rqd->meta_list;
>>> +     void *page;
>>> +     int i = 0;
>>> +
>>> +     if (pblk_is_oob_meta_supported(pblk))
>>> +             return;
>>> +
>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>> +     /* We need to fill oob meta buffer with data from packed metadata */
>>> +     for (; i < rqd->nr_ppas; i++)
>>> +             memcpy(pblk_get_meta_at(pblk, meta_list, i),
>>> +                     page + (i * sizeof(struct pblk_sec_meta)),
>>> +                     sizeof(struct pblk_sec_meta));
>>> +}
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index f05112230a52..5eb641da46ed 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
>>>      pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>      max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>>>      pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
>>> +     pblk->min_write_pgs_data = pblk->min_write_pgs;
>>>      pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
>>> 
>>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>>> +             /* For drives which do not have the OOB metadata feature,
>>> +              * in order to support recovery we need to use
>>> +              * so called packed metadata. Packed metadata will store
>>> +              * the same information as OOB metadata (l2p table mapping),
>>> +              * but in the form of a single page at the end of
>>> +              * every write request.
>>> +              */
>>> +             if (pblk->min_write_pgs
>>> +                     * sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
>>> +                     /* We want to keep all the packed metadata on single
>>> +                      * page per write requests. So we need to ensure that
>>> +                      * it will fit.
>>> +                      *
>>> +                      * This is more of a sanity check, since there is
>>> +                      * no device with such a big minimal write size
>>> +                      * (above 1 megabyte).
>>> +                      */
>>> +                     pr_err("pblk: Not supported min write size\n");
>>> +                     return -EINVAL;
>>> +             }
>>> +             /* For the packed meta approach we do some simplification.
>>> +              * On the read path we always issue requests whose size
>>> +              * equals max_write_pgs, with all pages filled with
>>> +              * user payload except the last page, which will be
>>> +              * filled with packed metadata.
>>> +              */
>>> +             pblk->max_write_pgs = pblk->min_write_pgs;
>>> +             pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
>>> +     }
>>> +
>>>      if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
>>>              pr_err("pblk: vector list too big(%u > %u)\n",
>>>                              pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
>>> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>      struct pblk_line_meta *lm = &pblk->lm;
>>>      struct nvm_geo *geo = &dev->geo;
>>>      sector_t provisioned;
>>> -     int sec_meta, blk_meta;
>>> +     int sec_meta, blk_meta, clba;
>>> 
>>>      if (geo->op == NVM_TARGET_DEFAULT_OP)
>>>              pblk->op = PBLK_DEFAULT_OP;
>>> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>      sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>>>      blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
>>> 
>>> -     pblk->capacity = (provisioned - blk_meta) * geo->clba;
>>> +     clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
>>> +     pblk->capacity = (provisioned - blk_meta) * clba;
>>> 
>>>      atomic_set(&pblk->rl.free_blocks, nr_free_blks);
>>>      atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>> index a81a97e8ea6d..081e73e7978f 100644
>>> --- a/drivers/lightnvm/pblk-rb.c
>>> +++ b/drivers/lightnvm/pblk-rb.c
>>> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>>              to_read = count;
>>>      }
>>> 
>>> +     /* Add space for packed metadata if in use */
>>> +     pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>> +
>>>      c_ctx->sentry = pos;
>>>      c_ctx->nr_valid = to_read;
>>>      c_ctx->nr_padded = pad;
>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>> index f5853fc77a0c..0fab18fe30d9 100644
>>> --- a/drivers/lightnvm/pblk-recovery.c
>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>> next_read_rq:
>>>      memset(rqd, 0, pblk_g_rq_size);
>>> 
>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>      if (!rq_ppas)
>>>              rq_ppas = pblk->min_write_pgs;
>>>      rq_len = rq_ppas * geo->csecs;
>>> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>>              return -EINTR;
>>>      }
>>> 
>>> +     pblk_get_packed_meta(pblk, rqd);
>>>      for (i = 0; i < rqd->nr_ppas; i++) {
>>>              u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>                                                      meta_list, i)->lba);
>>> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>>      kref_init(&pad_rq->ref);
>>> 
>>> next_pad_rq:
>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>      if (rq_ppas < pblk->min_write_pgs) {
>>>              pr_err("pblk: corrupted pad line %d\n", line->id);
>>>              goto fail_free_pad;
>>> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>> next_rq:
>>>      memset(rqd, 0, pblk_g_rq_size);
>>> 
>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>      if (!rq_ppas)
>>>              rq_ppas = pblk->min_write_pgs;
>>>      rq_len = rq_ppas * geo->csecs;
>>> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>       */
>>>      if (!rec_round++ && !rqd->error) {
>>>              rec_round = 0;
>>> +             pblk_get_packed_meta(pblk, rqd);
>>>              for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>>>                      u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>                                                      meta_list, i)->lba);
>>> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>              int ret;
>>> 
>>>              bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>>> +             if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
>>> +                     /* This case should not happen since we always read
>>> +                      * in the same unit here as we wrote in the writer
>>> +                      * thread.
>>> +                      */
>>> +                     pr_err("pblk: Inconsistent packed metadata read\n");
>>> +             }
>>>              nr_error_bits = rqd->nr_ppas - bit;
>>> 
>>>              /* Roll back failed sectors */
>>> @@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>> next_rq:
>>>      memset(rqd, 0, pblk_g_rq_size);
>>> 
>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>      if (!rq_ppas)
>>>              rq_ppas = pblk->min_write_pgs;
>>>      rq_len = rq_ppas * geo->csecs;
>>> @@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>              int nr_error_bits, bit;
>>> 
>>>              bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>>> +             if (!pblk_is_oob_meta_supported(pblk)) {
>>> +                     /* For packed metadata we do not handle partially
>>> +                      * written requests here, since the metadata is
>>> +                      * always in the last page of the request.
>>> +                      */
>>> +                     bit = 0;
>>> +                     *done = 0;
>>> +             }
>>>              nr_error_bits = rqd->nr_ppas - bit;
>>> 
>>>              /* Roll back failed sectors */
>>> @@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>                      *done = 0;
>>>      }
>>> 
>>> +     pblk_get_packed_meta(pblk, rqd);
>>>      for (i = 0; i < rqd->nr_ppas; i++) {
>>>              u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>                                                      meta_list, i)->lba);
>>> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
>>> index b0e5e93a9d5f..aa7b4164ce9e 100644
>>> --- a/drivers/lightnvm/pblk-sysfs.c
>>> +++ b/drivers/lightnvm/pblk-sysfs.c
>>> @@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
>>>      if (kstrtouint(page, 0, &sec_per_write))
>>>              return -EINVAL;
>>> 
>>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>>> +             /* In the packed metadata case it is
>>> +              * not allowed to change sec_per_write.
>>> +              */
>>> +             return -EINVAL;
>>> +     }
>>> +
>>>      if (sec_per_write < pblk->min_write_pgs
>>>                              || sec_per_write > pblk->max_write_pgs
>>>                              || sec_per_write % pblk->min_write_pgs != 0)
>>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>>> index 6552db35f916..bb45c7e6c375 100644
>>> --- a/drivers/lightnvm/pblk-write.c
>>> +++ b/drivers/lightnvm/pblk-write.c
>>> @@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
>>> {
>>>      int secs_to_sync;
>>> 
>>> -     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
>>> +     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
>>> 
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>      if ((!secs_to_sync && secs_to_flush)
>>> @@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>>              return NVM_IO_ERR;
>>>      }
>>> 
>>> +     /* This is the first place where we have write requests mapped
>>> +      * and we can fill packed metadata with l2p mappings.
>>> +      */
>>> +     pblk_set_packed_meta(pblk, rqd);
>>> +
>>>      meta_line = pblk_should_submit_meta_io(pblk, rqd);
>>> 
>>>      /* Submit data write for current data line */
>>> @@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>>      struct bio *bio;
>>>      struct nvm_rq *rqd;
>>>      unsigned int secs_avail, secs_to_sync, secs_to_com;
>>> -     unsigned int secs_to_flush;
>>> +     unsigned int secs_to_flush, packed_meta_pgs;
>>>      unsigned long pos;
>>>      unsigned int resubmit;
>>> 
>>> @@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>>                      return 1;
>>> 
>>>              secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
>>> -             if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
>>> +             if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
>>>                      return 1;
>>> 
>>>              secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
>>> @@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
>>>              pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
>>>      }
>>> 
>>> -     bio = bio_alloc(GFP_KERNEL, secs_to_sync);
>>> +     packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>> +     bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
>>> 
>>>      bio->bi_iter.bi_sector = 0; /* internal bio */
>>>      bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index 4c61ede5b207..c95ecd8bcf79 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -605,6 +605,7 @@ struct pblk {
>>>      int state;                      /* pblk line state */
>>> 
>>>      int min_write_pgs; /* Minimum amount of pages required by controller */
>>> +     int min_write_pgs_data; /* Minimum amount of payload pages */
>>>      int max_write_pgs; /* Maximum amount of pages supported by controller */
>>> 
>>>      sector_t capacity; /* Device capacity when bad blocks are subtracted */
>>> @@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>> u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>> u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>>> -                unsigned long secs_to_flush);
>>> +                unsigned long secs_to_flush, bool skip_meta);
>>> void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
>>> void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
>>>                unsigned long *lun_bitmap);
>>> @@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>>                        u64 *lba_list, int nr_secs);
>>> void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>>>                       sector_t blba, int nr_secs);
>>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>> 
>>> /*
>>> * pblk user I/O write path
>>> --
>>> 2.14.3
>> 
>> The functionality is good. A couple of comments though:
>>  - Have you considered the case when the device reports ws_min = ws_opt
>>    = 4096? Maybe checks preventing this case would be a good idea.
>>  - Have you checked for any conflicts in the write recovery path? You
>>    would need to deal with more corner cases (e.g., a write failure in
>>    the metadata page would always require re-sending all other sectors).
>>  - There are also corner cases on scan recovery: what if the metadata
>>    page cannot be read? In this case, data loss (at a drive level) will
>>    happen, but you will need to guarantee that reading garbage will not
>>    corrupt other data. In the OOB area case this is not a problem
>>    because data and metadata are always good/not good simultaneously.
> 
> What I understood from Igor is that, for example, for 256KB writes (as
> WS_MIN), the last 4KB would be metadata. In that case, the unit would
> either succeed or fail as a whole. This may also address your second point.

On the write path this depends on the controller - it can either fail
the whole write or only the actual failing page (e.g., in the case
that we have multiple planes). This is not defined at the spec level. In
pblk we recover everything now since that is the worst case, but the
moment we have page dependencies, these should be explicit.

On the read path, a single 4KB sector can fail, so the case needs
to be handled either way.
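For what it's worth, the copy-in/copy-out mechanics of
pblk_set_packed_meta()/pblk_get_packed_meta() reduce to a small
user-space sketch. This is only an illustration under assumed
sizes - SEC_SIZE, WS_MIN, and struct sec_meta here are invented
stand-ins, not the real pblk definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEC_SIZE   4096        /* assumed 4KB sector (geo->csecs)     */
#define WS_MIN     8           /* assumed pages per minimal write unit */
#define ADDR_EMPTY (~0ULL)

struct sec_meta {              /* stand-in for struct pblk_sec_meta   */
	uint64_t lba;
};

/* Pack the per-sector lba list of one write unit into the unit's
 * last sector, the way pblk_set_packed_meta() fills the last bio
 * page from the oob meta buffer. The meta entry for the packed
 * page itself maps no user lba, so it is set to ADDR_EMPTY. */
static void set_packed_meta(struct sec_meta *meta, uint8_t *unit)
{
	uint8_t *last_page = unit + (WS_MIN - 1) * SEC_SIZE;

	meta[WS_MIN - 1].lba = ADDR_EMPTY;
	memcpy(last_page, meta, WS_MIN * sizeof(struct sec_meta));
}

/* Recover the lba list from the last sector of the unit, the way
 * pblk_get_packed_meta() refills the oob meta buffer on recovery. */
static void get_packed_meta(struct sec_meta *meta, const uint8_t *unit)
{
	const uint8_t *last_page = unit + (WS_MIN - 1) * SEC_SIZE;

	memcpy(meta, last_page, WS_MIN * sizeof(struct sec_meta));
}
```

Either the whole unit (data plus its trailing metadata page) is
readable and the lbas come back, or it is not - which is exactly why
a single failing 4KB sector inside the unit is the interesting case.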

> 
>>  - I think it would be good to update the WA counters, since there is a
>>    significant write and space amplification in using (1 / ws_opt)
>>    pages for metadata.
>> 
>> Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-19 12:47       ` Javier Gonzalez
@ 2018-06-19 22:20         ` Igor Konopko
  2018-06-20  7:13           ` Javier Gonzalez
  0 siblings, 1 reply; 25+ messages in thread
From: Igor Konopko @ 2018-06-19 22:20 UTC (permalink / raw)
  To: Javier Gonzalez, Matias Bjørling
  Cc: linux-block, michal.sorn, marcin.dziegielewski



On 19.06.2018 05:47, Javier Gonzalez wrote:
>> On 19 Jun 2018, at 14.42, Matias Bjørling <mb@lightnvm.io> wrote:
>>
>> On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez <javier@cnexlabs.com> wrote:
>>>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>
>>>> In the current pblk implementation, the l2p mapping for not yet
>>>> closed lines is always stored only in OOB metadata and recovered
>>>> from it.
>>>>
>>>> Such a solution does not provide data integrity when drives do
>>>> not have such an OOB metadata space.
>>>>
>>>> The goal of this patch is to add support for so called packed
>>>> metadata, which stores the l2p mapping for open lines in the last
>>>> sector of every write unit.
>>>>
>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>> ---
>>>> drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
>>>> drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
>>>> drivers/lightnvm/pblk-rb.c       |  3 +++
>>>> drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
>>>> drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
>>>> drivers/lightnvm/pblk-write.c    | 14 +++++++----
>>>> drivers/lightnvm/pblk.h          |  5 +++-
>>>> 7 files changed, 128 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index c092ee93a18d..375c6430612e 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
>>>> {
>>>>       unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>>>>
>>>> -     if (secs_avail >= pblk->min_write_pgs)
>>>> +     if (secs_avail >= pblk->min_write_pgs_data)
>>>>               pblk_write_kick(pblk);
>>>> }
>>>>
>>>> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
>>>>       struct pblk_line_meta *lm = &pblk->lm;
>>>>       struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>>       struct list_head *move_list = NULL;
>>>> -     int vsc = le32_to_cpu(*line->vsc);
>>>> +     int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
>>>> +                     * (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>>> +     int vsc = le32_to_cpu(*line->vsc) + packed_meta;
>>>>
>>>>       lockdep_assert_held(&line->lock);
>>>>
>>>> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>>>> }
>>>>
>>>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>>>> -                unsigned long secs_to_flush)
>>>> +                unsigned long secs_to_flush, bool skip_meta)
>>>> {
>>>>       int max = pblk->sec_per_write;
>>>>       int min = pblk->min_write_pgs;
>>>>       int secs_to_sync = 0;
>>>>
>>>> +     if (skip_meta)
>>>> +             min = max = pblk->min_write_pgs_data;
>>>> +
>>>>       if (secs_avail >= max)
>>>>               secs_to_sync = max;
>>>>       else if (secs_avail >= min)
>>>> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>>>> next_rq:
>>>>       memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>
>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>       rq_len = rq_ppas * geo->csecs;
>>>>
>>>>       bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
>>>> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>>>       }
>>>>       spin_unlock(&pblk->trans_lock);
>>>> }
>>>> +
>>>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>>> +{
>>>> +     void *meta_list = rqd->meta_list;
>>>> +     void *page;
>>>> +     int i = 0;
>>>> +
>>>> +     if (pblk_is_oob_meta_supported(pblk))
>>>> +             return;
>>>> +
>>>> +     /* We need to zero out metadata corresponding to packed meta page */
>>>> +     pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
>>>> +
>>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>>> +     /* We need to fill the last page of the request (packed
>>>> +      * metadata) with data from the oob meta buffer.
>>>> +      */
>>>> +     for (; i < rqd->nr_ppas; i++)
>>>> +             memcpy(page + (i * sizeof(struct pblk_sec_meta)),
>>>> +                     pblk_get_meta_at(pblk, meta_list, i),
>>>> +                     sizeof(struct pblk_sec_meta));
>>>> +}
>>>> +
>>>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>>> +{
>>>> +     void *meta_list = rqd->meta_list;
>>>> +     void *page;
>>>> +     int i = 0;
>>>> +
>>>> +     if (pblk_is_oob_meta_supported(pblk))
>>>> +             return;
>>>> +
>>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>>> +     /* We need to fill oob meta buffer with data from packed metadata */
>>>> +     for (; i < rqd->nr_ppas; i++)
>>>> +             memcpy(pblk_get_meta_at(pblk, meta_list, i),
>>>> +                     page + (i * sizeof(struct pblk_sec_meta)),
>>>> +                     sizeof(struct pblk_sec_meta));
>>>> +}
>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>> index f05112230a52..5eb641da46ed 100644
>>>> --- a/drivers/lightnvm/pblk-init.c
>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
>>>>       pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>       max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>>>>       pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
>>>> +     pblk->min_write_pgs_data = pblk->min_write_pgs;
>>>>       pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
>>>>
>>>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>>>> +             /* For drives which do not have the OOB metadata feature,
>>>> +              * in order to support recovery we need to use
>>>> +              * so called packed metadata. Packed metadata will store
>>>> +              * the same information as OOB metadata (l2p table mapping),
>>>> +              * but in the form of a single page at the end of
>>>> +              * every write request.
>>>> +              */
>>>> +             if (pblk->min_write_pgs
>>>> +                     * sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
>>>> +                     /* We want to keep all the packed metadata on single
>>>> +                      * page per write requests. So we need to ensure that
>>>> +                      * it will fit.
>>>> +                      *
>>>> +                      * This is more of a sanity check, since there is
>>>> +                      * no device with such a big minimal write size
>>>> +                      * (above 1 megabyte).
>>>> +                      */
>>>> +                     pr_err("pblk: Not supported min write size\n");
>>>> +                     return -EINVAL;
>>>> +             }
>>>> +             /* For the packed meta approach we do some simplification.
>>>> +              * On the read path we always issue requests whose size
>>>> +              * equals max_write_pgs, with all pages filled with
>>>> +              * user payload except the last page, which will be
>>>> +              * filled with packed metadata.
>>>> +              */
>>>> +             pblk->max_write_pgs = pblk->min_write_pgs;
>>>> +             pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
>>>> +     }
>>>> +
>>>>       if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
>>>>               pr_err("pblk: vector list too big(%u > %u)\n",
>>>>                               pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
>>>> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>>       struct pblk_line_meta *lm = &pblk->lm;
>>>>       struct nvm_geo *geo = &dev->geo;
>>>>       sector_t provisioned;
>>>> -     int sec_meta, blk_meta;
>>>> +     int sec_meta, blk_meta, clba;
>>>>
>>>>       if (geo->op == NVM_TARGET_DEFAULT_OP)
>>>>               pblk->op = PBLK_DEFAULT_OP;
>>>> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>>       sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>>>>       blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
>>>>
>>>> -     pblk->capacity = (provisioned - blk_meta) * geo->clba;
>>>> +     clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
>>>> +     pblk->capacity = (provisioned - blk_meta) * clba;
>>>>
>>>>       atomic_set(&pblk->rl.free_blocks, nr_free_blks);
>>>>       atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
>>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>>> index a81a97e8ea6d..081e73e7978f 100644
>>>> --- a/drivers/lightnvm/pblk-rb.c
>>>> +++ b/drivers/lightnvm/pblk-rb.c
>>>> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>>>               to_read = count;
>>>>       }
>>>>
>>>> +     /* Add space for packed metadata if in use */
>>>> +     pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>>> +
>>>>       c_ctx->sentry = pos;
>>>>       c_ctx->nr_valid = to_read;
>>>>       c_ctx->nr_padded = pad;
>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>> index f5853fc77a0c..0fab18fe30d9 100644
>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>>> next_read_rq:
>>>>       memset(rqd, 0, pblk_g_rq_size);
>>>>
>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>       if (!rq_ppas)
>>>>               rq_ppas = pblk->min_write_pgs;
>>>>       rq_len = rq_ppas * geo->csecs;
>>>> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>>>               return -EINTR;
>>>>       }
>>>>
>>>> +     pblk_get_packed_meta(pblk, rqd);
>>>>       for (i = 0; i < rqd->nr_ppas; i++) {
>>>>               u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>>                                                       meta_list, i)->lba);
>>>> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>>>       kref_init(&pad_rq->ref);
>>>>
>>>> next_pad_rq:
>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>       if (rq_ppas < pblk->min_write_pgs) {
>>>>               pr_err("pblk: corrupted pad line %d\n", line->id);
>>>>               goto fail_free_pad;
>>>> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>> next_rq:
>>>>       memset(rqd, 0, pblk_g_rq_size);
>>>>
>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>       if (!rq_ppas)
>>>>               rq_ppas = pblk->min_write_pgs;
>>>>       rq_len = rq_ppas * geo->csecs;
>>>> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>>        */
>>>>       if (!rec_round++ && !rqd->error) {
>>>>               rec_round = 0;
>>>> +             pblk_get_packed_meta(pblk, rqd);
>>>>               for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>>>>                       u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>>                                                       meta_list, i)->lba);
>>>> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>>               int ret;
>>>>
>>>>               bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>>>> +             if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
>>>> +                     /* This case should not happen since we always read
>>>> +                      * in the same unit here as we wrote in the writer
>>>> +                      * thread.
>>>> +                      */
>>>> +                     pr_err("pblk: Inconsistent packed metadata read\n");
>>>> +             }
>>>>               nr_error_bits = rqd->nr_ppas - bit;
>>>>
>>>>               /* Roll back failed sectors */
>>>> @@ -550,7 +558,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>> next_rq:
>>>>       memset(rqd, 0, pblk_g_rq_size);
>>>>
>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>       if (!rq_ppas)
>>>>               rq_ppas = pblk->min_write_pgs;
>>>>       rq_len = rq_ppas * geo->csecs;
>>>> @@ -608,6 +616,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>               int nr_error_bits, bit;
>>>>
>>>>               bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>>>> +             if (!pblk_is_oob_meta_supported(pblk)) {
>>>> +                     /* For packed metadata we do not handle partially
>>>> +                      * written requests here, since the metadata is
>>>> +                      * always in the last page of the request.
>>>> +                      */
>>>> +                     bit = 0;
>>>> +                     *done = 0;
>>>> +             }
>>>>               nr_error_bits = rqd->nr_ppas - bit;
>>>>
>>>>               /* Roll back failed sectors */
>>>> @@ -622,6 +638,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>                       *done = 0;
>>>>       }
>>>>
>>>> +     pblk_get_packed_meta(pblk, rqd);
>>>>       for (i = 0; i < rqd->nr_ppas; i++) {
>>>>               u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>>                                                       meta_list, i)->lba);
>>>> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
>>>> index b0e5e93a9d5f..aa7b4164ce9e 100644
>>>> --- a/drivers/lightnvm/pblk-sysfs.c
>>>> +++ b/drivers/lightnvm/pblk-sysfs.c
>>>> @@ -473,6 +473,13 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
>>>>       if (kstrtouint(page, 0, &sec_per_write))
>>>>               return -EINVAL;
>>>>
>>>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>>>> +             /* For packed metadata case it is
>>>> +              * not allowed to change sec_per_write.
>>>> +              */
>>>> +             return -EINVAL;
>>>> +     }
>>>> +
>>>>       if (sec_per_write < pblk->min_write_pgs
>>>>                               || sec_per_write > pblk->max_write_pgs
>>>>                               || sec_per_write % pblk->min_write_pgs != 0)
>>>> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
>>>> index 6552db35f916..bb45c7e6c375 100644
>>>> --- a/drivers/lightnvm/pblk-write.c
>>>> +++ b/drivers/lightnvm/pblk-write.c
>>>> @@ -354,7 +354,7 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
>>>> {
>>>>       int secs_to_sync;
>>>>
>>>> -     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush);
>>>> +     secs_to_sync = pblk_calc_secs(pblk, secs_avail, secs_to_flush, true);
>>>>
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>       if ((!secs_to_sync && secs_to_flush)
>>>> @@ -522,6 +522,11 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
>>>>               return NVM_IO_ERR;
>>>>       }
>>>>
>>>> +     /* This is the first place when we have write requests mapped
>>>> +      * and we can fill packed metadata with l2p mappings.
>>>> +      */
>>>> +     pblk_set_packed_meta(pblk, rqd);
>>>> +
>>>>       meta_line = pblk_should_submit_meta_io(pblk, rqd);
>>>>
>>>>       /* Submit data write for current data line */
>>>> @@ -572,7 +577,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>>>       struct bio *bio;
>>>>       struct nvm_rq *rqd;
>>>>       unsigned int secs_avail, secs_to_sync, secs_to_com;
>>>> -     unsigned int secs_to_flush;
>>>> +     unsigned int secs_to_flush, packed_meta_pgs;
>>>>       unsigned long pos;
>>>>       unsigned int resubmit;
>>>>
>>>> @@ -608,7 +613,7 @@ static int pblk_submit_write(struct pblk *pblk)
>>>>                       return 1;
>>>>
>>>>               secs_to_flush = pblk_rb_flush_point_count(&pblk->rwb);
>>>> -             if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
>>>> +             if (!secs_to_flush && secs_avail < pblk->min_write_pgs_data)
>>>>                       return 1;
>>>>
>>>>               secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail,
>>>> @@ -623,7 +628,8 @@ static int pblk_submit_write(struct pblk *pblk)
>>>>               pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
>>>>       }
>>>>
>>>> -     bio = bio_alloc(GFP_KERNEL, secs_to_sync);
>>>> +     packed_meta_pgs = (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>>> +     bio = bio_alloc(GFP_KERNEL, secs_to_sync + packed_meta_pgs);
>>>>
>>>>       bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>       bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>> index 4c61ede5b207..c95ecd8bcf79 100644
>>>> --- a/drivers/lightnvm/pblk.h
>>>> +++ b/drivers/lightnvm/pblk.h
>>>> @@ -605,6 +605,7 @@ struct pblk {
>>>>       int state;                      /* pblk line state */
>>>>
>>>>       int min_write_pgs; /* Minimum amount of pages required by controller */
>>>> +     int min_write_pgs_data; /* Minimum amount of payload pages */
>>>>       int max_write_pgs; /* Maximum amount of pages supported by controller */
>>>>
>>>>       sector_t capacity; /* Device capacity when bad blocks are subtracted */
>>>> @@ -798,7 +799,7 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>>> u64 pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>>> u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
>>>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>>>> -                unsigned long secs_to_flush);
>>>> +                unsigned long secs_to_flush, bool skip_meta);
>>>> void pblk_up_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
>>>> void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
>>>>                 unsigned long *lun_bitmap);
>>>> @@ -823,6 +824,8 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>>>                         u64 *lba_list, int nr_secs);
>>>> void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
>>>>                        sector_t blba, int nr_secs);
>>>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd);
>>>>
>>>> /*
>>>> * pblk user I/O write path
>>>> --
>>>> 2.14.3
>>>
>>> The functionality is good. A couple of comments though:
>>>   - Have you considered the case when the device reports ws_min = ws_opt
>>>     = 4096? Maybe checks preventing this case would be a good idea.

Yes, definitely such checks need to be added.

>>>   - Have you checked for any conflicts in the write recovery path? You
>>>     would need to deal with more corner cases (e.g., a write failure in
>>>     the metadata page would require re-sending always all other sectors).

I'll analyze the write recovery path once again - it is kind of new in 
the code, so I need to check that.

>>>   - There are also corner cases on scan recovery: what if the metadata
>>>     page cannot be read? In this case, data loss (at a drive level) will
>>>     happen, but you will need to guarantee that reading garbage will not
>>>     corrupt other data. In the OOB area case this is not a problem
>>>     because data and metadata are always good/not good simultaneously.
>>
>> What I understood from Igor is that, for example, for 256 KB writes (as
>> WS_MIN), the last 4 KB would be metadata. In that case, it would be
>> either success or failure. This may also fix your second point.
> 
> On the write path this depends on the controller - it can either fail
> the whole write or only fail the actual failing page (e.g., in the case
> that we have multiple planes). This is not defined at the spec level. On
> pblk we recover everything now since it is the worst case, but the
> moment we have page dependencies, these should be explicit.

> On the read path, a single 4KB sector can fail, so the case needs
> to be handled either way.
> 

So generally the main goal is not to corrupt any other data - this is 
obvious for sure. The OOB meta approach is still the preferred one, but 
looking at the history of NVMe drives this feature is very rarely 
available, so my goal was to add handling for the case where OOB is missing.

>>
>>>   - I think it would be good to update the WA counters, since there is a
>>>     significant write and space amplification in using (1 / ws_opt)
>>>     pages for metadata.

Makes sense.

Thanks for the comments on all the patches in this series - I'll work 
on a v2 and resend.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
  2018-06-19 22:20         ` Igor Konopko
@ 2018-06-20  7:13           ` Javier Gonzalez
  0 siblings, 0 replies; 25+ messages in thread
From: Javier Gonzalez @ 2018-06-20  7:13 UTC (permalink / raw)
  To: Konopko, Igor J
  Cc: Matias Bjørling, linux-block, michal.sorn, marcin.dziegielewski

[-- Attachment #1: Type: text/plain, Size: 22264 bytes --]


> On 20 Jun 2018, at 00.20, Igor Konopko <igor.j.konopko@intel.com> wrote:
> 
> On 19.06.2018 05:47, Javier Gonzalez wrote:
>>> On 19 Jun 2018, at 14.42, Matias Bjørling <mb@lightnvm.io> wrote:
>>> 
>>> On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez <javier@cnexlabs.com> wrote:
>>>>> On 16 Jun 2018, at 00.27, Igor Konopko <igor.j.konopko@intel.com> wrote:
>>>>> 
>>>>> In current pblk implementation, l2p mapping for not closed lines
>>>>> is always stored only in OOB metadata and recovered from it.
>>>>> 
>>>>> Such a solution does not provide data integrity when drives does
>>>>> not have such a OOB metadata space.
>>>>> 
>>>>> The goal of this patch is to add support for so called packed
>>>>> metadata, which store l2p mapping for open lines in last sector
>>>>> of every write unit.
>>>>> 
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
>>>>> ---
>>>>> drivers/lightnvm/pblk-core.c     | 52 ++++++++++++++++++++++++++++++++++++----
>>>>> drivers/lightnvm/pblk-init.c     | 37 ++++++++++++++++++++++++++--
>>>>> drivers/lightnvm/pblk-rb.c       |  3 +++
>>>>> drivers/lightnvm/pblk-recovery.c | 25 +++++++++++++++----
>>>>> drivers/lightnvm/pblk-sysfs.c    |  7 ++++++
>>>>> drivers/lightnvm/pblk-write.c    | 14 +++++++----
>>>>> drivers/lightnvm/pblk.h          |  5 +++-
>>>>> 7 files changed, 128 insertions(+), 15 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>> index c092ee93a18d..375c6430612e 100644
>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>> @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
>>>>> {
>>>>>      unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
>>>>> 
>>>>> -     if (secs_avail >= pblk->min_write_pgs)
>>>>> +     if (secs_avail >= pblk->min_write_pgs_data)
>>>>>              pblk_write_kick(pblk);
>>>>> }
>>>>> 
>>>>> @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
>>>>>      struct pblk_line_meta *lm = &pblk->lm;
>>>>>      struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>>>>      struct list_head *move_list = NULL;
>>>>> -     int vsc = le32_to_cpu(*line->vsc);
>>>>> +     int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
>>>>> +                     * (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>>>> +     int vsc = le32_to_cpu(*line->vsc) + packed_meta;
>>>>> 
>>>>>      lockdep_assert_held(&line->lock);
>>>>> 
>>>>> @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>>>>> }
>>>>> 
>>>>> int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
>>>>> -                unsigned long secs_to_flush)
>>>>> +                unsigned long secs_to_flush, bool skip_meta)
>>>>> {
>>>>>      int max = pblk->sec_per_write;
>>>>>      int min = pblk->min_write_pgs;
>>>>>      int secs_to_sync = 0;
>>>>> 
>>>>> +     if (skip_meta)
>>>>> +             min = max = pblk->min_write_pgs_data;
>>>>> +
>>>>>      if (secs_avail >= max)
>>>>>              secs_to_sync = max;
>>>>>      else if (secs_avail >= min)
>>>>> @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>>>>> next_rq:
>>>>>      memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>> 
>>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>>      rq_len = rq_ppas * geo->csecs;
>>>>> 
>>>>>      bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
>>>>> @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
>>>>>      }
>>>>>      spin_unlock(&pblk->trans_lock);
>>>>> }
>>>>> +
>>>>> +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>>>> +{
>>>>> +     void *meta_list = rqd->meta_list;
>>>>> +     void *page;
>>>>> +     int i = 0;
>>>>> +
>>>>> +     if (pblk_is_oob_meta_supported(pblk))
>>>>> +             return;
>>>>> +
>>>>> +     /* We need to zero out metadata corresponding to packed meta page */
>>>>> +     pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
>>>>> +
>>>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>>>> +     /* We need to fill last page of request (packed metadata)
>>>>> +      * with data from oob meta buffer.
>>>>> +      */
>>>>> +     for (; i < rqd->nr_ppas; i++)
>>>>> +             memcpy(page + (i * sizeof(struct pblk_sec_meta)),
>>>>> +                     pblk_get_meta_at(pblk, meta_list, i),
>>>>> +                     sizeof(struct pblk_sec_meta));
>>>>> +}
>>>>> +
>>>>> +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
>>>>> +{
>>>>> +     void *meta_list = rqd->meta_list;
>>>>> +     void *page;
>>>>> +     int i = 0;
>>>>> +
>>>>> +     if (pblk_is_oob_meta_supported(pblk))
>>>>> +             return;
>>>>> +
>>>>> +     page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
>>>>> +     /* We need to fill oob meta buffer with data from packed metadata */
>>>>> +     for (; i < rqd->nr_ppas; i++)
>>>>> +             memcpy(pblk_get_meta_at(pblk, meta_list, i),
>>>>> +                     page + (i * sizeof(struct pblk_sec_meta)),
>>>>> +                     sizeof(struct pblk_sec_meta));
>>>>> +}
>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>> index f05112230a52..5eb641da46ed 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>      pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>>      max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>>>>>      pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
>>>>> +     pblk->min_write_pgs_data = pblk->min_write_pgs;
>>>>>      pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
>>>>> 
>>>>> +     if (!pblk_is_oob_meta_supported(pblk)) {
>>>>> +             /* For drives which does not have OOB metadata feature
>>>>> +              * in order to support recovery feature we need to use
>>>>> +              * so called packed metadata. Packed metada will store
>>>>> +              * the same information as OOB metadata (l2p table mapping,
>>>>> +              * but in the form of the single page at the end of
>>>>> +              * every write request.
>>>>> +              */
>>>>> +             if (pblk->min_write_pgs
>>>>> +                     * sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
>>>>> +                     /* We want to keep all the packed metadata on single
>>>>> +                      * page per write requests. So we need to ensure that
>>>>> +                      * it will fit.
>>>>> +                      *
>>>>> +                      * This is more like sanity check, since there is
>>>>> +                      * no device with such a big minimal write size
>>>>> +                      * (above 1 metabytes).
>>>>> +                      */
>>>>> +                     pr_err("pblk: Not supported min write size\n");
>>>>> +                     return -EINVAL;
>>>>> +             }
>>>>> +             /* For packed meta approach we do some simplification.
>>>>> +              * On read path we always issue requests which size
>>>>> +              * equal to max_write_pgs, with all pages filled with
>>>>> +              * user payload except of last one page which will be
>>>>> +              * filled with packed metadata.
>>>>> +              */
>>>>> +             pblk->max_write_pgs = pblk->min_write_pgs;
>>>>> +             pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
>>>>> +     }
>>>>> +
>>>>>      if (pblk->max_write_pgs > PBLK_MAX_REQ_ADDRS) {
>>>>>              pr_err("pblk: vector list too big(%u > %u)\n",
>>>>>                              pblk->max_write_pgs, PBLK_MAX_REQ_ADDRS);
>>>>> @@ -668,7 +700,7 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>>>      struct pblk_line_meta *lm = &pblk->lm;
>>>>>      struct nvm_geo *geo = &dev->geo;
>>>>>      sector_t provisioned;
>>>>> -     int sec_meta, blk_meta;
>>>>> +     int sec_meta, blk_meta, clba;
>>>>> 
>>>>>      if (geo->op == NVM_TARGET_DEFAULT_OP)
>>>>>              pblk->op = PBLK_DEFAULT_OP;
>>>>> @@ -691,7 +723,8 @@ static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
>>>>>      sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>>>>>      blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
>>>>> 
>>>>> -     pblk->capacity = (provisioned - blk_meta) * geo->clba;
>>>>> +     clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
>>>>> +     pblk->capacity = (provisioned - blk_meta) * clba;
>>>>> 
>>>>>      atomic_set(&pblk->rl.free_blocks, nr_free_blks);
>>>>>      atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
>>>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>>>> index a81a97e8ea6d..081e73e7978f 100644
>>>>> --- a/drivers/lightnvm/pblk-rb.c
>>>>> +++ b/drivers/lightnvm/pblk-rb.c
>>>>> @@ -528,6 +528,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
>>>>>              to_read = count;
>>>>>      }
>>>>> 
>>>>> +     /* Add space for packed metadata if in use*/
>>>>> +     pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
>>>>> +
>>>>>      c_ctx->sentry = pos;
>>>>>      c_ctx->nr_valid = to_read;
>>>>>      c_ctx->nr_padded = pad;
>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>> index f5853fc77a0c..0fab18fe30d9 100644
>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>> @@ -138,7 +138,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>>>> next_read_rq:
>>>>>      memset(rqd, 0, pblk_g_rq_size);
>>>>> 
>>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>>      if (!rq_ppas)
>>>>>              rq_ppas = pblk->min_write_pgs;
>>>>>      rq_len = rq_ppas * geo->csecs;
>>>>> @@ -198,6 +198,7 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>              return -EINTR;
>>>>>      }
>>>>> 
>>>>> +     pblk_get_packed_meta(pblk, rqd);
>>>>>      for (i = 0; i < rqd->nr_ppas; i++) {
>>>>>              u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>>>                                                      meta_list, i)->lba);
>>>>> @@ -272,7 +273,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>      kref_init(&pad_rq->ref);
>>>>> 
>>>>> next_pad_rq:
>>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>>      if (rq_ppas < pblk->min_write_pgs) {
>>>>>              pr_err("pblk: corrupted pad line %d\n", line->id);
>>>>>              goto fail_free_pad;
>>>>> @@ -418,7 +419,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>>> next_rq:
>>>>>      memset(rqd, 0, pblk_g_rq_size);
>>>>> 
>>>>> -     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
>>>>> +     rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
>>>>>      if (!rq_ppas)
>>>>>              rq_ppas = pblk->min_write_pgs;
>>>>>      rq_len = rq_ppas * geo->csecs;
>>>>> @@ -475,6 +476,7 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>       */
>>>>>      if (!rec_round++ && !rqd->error) {
>>>>>              rec_round = 0;
>>>>> +             pblk_get_packed_meta(pblk, rqd);
>>>>>              for (i = 0; i < rqd->nr_ppas; i++, r_ptr++) {
>>>>>                      u64 lba = le64_to_cpu(pblk_get_meta_at(pblk,
>>>>>                                                      meta_list, i)->lba);
>>>>> @@ -492,6 +494,12 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>              int ret;
>>>>> 
>>>>>              bit = find_first_bit((void *)&rqd->ppa_status, rqd->nr_ppas);
>>>>> +             if (!pblk_is_oob_meta_supported(pblk) && bit > 0) {
>>>>> +                     /* This case should not happen since we always read in
>>>>> +                      * the same unit here as we wrote in writer thread.
>>>>> +                      */
>>>>> +                     pr_err("pblk: Inconsistent packed metadata read\n");
>>>>> +             }
>>>>> [...]
>>>> 
>>>> The functionality is good. A couple of comments though:
>>>>  - Have you considered the case when the device reports ws_min = ws_opt
>>>>    = 4096? Maybe checks preventing this case would be a good idea.
> 
> Yes, definitely such checks need to be added.
> 
>>>>  - Have you checked for any conflicts in the write recovery path? You
>>>>    would need to deal with more corner cases (e.g., a write failure in
>>>>    the metadata page would require re-sending always all other sectors).
> 
> I'll analyze the write recovery path once again - it is kind of new in
> the code, so I need to check that.
> 

I don't think you will need a lot of changes; just make sure that you
have the dependency between the last sector and the rest of the write
unit.

An extra comment here: you will also need to handle invalidation of the
metadata sector so that GC can recycle it once all the sectors related
to it are invalid.

>>>>  - There are also corner cases on scan recovery: what if the metadata
>>>>    page cannot be read? In this case, data loss (at a drive level) will
>>>>    happen, but you will need to guarantee that reading garbage will not
>>>>    corrupt other data. In the OOB area case this is not a problem
>>>>    because data and metadata are always good/not good simultaneously.
>>> 
>>> What I understood from Igor is that, for example, for 256 KB writes (as
>>> WS_MIN), the last 4 KB would be metadata. In that case, it would be
>>> either success or failure. This may also fix your second point.
>> On the write path this depends on the controller - it can either fail
>> the whole write or only fail the actual failing page (e.g., in the case
>> that we have multiple planes). This is not defined at the spec level. On
>> pblk we recover everything now since it is the worst case, but the
>> moment we have page dependencies, these should be explicit.
> 
>> On the read path, a single 4KB sector can fail, so the case needs
>> to be handled either way.
> 
> So generally the main goal is not to corrupt any other data - this
> is obvious for sure. The OOB meta approach is still the preferred one,
> but looking at the history of NVMe drives this feature is very rarely
> available, so my goal was to add handling for the case where OOB is missing.

It is a very valid use case. It just adds a couple of corner cases that
we need to handle.

> 
>>>>  - I think it would be good to update the WA counters, since there is a
>>>>    significant write and space amplification in using (1 / ws_opt)
>>>>    pages for metadata.
> 
> Makes sense.
> 
> Thanks for the comments on all the patches in this series -
> I'll work on a v2 and resend.

Of course!

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-06-20  8:27 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-15 22:27 [PATCH 0/5] lightnvm: More flexible approach to metadata Igor Konopko
2018-06-15 22:27 ` [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata Igor Konopko
2018-06-16 19:24   ` Matias Bjørling
2018-06-18 14:23   ` Javier Gonzalez
2018-06-18 20:53     ` Igor Konopko
2018-06-19  7:44       ` Javier Gonzalez
2018-06-15 22:27 ` [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta Igor Konopko
2018-06-16 19:27   ` Matias Bjørling
2018-06-18 14:25     ` Javier Gonzalez
2018-06-18 20:50       ` Igor Konopko
2018-06-15 22:27 ` [PATCH 3/5] lightnvm: Flexible DMA pool entry size Igor Konopko
2018-06-16 19:32   ` Matias Bjørling
2018-06-15 22:27 ` [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk Igor Konopko
2018-06-16 19:45   ` Matias Bjørling
2018-06-19 11:08   ` Javier Gonzalez
2018-06-19 12:42     ` Matias Bjørling
2018-06-19 12:47       ` Javier Gonzalez
2018-06-19 22:20         ` Igor Konopko
2018-06-20  7:13           ` Javier Gonzalez
2018-06-15 22:27 ` [PATCH 5/5] lightnvm: pblk: Disable interleaved " Igor Konopko
2018-06-16 19:38   ` Matias Bjørling
2018-06-18 14:29     ` Javier Gonzalez
2018-06-18 20:51       ` Igor Konopko
2018-06-19  8:24       ` Matias Bjørling
2018-06-19 10:28 ` [PATCH 0/5] lightnvm: More flexible approach to metadata Javier Gonzalez

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.