* [PATCH] lightnvm: pblk: fix race condition on GC
@ 2019-01-27  6:54 Heiner Litz
  2019-01-29  8:13 ` Javier González
  0 siblings, 1 reply; 3+ messages in thread
From: Heiner Litz @ 2019-01-27  6:54 UTC (permalink / raw)
  To: mb; +Cc: javier, hans.holmberg, linux-block, linux-kernel, Heiner Litz

This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update is
performed, the sectors are ignored by the GC logic and the line is freed
before all sectors are moved. When the L2P is finally updated, it contains
a mapping to a freed line, and subsequent reads of the corresponding LBAs fail.

Note that looking up the L2P and checking the ppa in the write buffer need
to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.
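
The window looks roughly like this (an illustrative sketch of the
interleaving, not actual code):

	/*
	 * writer context                  GC context
	 * --------------                  ----------
	 * write maps to the last
	 * sectors of a line
	 * data synced to the device
	 *                                 line selected for GC
	 *                                 L2P still points into the
	 *                                 write buffer, so the GC
	 *                                 logic skips the sectors
	 *                                 line freed
	 * L2P updated, now pointing
	 * into the freed line
	 * reads of those LBAs fail
	 */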

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
---
 drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 3789185144da..7c556b2218e4 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 	int valid_secs = 0;
 	int i;
 
-	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
-
+	spin_lock(&pblk->trans_lock);
 	for (i = 0; i < nr_secs; i++) {
 		if (lba_list[i] == ADDR_EMPTY)
 			continue;
 
+		ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
 		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
+
+		/* Obtain ppa from cache if the sector has been synced to the
+		 * device but the L2P has not been updated yet */
+		if (pblk_addr_in_cache(ppa_list_l2p[i])) {
+			struct pblk_rb *rb = &pblk->rwb;
+			struct pblk_rb_entry *entry;
+			struct pblk_w_ctx *w_ctx;
+			u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
+
+#ifdef CONFIG_NVM_PBLK_DEBUG
+			/* Ensure that the access will not cause an overflow */
+			BUG_ON(pos >= rb->nr_entries);
+#endif
+
+			entry = &rb->entries[pos];
+			w_ctx = &entry->w_ctx;
+			if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
+				rqd->ppa_list[valid_secs++] = ppa_gc;
+				continue;
+			}
+		}
+
 		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
 			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
 			continue;
@@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 
 		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
 	}
+	spin_unlock(&pblk->trans_lock);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_add(valid_secs, &pblk->inflight_reads);
-- 
2.17.1



* Re: [PATCH] lightnvm: pblk: fix race condition on GC
  2019-01-27  6:54 [PATCH] lightnvm: pblk: fix race condition on GC Heiner Litz
@ 2019-01-29  8:13 ` Javier González
       [not found]   ` <CAJbgVnU8G+brMW1anV1u79autsVzigwAoaaQKfZi_dizUFLfdw@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Javier González @ 2019-01-29  8:13 UTC (permalink / raw)
  To: Heiner Litz
  Cc: Matias Bjørling, Hans Holmberg, linux-block, linux-kernel


> On 27 Jan 2019, at 07.54, Heiner Litz <hlitz@ucsc.edu> wrote:
> 
> This patch fixes a race condition where a write is mapped to the last
> sectors of a line. The write is synced to the device but the L2P is not
> updated yet. When the line is garbage collected before the L2P update is
> performed, the sectors are ignored by the GC logic and the line is freed
> before all sectors are moved. When the L2P is finally updated, it contains
> a mapping to a freed line, and subsequent reads of the corresponding LBAs fail.

Hi Heiner,

This has been an interesting issue to debug - good catch!

> 
> Note that looking up the L2P and checking the ppa in the write buffer need
> to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.
> 
> Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> ---
> drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 3789185144da..7c556b2218e4 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> 	int valid_secs = 0;
> 	int i;
> 
> -	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
> -
> +	spin_lock(&pblk->trans_lock);
> 	for (i = 0; i < nr_secs; i++) {
> 		if (lba_list[i] == ADDR_EMPTY)
> 			continue;
> 
> +		ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
> 		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
> +
> +		/* Obtain ppa from cache if the sector has been synced to the
> +		 * device but the L2P has not been updated yet */
> +		if (pblk_addr_in_cache(ppa_list_l2p[i])) {
> +			struct pblk_rb *rb = &pblk->rwb;
> +			struct pblk_rb_entry *entry;
> +			struct pblk_w_ctx *w_ctx;
> +			u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
> +
> +#ifdef CONFIG_NVM_PBLK_DEBUG
> +			/* Ensure that the access will not cause an overflow */
> +			BUG_ON(pos >= rb->nr_entries);
> +#endif
> +
> +			entry = &rb->entries[pos];
> +			w_ctx = &entry->w_ctx;
> +			if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
> +				rqd->ppa_list[valid_secs++] = ppa_gc;
> +				continue;
> +			}
> +		}
> +
> 		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
> 			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
> 			continue;
> @@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> 
> 		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
> 	}
> +	spin_unlock(&pblk->trans_lock);
> 
> #ifdef CONFIG_NVM_PBLK_DEBUG
> 	atomic_long_add(valid_secs, &pblk->inflight_reads);
> --
> 2.17.1


Here is a suggestion: why not add an atomic counter to the line
tracking the sectors still pending an L2P update, and then loosely
wait (i.e., check and sleep / schedule) until the counter reaches 0
in pblk_line_close_ws()? This way you guarantee that the line
does not close - and therefore never reaches the GC lists - before all
the L2P entries for that line point to the media. Any other form of
synchronization that puts the burden at pblk_line_close_ws() would also
work for me.

In essence, I would rather pay the price on a per-line basis than
blocking the trans_lock longer for each I/O.
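
Roughly something like this would do (just a sketch of the idea;
left_to_sync and its placement are hypothetical, not existing pblk
code):

	/* In struct pblk_line: number of sectors synced to the device
	 * whose L2P entries do not yet point to the media.
	 */
	atomic_t left_to_sync;

	/* Write path: take a reference when the sector is written to
	 * the device, drop it once its L2P entry is updated (e.g.
	 * after pblk_update_map_dev()).
	 */
	atomic_inc(&line->left_to_sync);
	/* ... */
	atomic_dec(&line->left_to_sync);

	/* In pblk_line_close_ws(): loosely wait until all L2P entries
	 * for the line point to the media, so the line never reaches
	 * the GC lists with updates still in flight.
	 */
	while (atomic_read(&line->left_to_sync))
		cond_resched();	/* or sleep on a waitqueue */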

Thoughts?

Javier



* Re: [PATCH] lightnvm: pblk: fix race condition on GC
       [not found]   ` <CAJbgVnU8G+brMW1anV1u79autsVzigwAoaaQKfZi_dizUFLfdw@mail.gmail.com>
@ 2019-01-29 18:39     ` Javier González
  0 siblings, 0 replies; 3+ messages in thread
From: Javier González @ 2019-01-29 18:39 UTC (permalink / raw)
  To: Heiner Litz
  Cc: Matias Bjørling, Hans Holmberg, linux-block, linux-kernel



> On 29 Jan 2019, at 17.36, Heiner Litz <hlitz@ucsc.edu> wrote:
> 
> Javier,
> 
> On Tue, Jan 29, 2019 at 12:13 AM Javier González <javier@javigon.com> wrote:
> > On 27 Jan 2019, at 07.54, Heiner Litz <hlitz@ucsc.edu> wrote:
> >
> > This patch fixes a race condition where a write is mapped to the last
> > sectors of a line. The write is synced to the device but the L2P is not
> > updated yet. When the line is garbage collected before the L2P update is
> > performed, the sectors are ignored by the GC logic and the line is freed
> > before all sectors are moved. When the L2P is finally updated, it contains
> > a mapping to a freed line, and subsequent reads of the corresponding LBAs fail.
> 
> Hi Heiner,
> 
> This has been an interesting issue to debug - good catch!
> 
> Felt more like a marathon than a catch ;)

Hehehe, I know. It is a good marathon then :)

> 
> 
> >
> > Note that looking up the L2P and checking the ppa in the write buffer need
> > to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.
> >
> > Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> > ---
> > drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
> > 1 file changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> > index 3789185144da..7c556b2218e4 100644
> > --- a/drivers/lightnvm/pblk-read.c
> > +++ b/drivers/lightnvm/pblk-read.c
> > @@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> >       int valid_secs = 0;
> >       int i;
> >
> > -     pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
> > -
> > +     spin_lock(&pblk->trans_lock);
> >       for (i = 0; i < nr_secs; i++) {
> >               if (lba_list[i] == ADDR_EMPTY)
> >                       continue;
> >
> > +             ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
> >               ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
> > +
> > +             /* Obtain ppa from cache if the sector has been synced to the
> > +              * device but the L2P has not been updated yet */
> > +             if (pblk_addr_in_cache(ppa_list_l2p[i])) {
> > +                     struct pblk_rb *rb = &pblk->rwb;
> > +                     struct pblk_rb_entry *entry;
> > +                     struct pblk_w_ctx *w_ctx;
> > +                     u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
> > +
> > +#ifdef CONFIG_NVM_PBLK_DEBUG
> > +                     /* Ensure that the access will not cause an overflow */
> > +                     BUG_ON(pos >= rb->nr_entries);
> > +#endif
> > +
> > +                     entry = &rb->entries[pos];
> > +                     w_ctx = &entry->w_ctx;
> > +                     if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
> > +                             rqd->ppa_list[valid_secs++] = ppa_gc;
> > +                             continue;
> > +                     }
> > +             }
> > +
> >               if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
> >                       paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
> >                       continue;
> > @@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> >
> >               rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
> >       }
> > +     spin_unlock(&pblk->trans_lock);
> >
> > #ifdef CONFIG_NVM_PBLK_DEBUG
> >       atomic_long_add(valid_secs, &pblk->inflight_reads);
> > --
> > 2.17.1
> 
> 
> Here is a suggestion: why not add an atomic counter to the line
> tracking the sectors still pending an L2P update, and then loosely
> wait (i.e., check and sleep / schedule) until the counter reaches 0
> in pblk_line_close_ws()? This way you guarantee that the line
> does not close - and therefore never reaches the GC lists - before all
> the L2P entries for that line point to the media. Any other form of
> synchronization that puts the burden at pblk_line_close_ws() would also
> work for me.
> 
> In essence, I would rather pay the price on a per-line basis than
> blocking the trans_lock longer for each I/O.
> 
> The patch only adds two well-predictable branches to the loop, so I
> think the impact would be minimal, but I still think your approach is
> cleaner. I suggest checking the proposed sync counter whenever
> selecting a GC candidate. I'll send out a V2.
> 
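
Checking it when picking a GC victim would look roughly like this
(a sketch, reusing the hypothetical left_to_sync counter from my
earlier mail):

	/* GC victim selection (sketch): skip lines that still have
	 * L2P updates in flight.
	 */
	if (atomic_read(&line->left_to_sync))
		continue;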

Sounds good! Thanks!

Javier

