All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux fsdevel mailing list <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>,
	virtio-fs-list <virtio-fs@redhat.com>,
	Sergio Lopez <slp@redhat.com>, Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH][v2] dax: Fix missed wakeup during dax entry invalidation
Date: Mon, 19 Apr 2021 16:39:47 -0400	[thread overview]
Message-ID: <20210419203947.GG1472665@redhat.com> (raw)
In-Reply-To: <CAPcyv4jR5d+-99wVMm9SHxNBOsp0FUi7wzDNsefkZ1oqUZ7joQ@mail.gmail.com>

On Mon, Apr 19, 2021 at 12:48:58PM -0700, Dan Williams wrote:
> On Mon, Apr 19, 2021 at 11:45 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > This is V2 of the patch. Posted V1 here.
> >
> > https://lore.kernel.org/linux-fsdevel/20210416173524.GA1379987@redhat.com/
> >
> > Based on feedback from Dan and Jan, modified the patch to wake up
> > all waiters when dax entry is invalidated. This solves the issues
> > of missed wakeups.
> 
> Care to send a formal patch with this commentary moved below the --- line?
> 
> One style fixup below...
> 
> >
> > I am seeing missed wakeups which ultimately lead to a deadlock when I am
> > using virtiofs with DAX enabled and running "make -j". I had to mount
> > virtiofs as rootfs and also reduce to dax window size to 256M to reproduce
> > the problem consistently.
> >
> > So here is the problem. put_unlocked_entry() wakes up waiters only
> > if entry is not null as well as !dax_is_conflict(entry). But if I
> > call multiple instances of invalidate_inode_pages2() in parallel,
> > then I can run into a situation where there are waiters on
> > this index but nobody will wait these.
> >
> > invalidate_inode_pages2()
> >   invalidate_inode_pages2_range()
> >     invalidate_exceptional_entry2()
> >       dax_invalidate_mapping_entry_sync()
> >         __dax_invalidate_entry() {
> >                 xas_lock_irq(&xas);
> >                 entry = get_unlocked_entry(&xas, 0);
> >                 ...
> >                 ...
> >                 dax_disassociate_entry(entry, mapping, trunc);
> >                 xas_store(&xas, NULL);
> >                 ...
> >                 ...
> >                 put_unlocked_entry(&xas, entry);
> >                 xas_unlock_irq(&xas);
> >         }
> >
> > Say a fault in in progress and it has locked entry at offset say "0x1c".
> > Now say three instances of invalidate_inode_pages2() are in progress
> > (A, B, C) and they all try to invalidate entry at offset "0x1c". Given
> > dax entry is locked, all tree instances A, B, C will wait in wait queue.
> >
> > When dax fault finishes, say A is woken up. It will store NULL entry
> > at index "0x1c" and wake up B. When B comes along it will find "entry=0"
> > at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And
> > this means put_unlocked_entry() will not wake up next waiter, given
> > the current code. And that means C continues to wait and is not woken
> > up.
> >
> > This patch fixes the issue by waking up all waiters when a dax entry
> > has been invalidated. This seems to fix the deadlock I am facing
> > and I can make forward progress.
> >
> > Reported-by: Sergio Lopez <slp@redhat.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/dax.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > Index: redhat-linux/fs/dax.c
> > ===================================================================
> > --- redhat-linux.orig/fs/dax.c  2021-04-16 14:16:44.332140543 -0400
> > +++ redhat-linux/fs/dax.c       2021-04-19 11:24:11.465213474 -0400
> > @@ -264,11 +264,11 @@ static void wait_entry_unlocked(struct x
> >         finish_wait(wq, &ewait.wait);
> >  }
> >
> > -static void put_unlocked_entry(struct xa_state *xas, void *entry)
> > +static void put_unlocked_entry(struct xa_state *xas, void *entry, bool wake_all)
> >  {
> >         /* If we were the only waiter woken, wake the next one */
> >         if (entry && !dax_is_conflict(entry))
> > -               dax_wake_entry(xas, entry, false);
> > +               dax_wake_entry(xas, entry, wake_all);
> >  }
> >
> >  /*
> > @@ -622,7 +622,7 @@ struct page *dax_layout_busy_page_range(
> >                         entry = get_unlocked_entry(&xas, 0);
> >                 if (entry)
> >                         page = dax_busy_page(entry);
> > -               put_unlocked_entry(&xas, entry);
> > +               put_unlocked_entry(&xas, entry, false);
> 
> I'm not a fan of raw true/false arguments because if you read this
> line in isolation you need to go read put_unlocked_entry() to recall
> what that argument means. So lets add something like:
> 
> /**
>  * enum dax_entry_wake_mode: waitqueue wakeup toggle
>  * @WAKE_NEXT: entry was not mutated
>  * @WAKE_ALL: entry was invalidated, or resized
>  */
> enum dax_entry_wake_mode {
>         WAKE_NEXT,
>         WAKE_ALL,
> }
> 
> ...and use that as the arg for dax_wake_entry(). So I'd expect this to
> be a 3 patch series, introduce dax_entry_wake_mode for
> dax_wake_entry(), introduce the argument for put_unlocked_entry()
> without changing the logic, and finally this bug fix. Feel free to add
> 'Fixes: ac401cc78242 ("dax: New fault locking")' in case you feel this
> needs to be backported.

Hi Dan,

I will make changes as you suggested and post another version.

I am wondering what to do with dax_wake_entry(). It also has a boolean
parameter wake_all. Should that be converted as well to make use of
enum dax_entry_wake_mode?

Thanks
Vivek


WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux fsdevel mailing list <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>,
	virtio-fs-list <virtio-fs@redhat.com>,
	Sergio Lopez <slp@redhat.com>, Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH][v2] dax: Fix missed wakeup during dax entry invalidation
Date: Mon, 19 Apr 2021 16:39:47 -0400	[thread overview]
Message-ID: <20210419203947.GG1472665@redhat.com> (raw)
In-Reply-To: <CAPcyv4jR5d+-99wVMm9SHxNBOsp0FUi7wzDNsefkZ1oqUZ7joQ@mail.gmail.com>

On Mon, Apr 19, 2021 at 12:48:58PM -0700, Dan Williams wrote:
> On Mon, Apr 19, 2021 at 11:45 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > This is V2 of the patch. Posted V1 here.
> >
> > https://lore.kernel.org/linux-fsdevel/20210416173524.GA1379987@redhat.com/
> >
> > Based on feedback from Dan and Jan, modified the patch to wake up
> > all waiters when dax entry is invalidated. This solves the issues
> > of missed wakeups.
> 
> Care to send a formal patch with this commentary moved below the --- line?
> 
> One style fixup below...
> 
> >
> > I am seeing missed wakeups which ultimately lead to a deadlock when I am
> > using virtiofs with DAX enabled and running "make -j". I had to mount
> > virtiofs as rootfs and also reduce to dax window size to 256M to reproduce
> > the problem consistently.
> >
> > So here is the problem. put_unlocked_entry() wakes up waiters only
> > if entry is not null as well as !dax_is_conflict(entry). But if I
> > call multiple instances of invalidate_inode_pages2() in parallel,
> > then I can run into a situation where there are waiters on
> > this index but nobody will wait these.
> >
> > invalidate_inode_pages2()
> >   invalidate_inode_pages2_range()
> >     invalidate_exceptional_entry2()
> >       dax_invalidate_mapping_entry_sync()
> >         __dax_invalidate_entry() {
> >                 xas_lock_irq(&xas);
> >                 entry = get_unlocked_entry(&xas, 0);
> >                 ...
> >                 ...
> >                 dax_disassociate_entry(entry, mapping, trunc);
> >                 xas_store(&xas, NULL);
> >                 ...
> >                 ...
> >                 put_unlocked_entry(&xas, entry);
> >                 xas_unlock_irq(&xas);
> >         }
> >
> > Say a fault in in progress and it has locked entry at offset say "0x1c".
> > Now say three instances of invalidate_inode_pages2() are in progress
> > (A, B, C) and they all try to invalidate entry at offset "0x1c". Given
> > dax entry is locked, all tree instances A, B, C will wait in wait queue.
> >
> > When dax fault finishes, say A is woken up. It will store NULL entry
> > at index "0x1c" and wake up B. When B comes along it will find "entry=0"
> > at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And
> > this means put_unlocked_entry() will not wake up next waiter, given
> > the current code. And that means C continues to wait and is not woken
> > up.
> >
> > This patch fixes the issue by waking up all waiters when a dax entry
> > has been invalidated. This seems to fix the deadlock I am facing
> > and I can make forward progress.
> >
> > Reported-by: Sergio Lopez <slp@redhat.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/dax.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > Index: redhat-linux/fs/dax.c
> > ===================================================================
> > --- redhat-linux.orig/fs/dax.c  2021-04-16 14:16:44.332140543 -0400
> > +++ redhat-linux/fs/dax.c       2021-04-19 11:24:11.465213474 -0400
> > @@ -264,11 +264,11 @@ static void wait_entry_unlocked(struct x
> >         finish_wait(wq, &ewait.wait);
> >  }
> >
> > -static void put_unlocked_entry(struct xa_state *xas, void *entry)
> > +static void put_unlocked_entry(struct xa_state *xas, void *entry, bool wake_all)
> >  {
> >         /* If we were the only waiter woken, wake the next one */
> >         if (entry && !dax_is_conflict(entry))
> > -               dax_wake_entry(xas, entry, false);
> > +               dax_wake_entry(xas, entry, wake_all);
> >  }
> >
> >  /*
> > @@ -622,7 +622,7 @@ struct page *dax_layout_busy_page_range(
> >                         entry = get_unlocked_entry(&xas, 0);
> >                 if (entry)
> >                         page = dax_busy_page(entry);
> > -               put_unlocked_entry(&xas, entry);
> > +               put_unlocked_entry(&xas, entry, false);
> 
> I'm not a fan of raw true/false arguments because if you read this
> line in isolation you need to go read put_unlocked_entry() to recall
> what that argument means. So lets add something like:
> 
> /**
>  * enum dax_entry_wake_mode: waitqueue wakeup toggle
>  * @WAKE_NEXT: entry was not mutated
>  * @WAKE_ALL: entry was invalidated, or resized
>  */
> enum dax_entry_wake_mode {
>         WAKE_NEXT,
>         WAKE_ALL,
> }
> 
> ...and use that as the arg for dax_wake_entry(). So I'd expect this to
> be a 3 patch series, introduce dax_entry_wake_mode for
> dax_wake_entry(), introduce the argument for put_unlocked_entry()
> without changing the logic, and finally this bug fix. Feel free to add
> 'Fixes: ac401cc78242 ("dax: New fault locking")' in case you feel this
> needs to be backported.

Hi Dan,

I will make changes as you suggested and post another version.

I am wondering what to do with dax_wake_entry(). It also has a boolean
parameter wake_all. Should that be converted as well to make use of
enum dax_entry_wake_mode?

Thanks
Vivek
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>, Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	virtio-fs-list <virtio-fs@redhat.com>,
	Linux fsdevel mailing list <linux-fsdevel@vger.kernel.org>
Subject: Re: [Virtio-fs] [PATCH][v2] dax: Fix missed wakeup during dax entry invalidation
Date: Mon, 19 Apr 2021 16:39:47 -0400	[thread overview]
Message-ID: <20210419203947.GG1472665@redhat.com> (raw)
In-Reply-To: <CAPcyv4jR5d+-99wVMm9SHxNBOsp0FUi7wzDNsefkZ1oqUZ7joQ@mail.gmail.com>

On Mon, Apr 19, 2021 at 12:48:58PM -0700, Dan Williams wrote:
> On Mon, Apr 19, 2021 at 11:45 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > This is V2 of the patch. Posted V1 here.
> >
> > https://lore.kernel.org/linux-fsdevel/20210416173524.GA1379987@redhat.com/
> >
> > Based on feedback from Dan and Jan, modified the patch to wake up
> > all waiters when dax entry is invalidated. This solves the issues
> > of missed wakeups.
> 
> Care to send a formal patch with this commentary moved below the --- line?
> 
> One style fixup below...
> 
> >
> > I am seeing missed wakeups which ultimately lead to a deadlock when I am
> > using virtiofs with DAX enabled and running "make -j". I had to mount
> > virtiofs as rootfs and also reduce to dax window size to 256M to reproduce
> > the problem consistently.
> >
> > So here is the problem. put_unlocked_entry() wakes up waiters only
> > if entry is not null as well as !dax_is_conflict(entry). But if I
> > call multiple instances of invalidate_inode_pages2() in parallel,
> > then I can run into a situation where there are waiters on
> > this index but nobody will wait these.
> >
> > invalidate_inode_pages2()
> >   invalidate_inode_pages2_range()
> >     invalidate_exceptional_entry2()
> >       dax_invalidate_mapping_entry_sync()
> >         __dax_invalidate_entry() {
> >                 xas_lock_irq(&xas);
> >                 entry = get_unlocked_entry(&xas, 0);
> >                 ...
> >                 ...
> >                 dax_disassociate_entry(entry, mapping, trunc);
> >                 xas_store(&xas, NULL);
> >                 ...
> >                 ...
> >                 put_unlocked_entry(&xas, entry);
> >                 xas_unlock_irq(&xas);
> >         }
> >
> > Say a fault in in progress and it has locked entry at offset say "0x1c".
> > Now say three instances of invalidate_inode_pages2() are in progress
> > (A, B, C) and they all try to invalidate entry at offset "0x1c". Given
> > dax entry is locked, all tree instances A, B, C will wait in wait queue.
> >
> > When dax fault finishes, say A is woken up. It will store NULL entry
> > at index "0x1c" and wake up B. When B comes along it will find "entry=0"
> > at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And
> > this means put_unlocked_entry() will not wake up next waiter, given
> > the current code. And that means C continues to wait and is not woken
> > up.
> >
> > This patch fixes the issue by waking up all waiters when a dax entry
> > has been invalidated. This seems to fix the deadlock I am facing
> > and I can make forward progress.
> >
> > Reported-by: Sergio Lopez <slp@redhat.com>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/dax.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > Index: redhat-linux/fs/dax.c
> > ===================================================================
> > --- redhat-linux.orig/fs/dax.c  2021-04-16 14:16:44.332140543 -0400
> > +++ redhat-linux/fs/dax.c       2021-04-19 11:24:11.465213474 -0400
> > @@ -264,11 +264,11 @@ static void wait_entry_unlocked(struct x
> >         finish_wait(wq, &ewait.wait);
> >  }
> >
> > -static void put_unlocked_entry(struct xa_state *xas, void *entry)
> > +static void put_unlocked_entry(struct xa_state *xas, void *entry, bool wake_all)
> >  {
> >         /* If we were the only waiter woken, wake the next one */
> >         if (entry && !dax_is_conflict(entry))
> > -               dax_wake_entry(xas, entry, false);
> > +               dax_wake_entry(xas, entry, wake_all);
> >  }
> >
> >  /*
> > @@ -622,7 +622,7 @@ struct page *dax_layout_busy_page_range(
> >                         entry = get_unlocked_entry(&xas, 0);
> >                 if (entry)
> >                         page = dax_busy_page(entry);
> > -               put_unlocked_entry(&xas, entry);
> > +               put_unlocked_entry(&xas, entry, false);
> 
> I'm not a fan of raw true/false arguments because if you read this
> line in isolation you need to go read put_unlocked_entry() to recall
> what that argument means. So lets add something like:
> 
> /**
>  * enum dax_entry_wake_mode: waitqueue wakeup toggle
>  * @WAKE_NEXT: entry was not mutated
>  * @WAKE_ALL: entry was invalidated, or resized
>  */
> enum dax_entry_wake_mode {
>         WAKE_NEXT,
>         WAKE_ALL,
> }
> 
> ...and use that as the arg for dax_wake_entry(). So I'd expect this to
> be a 3 patch series, introduce dax_entry_wake_mode for
> dax_wake_entry(), introduce the argument for put_unlocked_entry()
> without changing the logic, and finally this bug fix. Feel free to add
> 'Fixes: ac401cc78242 ("dax: New fault locking")' in case you feel this
> needs to be backported.

Hi Dan,

I will make changes as you suggested and post another version.

I am wondering what to do with dax_wake_entry(). It also has a boolean
parameter wake_all. Should that be converted as well to make use of
enum dax_entry_wake_mode?

Thanks
Vivek


  reply	other threads:[~2021-04-19 20:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-19 18:45 [PATCH][v2] dax: Fix missed wakeup during dax entry invalidation Vivek Goyal
2021-04-19 18:45 ` [Virtio-fs] " Vivek Goyal
2021-04-19 18:45 ` Vivek Goyal
2021-04-19 19:48 ` Dan Williams
2021-04-19 19:48   ` [Virtio-fs] " Dan Williams
2021-04-19 19:48   ` Dan Williams
2021-04-19 20:39   ` Vivek Goyal [this message]
2021-04-19 20:39     ` [Virtio-fs] " Vivek Goyal
2021-04-19 20:39     ` Vivek Goyal
2021-04-19 20:42     ` Vivek Goyal
2021-04-19 20:42       ` [Virtio-fs] " Vivek Goyal
2021-04-19 20:42       ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210419203947.GG1472665@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=miklos@szeredi.hu \
    --cc=slp@redhat.com \
    --cc=virtio-fs@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.