All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nathan Chancellor <natechancellor@gmail.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org, Greg Thelen <gthelen@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4.14/4.16] writeback: safer lock nesting
Date: Sun, 22 Apr 2018 03:47:36 -0700	[thread overview]
Message-ID: <20180422104736.GA29161@flashbox> (raw)
In-Reply-To: <20180422104257.GA9762@kroah.com>

On Sun, Apr 22, 2018 at 12:42:57PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Apr 22, 2018 at 03:36:32AM -0700, Nathan Chancellor wrote:
> > From: Greg Thelen <gthelen@google.com>
> > 
> > commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.
> > 
> > lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
> > the page's memcg is undergoing move accounting, which occurs when a
> > process leaves its memcg for a new one that has
> > memory.move_charge_at_immigrate set.
> > 
> > unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
> > the given inode is switching writeback domains.  Switches occur when
> > enough writes are issued from a new domain.
> > 
> > This existing pattern is thus suspicious:
> >     lock_page_memcg(page);
> >     unlocked_inode_to_wb_begin(inode, &locked);
> >     ...
> >     unlocked_inode_to_wb_end(inode, locked);
> >     unlock_page_memcg(page);
> > 
> > If both inode switch and process memcg migration are both in-flight then
> > unlocked_inode_to_wb_end() will unconditionally enable interrupts while
> > still holding the lock_page_memcg() irq spinlock.  This suggests the
> > possibility of deadlock if an interrupt occurs before unlock_page_memcg().
> > 
> >     truncate
> >     __cancel_dirty_page
> >     lock_page_memcg
> >     unlocked_inode_to_wb_begin
> >     unlocked_inode_to_wb_end
> >     <interrupts mistakenly enabled>
> >                                     <interrupt>
> >                                     end_page_writeback
> >                                     test_clear_page_writeback
> >                                     lock_page_memcg
> >                                     <deadlock>
> >     unlock_page_memcg
> > 
> > Due to configuration limitations this deadlock is not currently possible
> > because we don't mix cgroup writeback (a cgroupv2 feature) and
> > memory.move_charge_at_immigrate (a cgroupv1 feature).
> > 
> > If the kernel is hacked to always claim inode switching and memcg
> > moving_account, then this script triggers lockup in less than a minute:
> > 
> >   cd /mnt/cgroup/memory
> >   mkdir a b
> >   echo 1 > a/memory.move_charge_at_immigrate
> >   echo 1 > b/memory.move_charge_at_immigrate
> >   (
> >     echo $BASHPID > a/cgroup.procs
> >     while true; do
> >       dd if=/dev/zero of=/mnt/big bs=1M count=256
> >     done
> >   ) &
> >   while true; do
> >     sync
> >   done &
> >   sleep 1h &
> >   SLEEP=$!
> >   while true; do
> >     echo $SLEEP > a/cgroup.procs
> >     echo $SLEEP > b/cgroup.procs
> >   done
> > 
> > The deadlock does not seem possible, so it's debatable if there's any
> > reason to modify the kernel.  I suggest we should to prevent future
> > surprises.  And Wang Long said "this deadlock occurs three times in our
> > environment", so there's more reason to apply this, even to stable.
> > Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
> > see "[PATCH for-4.4] writeback: safer lock nesting"
> > https://lkml.org/lkml/2018/4/11/146
> > 
> > Wang Long said "this deadlock occurs three times in our environment"
> > 
> > [gthelen@google.com: v4]
> >   Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
> > [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
> > Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
> > Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
> > Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> > Signed-off-by: Greg Thelen <gthelen@google.com>
> > Reported-by: Wang Long <wanglong19@meituan.com>
> > Acked-by: Wang Long <wanglong19@meituan.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Nicholas Piggin <npiggin@gmail.com>
> > Cc: <stable@vger.kernel.org>	[v4.2+]
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > [natechancellor: Adjust context due to lack of b93b016313b3b]
> > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> 
> I tried a "simple" backport like this as well, and it too blew up when
> building :(
> 
> This patch dies with:
> 
> $ make M=mm
>   CC      mm/filemap.o
> In file included from ./include/linux/kernel.h:13:0,
>                  from ./include/linux/list.h:9,
>                  from ./include/linux/wait.h:7,
>                  from ./include/linux/wait_bit.h:8,
>                  from ./include/linux/fs.h:6,
>                  from ./include/linux/dax.h:5,
>                  from mm/filemap.c:14:
> ./include/linux/backing-dev.h: In function ‘unlocked_inode_to_wb_begin’:
> ./include/linux/backing-dev.h:373:52: error: ‘flags’ undeclared (first use in this function); did you mean ‘class’?
>    spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
>                                                     ^
> ./include/linux/typecheck.h:11:9: note: in definition of macro ‘typecheck’
>   typeof(x) __dummy2; \
>          ^
> ./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
>   raw_spin_lock_irqsave(spinlock_check(lock), flags); \
>   ^~~~~~~~~~~~~~~~~~~~~
> ./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
>    spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
>    ^~~~~~~~~~~~~~~~~
> ./include/linux/backing-dev.h:373:52: note: each undeclared identifier is reported only once for each function it appears in
>    spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
>                                                     ^
> ./include/linux/typecheck.h:11:9: note: in definition of macro ‘typecheck’
>   typeof(x) __dummy2; \
>          ^
> ./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
>   raw_spin_lock_irqsave(spinlock_check(lock), flags); \
>   ^~~~~~~~~~~~~~~~~~~~~
> ./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
>    spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
>    ^~~~~~~~~~~~~~~~~
> ./include/linux/typecheck.h:12:18: warning: comparison of distinct pointer types lacks a cast
>   (void)(&__dummy == &__dummy2); \
>                   ^
> ./include/linux/spinlock.h:221:3: note: in expansion of macro ‘typecheck’
>    typecheck(unsigned long, flags); \
>    ^~~~~~~~~
> ./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
>   raw_spin_lock_irqsave(spinlock_check(lock), flags); \
>   ^~~~~~~~~~~~~~~~~~~~~
> ./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
>    spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
>    ^~~~~~~~~~~~~~~~~
> In file included from mm/filemap.c:29:0:
> ./include/linux/backing-dev.h: In function ‘unlocked_inode_to_wb_end’:
> ./include/linux/backing-dev.h:391:56: error: ‘flags’ undeclared (first use in this function); did you mean ‘class’?
>    spin_unlock_irqrestore(&inode->i_mapping->tree_lock, flags);
>                                                         ^~~~~
>                                                         class
> make[1]: *** [scripts/Makefile.build:325: mm/filemap.o] Error 1
> make: *** [Makefile:1561: _module_mm] Error 2
> 
> 
> Can you test-build your patches?  :)
> 
> thanks,
> 
> greg k-h

Yes, sorry, I normally do but it seemed like an easy resolution :(

I'll fix this up and resend, sorry for the noise!

Nathan

  reply	other threads:[~2018-04-22 10:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-22 10:36 Backport of 2e898e4c0a38 for 4.4, 4.9, 4.14, and 4.16 Nathan Chancellor
2018-04-22 10:36 ` [PATCH 4.14/4.16] writeback: safer lock nesting Nathan Chancellor
2018-04-22 10:42   ` Greg Kroah-Hartman
2018-04-22 10:47     ` Nathan Chancellor [this message]
2018-04-22 11:17     ` Nathan Chancellor
2018-04-22 10:36 ` [PATCH 4.4] " Nathan Chancellor
2018-04-22 10:36 ` [PATCH 4.9] " Nathan Chancellor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180422104736.GA29161@flashbox \
    --to=natechancellor@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.