All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nathan Chancellor <natechancellor@gmail.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org, Greg Thelen <gthelen@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4.4] writeback: safer lock nesting
Date: Sun, 22 Apr 2018 03:16:37 -0700	[thread overview]
Message-ID: <20180422101637.GA18834@flashbox> (raw)
In-Reply-To: <20180422101543.GA26346@kroah.com>

On Sun, Apr 22, 2018 at 12:15:43PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Apr 22, 2018 at 03:07:59AM -0700, Nathan Chancellor wrote:
> > From: Greg Thelen <gthelen@google.com>
> > 
> > commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.
> > 
> > lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
> > the page's memcg is undergoing move accounting, which occurs when a
> > process leaves its memcg for a new one that has
> > memory.move_charge_at_immigrate set.
> > 
> > unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
> > the given inode is switching writeback domains.  Switches occur when
> > enough writes are issued from a new domain.
> > 
> > This existing pattern is thus suspicious:
> >     lock_page_memcg(page);
> >     unlocked_inode_to_wb_begin(inode, &locked);
> >     ...
> >     unlocked_inode_to_wb_end(inode, locked);
> >     unlock_page_memcg(page);
> > 
> > If both inode switch and process memcg migration are both in-flight then
> > unlocked_inode_to_wb_end() will unconditionally enable interrupts while
> > still holding the lock_page_memcg() irq spinlock.  This suggests the
> > possibility of deadlock if an interrupt occurs before unlock_page_memcg().
> > 
> >     truncate
> >     __cancel_dirty_page
> >     lock_page_memcg
> >     unlocked_inode_to_wb_begin
> >     unlocked_inode_to_wb_end
> >     <interrupts mistakenly enabled>
> >                                     <interrupt>
> >                                     end_page_writeback
> >                                     test_clear_page_writeback
> >                                     lock_page_memcg
> >                                     <deadlock>
> >     unlock_page_memcg
> > 
> > Due to configuration limitations this deadlock is not currently possible
> > because we don't mix cgroup writeback (a cgroupv2 feature) and
> > memory.move_charge_at_immigrate (a cgroupv1 feature).
> > 
> > If the kernel is hacked to always claim inode switching and memcg
> > moving_account, then this script triggers lockup in less than a minute:
> > 
> >   cd /mnt/cgroup/memory
> >   mkdir a b
> >   echo 1 > a/memory.move_charge_at_immigrate
> >   echo 1 > b/memory.move_charge_at_immigrate
> >   (
> >     echo $BASHPID > a/cgroup.procs
> >     while true; do
> >       dd if=/dev/zero of=/mnt/big bs=1M count=256
> >     done
> >   ) &
> >   while true; do
> >     sync
> >   done &
> >   sleep 1h &
> >   SLEEP=$!
> >   while true; do
> >     echo $SLEEP > a/cgroup.procs
> >     echo $SLEEP > b/cgroup.procs
> >   done
> > 
> > The deadlock does not seem possible, so it's debatable if there's any
> > reason to modify the kernel.  I suggest we should to prevent future
> > surprises.  And Wang Long said "this deadlock occurs three times in our
> > environment", so there's more reason to apply this, even to stable.
> > Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
> > see "[PATCH for-4.4] writeback: safer lock nesting"
> > https://lkml.org/lkml/2018/4/11/146
> > 
> > Wang Long said "this deadlock occurs three times in our environment"
> > 
> > [gthelen@google.com: v4]
> >   Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
> > [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
> > Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
> > Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
> > Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> > Signed-off-by: Greg Thelen <gthelen@google.com>
> > Reported-by: Wang Long <wanglong19@meituan.com>
> > Acked-by: Wang Long <wanglong19@meituan.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Nicholas Piggin <npiggin@gmail.com>
> > Cc: <stable@vger.kernel.org>	[v4.2+]
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > [natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
> 
> I need a working patch for 4.9.y, 4.14.y and 4.16.y first before I can
> apply this one :(
> 
> thanks,
> 
> greg k-h

Will do it now, thanks!

  reply	other threads:[~2018-04-22 10:16 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-22 10:07 [PATCH 4.4] writeback: safer lock nesting Nathan Chancellor
2018-04-22 10:15 ` Greg Kroah-Hartman
2018-04-22 10:16   ` Nathan Chancellor [this message]
2018-04-22 10:36 Backport of 2e898e4c0a38 for 4.4, 4.9, 4.14, and 4.16 Nathan Chancellor
2018-04-22 10:36 ` [PATCH 4.4] writeback: safer lock nesting Nathan Chancellor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180422101637.GA18834@flashbox \
    --to=natechancellor@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=npiggin@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.