From: Eric Biggers <ebiggers@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jaegeuk Kim <jaegeuk@kernel.org>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Suleiman Souhlal <suleiman@google.com>,
linux-fscrypt@vger.kernel.org, stable@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [stable] ext4 fscrypt_get_encryption_info() circular locking dependency
Date: Fri, 11 Dec 2020 10:03:26 -0800 [thread overview]
Message-ID: <X9O0brQ7junfZTfI@sol.localdomain> (raw)
In-Reply-To: <20201211040807.GF1667627@google.com>
On Fri, Dec 11, 2020 at 01:08:07PM +0900, Sergey Senozhatsky wrote:
> On (20/12/10 19:48), Eric Biggers wrote:
> > >
> > > [ 133.454836] Chain exists of:
> > > jbd2_handle --> fscrypt_init_mutex --> fs_reclaim
> > >
> > > [ 133.454840] Possible unsafe locking scenario:
> > >
> > > [ 133.454841] CPU0 CPU1
> > > [ 133.454843] ---- ----
> > > [ 133.454844] lock(fs_reclaim);
> > > [ 133.454846] lock(fscrypt_init_mutex);
> > > [ 133.454848] lock(fs_reclaim);
> > > [ 133.454850] lock(jbd2_handle);
> > > [ 133.454851]
> >
> > This actually got fixed by the patch series
> > https://lkml.kernel.org/linux-fscrypt/20200913083620.170627-1-ebiggers@kernel.org/
> > which went into 5.10. The more recent patch to remove ext4_dir_open() isn't
> > related.
> >
> > It's a hard patch series to backport. Backporting it to 5.4 would be somewhat
> > feasible, while 4.19 would be very difficult as there have been a lot of other
> > fscrypt commits which would heavily conflict with cherry-picks.
> >
> > How interested are you in having this fixed? Did you encounter an actual
> > deadlock or just the lockdep report?
>
> Difficult to say. On one hand 'yes' I see lockups on my devices (4.19
> kernel); I can't tell at the moment what's the root cause. So on the
> other hand 'no' I can't say that it's because of ext4_dir_open().
>
> What I saw so far involved ext4, kswapd, khugepaged and lots of other things.
>
> [ 1598.655901] INFO: task khugepaged:66 blocked for more than 122 seconds.
> [ 1598.655914] Call Trace:
> [ 1598.655920] __schedule+0x506/0x1240
> [ 1598.655924] ? kvm_zap_rmapp+0x52/0x69
> [ 1598.655927] schedule+0x3f/0x78
> [ 1598.655929] __rwsem_down_read_failed_common+0x186/0x201
> [ 1598.655933] call_rwsem_down_read_failed+0x14/0x30
> [ 1598.655936] down_read+0x2e/0x45
> [ 1598.655939] rmap_walk_file+0x73/0x1ce
> [ 1598.655941] page_referenced+0x10d/0x154
> [ 1598.655948] shrink_active_list+0x1d4/0x475
>
> [..]
>
> [ 1598.655986] INFO: task kswapd0:79 blocked for more than 122 seconds.
> [ 1598.655993] Call Trace:
> [ 1598.655995] __schedule+0x506/0x1240
> [ 1598.655998] schedule+0x3f/0x78
> [ 1598.656000] __rwsem_down_read_failed_common+0x186/0x201
> [ 1598.656003] call_rwsem_down_read_failed+0x14/0x30
> [ 1598.656006] down_read+0x2e/0x45
> [ 1598.656008] rmap_walk_file+0x73/0x1ce
> [ 1598.656010] page_referenced+0x10d/0x154
> [ 1598.656015] shrink_active_list+0x1d4/0x475
>
> [..]
>
> [ 1598.658233] __rwsem_down_read_failed_common+0x186/0x201
> [ 1598.658235] call_rwsem_down_read_failed+0x14/0x30
> [ 1598.658238] down_read+0x2e/0x45
> [ 1598.658240] rmap_walk_file+0x73/0x1ce
> [ 1598.658242] page_referenced+0x10d/0x154
> [ 1598.658247] shrink_active_list+0x1d4/0x475
> [ 1598.658250] shrink_node+0x27e/0x661
> [ 1598.658254] try_to_free_pages+0x425/0x7ec
> [ 1598.658258] __alloc_pages_nodemask+0x80b/0x1514
> [ 1598.658279] __do_page_cache_readahead+0xd4/0x1a9
> [ 1598.658282] filemap_fault+0x346/0x573
> [ 1598.658287] ext4_filemap_fault+0x31/0x44
Could you provide some more information about what is causing these actual
lockups for you? Are there more stack traces?
I'd be surprised if it's related to the fscrypt-related lockdep reports you're
getting, as the fscrypt bug seemed unlikely to cause deadlocks in practice, as
most of the time fscrypt_get_encryption_info() does *not* do or wait for a
GFP_KERNEL allocation. It's only in certain causes (generally, when things are
being initialized as opposed to already running) that it could.
See e.g. how the lockdep reports assume GFP_KERNEL done under
fscrypt_init_mutex, but that only happens the first time an encrypted directory
is accessed after boot. So that path can't cause a deadlock after that.
This was also a 5 years old bug, so it's unclear why it would suddenly be
causing problems just now...
Maybe something changed that exposed the bug more. I don't know what it would
be, though.
As I said, the fix would be difficult to backport. It required a redesign of
how encrypted files get created, as there were lots of different ways in which
fscrypt_get_encryption_info() could be called during a filesystem transaction.
There's a nontrivial risk of regressions by backporting it. (In fact I already
found and fixed two regressions in it upstream...)
So it would be helpful to know if this is important enough to you that you would
be willing to accept a risk of regressions above that of a typical stable
backport in order to get the re-design that fixes this issue backported.
- Eric
next prev parent reply other threads:[~2020-12-11 19:41 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-11 3:36 [stable] ext4 fscrypt_get_encryption_info() circular locking dependency Sergey Senozhatsky
2020-12-11 3:48 ` Eric Biggers
2020-12-11 4:08 ` Sergey Senozhatsky
2020-12-11 4:34 ` Sergey Senozhatsky
2020-12-11 18:03 ` Eric Biggers [this message]
2021-01-13 11:34 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=X9O0brQ7junfZTfI@sol.localdomain \
--to=ebiggers@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=jaegeuk@kernel.org \
--cc=linux-fscrypt@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sergey.senozhatsky.work@gmail.com \
--cc=stable@vger.kernel.org \
--cc=suleiman@google.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).