From: Andreas Dilger <adilger@dilger.ca>
To: Damien Le Moal <damien.lemoal@wdc.com>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>,
Johannes Thumshirn <jthumshirn@suse.de>,
Naohiro Aota <Naohiro.Aota@wdc.com>,
Masato Suzuki <masato.suzuki@wdc.com>
Subject: Re: [PATCH] ext4: Fix deadlock on page reclaim
Date: Mon, 29 Jul 2019 12:40:44 -0600 [thread overview]
Message-ID: <3D2360FA-AD48-48AE-B1CE-D1CF58C1B8AB@dilger.ca> (raw)
In-Reply-To: <BYAPR04MB58162929012135E47C68923AE7C30@BYAPR04MB5816.namprd04.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 2973 bytes --]
On Jul 26, 2019, at 8:59 PM, Damien Le Moal <damien.lemoal@wdc.com> wrote:
>
> On 2019/07/27 7:55, Theodore Y. Ts'o wrote:
>> On Sat, Jul 27, 2019 at 08:44:23AM +1000, Dave Chinner wrote:
>>>>
>>>> This looks like something that could hit every file systems, so
>>>> shouldn't we fix this in common code? We could also look into
>>>> just using memalloc_nofs_save for the page cache allocation path
>>>> instead of the per-mapping gfp_mask.
>>>
>>> I think it has to be the entire IO path - any allocation from the
>>> underlying filesystem could recurse into the top level filesystem
>>> and then deadlock if the memory reclaim submits IO or blocks on
>>> IO completion from the upper filesystem. That's a bloody big hammer
>>> for something that is only necessary when there are stacked
>>> filesystems like this....
>>
>> Yeah.... that's why using memalloc_nofs_save() probably makes the most
>> sense, and dm_zoned should use that before it calls into ext4.
>
> Unfortunately, with this particular setup, that will not solve the problem.
> dm-zoned submit BIOs to its backend drive in response to XFS activity. The
> requests for these BIOs are passed along to the kernel tcmu HBA and end up in
> that HBA command ring. The commands themselves are read from the ring and
> executed by the tcmu-runner user process which executes them doing
> pread()/pwrite() to the ext4 file. The tcmu-runner process being a different
> context than the dm-zoned worker thread issuing the BIO,
> memalloc_nofs_save/restore() calls in dm-zoned will have no effect.
One way to handle this is to pass on PF_MEMALLOC/memalloc_nofs_save state
in the BIO so that the worker thread will also get the correct behaviour
when it is processing this IO submission.
> We tried a simpler setup using loopback mount (XFS used directly in an ext4
> file) and running the same workload. We failed to recreate a similar deadlock in
> this case, but I am strongly suspecting that it can happen too. It is simply
> much harder to hit because the IO path from XFS to ext4 is all in-kernel and
> asynchronous, whereas tcmu-runner ZBC handler is a synchronous QD=1 path for IOs
> which makes it relatively easy to get inter-dependent writes or read+write
> queued back-to-back and create the deadlock.
>
> So back to Dave's point, we may be needing the big-hammer solution in the case
> of stacked file systems, while a non-stack setups do not necessarily need it
> (that is for the FS to decide). But I do not see how to implement this big
> hammer conditionally. How can a file system tell if it is at the top of the
> stack (big hammer not needed) or lower than the top level (big hammer needed) ?
>
> One simple hack would be an fcntl() or mount option to tell the FS to use
> GFP_NOFS unconditionally, but avoiding the bug would mean making sure that the
> applications or system setup is correct. So not so safe.
Cheers, Andreas
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]
next prev parent reply other threads:[~2019-07-29 18:40 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-25 9:33 [PATCH] ext4: Fix deadlock on page reclaim Damien Le Moal
2019-07-25 11:54 ` Christoph Hellwig
2019-07-25 19:57 ` Andreas Dilger
2019-07-26 2:24 ` Damien Le Moal
2019-07-26 22:44 ` Dave Chinner
2019-07-26 22:55 ` Theodore Y. Ts'o
2019-07-27 2:59 ` Damien Le Moal
2019-07-28 23:41 ` Dave Chinner
2019-07-29 1:07 ` Damien Le Moal
2019-07-29 18:40 ` Andreas Dilger [this message]
2019-07-30 2:06 ` Damien Le Moal
2019-07-30 23:47 ` Dave Chinner
2019-07-31 4:37 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3D2360FA-AD48-48AE-B1CE-D1CF58C1B8AB@dilger.ca \
--to=adilger@dilger.ca \
--cc=Naohiro.Aota@wdc.com \
--cc=damien.lemoal@wdc.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=jthumshirn@suse.de \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masato.suzuki@wdc.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).