Re: nfs_page_async_flush returning 0 for fatal errors on writeback

From: Calum Mackay <calum.mackay@oracle.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Cc: Calum Mackay <calum.mackay@oracle.com>
Subject: Re: nfs_page_async_flush returning 0 for fatal errors on writeback
Date: Thu, 8 Jul 2021 00:57:03 +0100	[thread overview]
Message-ID: <d806f1b6-30ff-812e-e37b-090e11a650f0@oracle.com> (raw)
In-Reply-To: <421b63750c5d80bdda1184c854d7ab3489c7ff01.camel@hammerspace.com>

[-- Attachment #1.1: Type: text/plain, Size: 6418 bytes --]

On 08/07/2021 12:50 am, Trond Myklebust wrote:
> On Thu, 2021-07-08 at 00:13 +0100, Calum Mackay wrote:
>> On 07/07/2021 11:01 pm, Trond Myklebust wrote:
>>> On Wed, 2021-07-07 at 19:51 +0100, Calum Mackay wrote:
>>>> hi Trond,
>>>>
>>>> I had a question about these two old commits of yours, from v5.0
>>>> &
>>>> v5.2:
>>>>
>>>> 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal
>>>> errors
>>>> (2
>>>> years, 2 months ago)
>>>>
>>>> 8fc75bed96bb NFS: Fix up return value on fatal errors in
>>>> nfs_page_async_flush() (2 years, 5 months ago)
>>>>
>>>>
>>>> I am looking at a crash dump, with a kernel based on an older-
>>>> still
>>>> v4.14 stable which did not have either of the above commits.
>>>>
>>>>           PANIC: "BUG: unable to handle kernel NULL pointer
>>>> dereference
>>>> at
>>>> 0000000000000080"
>>>>
>>>>        [exception RIP: _raw_spin_lock+20]
>>>>
>>>> #10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs]
>>>> #11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs]
>>>> #12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b
>>>> #13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs]
>>>> #14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc
>>>> #15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2
>>>> #16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35
>>>> #17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949
>>>>
>>>> the real sequence, obscured by compiler inlining, is:
>>>>
>>>>       nfs_updatepage
>>>>          nfs_writepage_setup
>>>>             nfs_setup_write_request
>>>>                nfs_inode_add_request
>>>>                   spin_lock(&mapping->private_lock);
>>>>
>>>> and we crash since the as mapping pointer is NULL.
>>>>
>>>>
>>>> I thought I was able to construct a possible sequence that would
>>>> explain
>>>> the above, if we are in (from above):
>>>>
>>>>       nfs_setup_write_request
>>>>        nfs_try_to_update_request
>>>>         nfs_wb_page
>>>>          nfs_writepage_locked
>>>>           nfs_do_writepage
>>>>
>>>> and nfs_page_async_flush detects a fatal server error, and calls
>>>> nfs_write_error_remove_page, which results in the page->mapping
>>>> set
>>>> to NULL.
>>>>
>>>> In that version of the code, without your commits above,
>>>> nfs_page_async_flush returns 0 in this case, which I thought
>>>> might
>>>> result in nfs_setup_write_request going ahead and calling
>>>> nfs_inode_add_request with that page, resulting in the crash
>>>> seen.
>>>>
>>>>
>>>> I then discovered your v5.0 commit:
>>>>
>>>> 8fc75bed96bb NFS: Fix up return value on fatal errors in
>>>> nfs_page_async_flush() (2 years, 5 months ago)
>>>>
>>>> which appeared to correct that, having nfs_page_async_flush
>>>> return
>>>> the
>>>> error in this case, so we would not end up in
>>>> nfs_inode_add_request.
>>>>
>>>>
>>>> But I then spotted your later v5.2 commit:
>>>>
>>>> 14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal
>>>> errors
>>>> (2
>>>> years, 2 months ago)
>>>>
>>>> which changes things back, so that nfs_page_async_flush now again
>>>> returns 0, in the "launder" case, and that's how that code
>>>> remains
>>>> today.
>>>>
>>>>
>>>> If so, is there anything to stop the possible crash path that I
>>>> describe
>>>> above?
>>>>
>>>>
>>>> path I suggest above? Or perhaps I'm missing another commit that
>>>> stops
>>>> it happening, even after your second commit above?
>>>>
>>>
>>> In order for page->mapping to get set to NULL, we'd have to be
>>> removing
>>> the page from the page cache altogether. I'm not seeing where we'd
>>> be
>>> doing that here. It certainly isn't possible for some third party
>>> to do
>>> so, since our thread is holding the page lock and I'm not seeing
>>> where
>>> the call to nfs_write_error() might be doing so.
>>>
>>> We do call nfs_inode_remove_request(), which removes the struct
>>> nfs_page that is tracking the page dirtiness, however it shouldn't
>>> ever
>>> result in the removal of the pagecache page itself.
>>>
>>> Am I misreading your email?
>>
>> thanks very much Trond; much more likely I am misreading the code :)
>>
>>
>> My theory was that we have nfs_page_async_flush detecting
>> nfs_error_is_fatal_on_server, so calling nfs_write_error_remove_page
>> (this is an older v4.14.72-ish kernel).
>>
>> That would then generic_error_remove_page -> truncate_inode_page ->
>> truncate_complete_page -> delete_from_page_cache
>>
>> thus, as you say, removing the page from the page cache, with
>> __delete_from_page_cache clearing page->mapping.
>>
>>
>> Without your v5.0 commit, nfs_page_async_flush will then return 0,
>> via
>> nfs_do_writepage, nfs_writepage_locked, nfs_wb_page to
>> nfs_try_to_update_request, which then returns NULL to
>> nfs_setup_write_request.
>>
>> nfs_inode_add_request, which itself then dereferences the mapping:
>>
>>          spin_lock(&mapping->private_lock);
>>
>> which is where we crash.
>>
>>
>> Obviously, there are a number of assumptions in the above, so I
>> thought
>> it must just be a possible path the code could take?
>>
>> Does that sound plausible (given that v4.14.72-ish code)?
>>
>>
>>
>> However, I note that in a subsequent v5.2 commit:
>>
>> 22876f540bdf NFS: Don't call generic_error_remove_page() while
>> holding locks
>>
>> you remove the call to generic_error_remove_page from
>> nfs_write_error_remove_page(), and that is itself then renamed
>> nfs_write_error(), as part of a later v5.2 commit:
>>
>> 6fbda89b257f NFS: Replace custom error reporting mechanism with
>> generic one
>>
>>
>> Without those commits, and also without your adjustments to
>> nfs_page_async_flush I mentioned earlier, is it possible that the
>> code
>> path I present above, where the page /is/ removed from the page
>> cache,
>> could result in the crash we saw?
>>
>>
> 
> OK, yes that is plausible. The call to generic_error_remove_page()
> would, as you said, remove the page from the page cache, and thus could
> result in the crash you describe.
> 

that's great, thanks very much indeed for the confirmation, Trond.

sorry to waste your time with older code…

cheers,
calum.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]