linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Lee Jones <lee.jones@linaro.org>,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Johannes Thumshirn <jth@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cluster-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()
Date: Thu, 17 Feb 2022 22:33:34 -0800	[thread overview]
Message-ID: <7bd88058-2a9a-92a6-2280-43c805b516c3@nvidia.com> (raw)
In-Reply-To: <Yg8bxiz02WBGf6qO@mit.edu>

On 2/17/22 20:08, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 05:06:45PM -0800, John Hubbard wrote:
>> Yes. And looking at the pair of backtraces below, this looks very much
>> like another aspect of the "get_user_pages problem" [1], originally
>> described in Jan Kara's 2018 email [2].
> 
> Hmm... I just posted my analysis, which tracks with yours; but I had
> forgotten about Jan's 2018 e-mail on the matter.
> 
>> I'm getting close to posting an RFC for the direct IO conversion to
>> FOLL_PIN, but even after that, various parts of the kernel (reclaim,
>> filesystems/block layer) still need to be changed so as to use
>> page_maybe_dma_pinned() to help avoid this problem. There's a bit
>> more than that, actually.
> 
> The challenge is that fixing this "the right away" is probably not
> something we can backport into an LTS kernel, whether it's 5.15 or
> 5.10... or 4.19.
> 
> The only thing which can probably survive getting backported is
> something like this.  It won't make the right thing happen if someone
> is trying to RDMA or call process_vm_writev() into a file-backed
> memory region --- but I'm not sure I care.  Certainly if the goal is
> to make Android kernels, I'm pretty sure they are't either using RDMA,
> and I suspect they are probably masking out the process_vm_writev(2)
> system call (at least, for Android O and newer).  So if the goal is to
> just to close some Syzbot bugs, what do folks think about this?
> 
>       	      	   	  	     	- Ted

Hi Ted!

This seems reasonable...-ish. Although this could turn into a pretty
grand game of whack-a-mole, one filesystem at a time. :)


> 
> commit 7711b1fda6f7f04274fa1cba6f092410262b0296
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Thu Feb 17 22:54:03 2022 -0500
> 
>      ext4: work around bugs in mm/gup.c that can cause ext4 to BUG()
>      
>      [un]pin_user_pages_remote is dirtying pages without properly warning
>      the file system in advance (or faulting in the file data if the page

Just a small thing I'll say once, to get it out of my system. No action
required here, I just want it understood:

Before commit 803e4572d7c5 ("mm/process_vm_access: set FOLL_PIN via
pin_user_pages_remote()"), you would have written that like this:

"process_vm_writev() is dirtying pages without properly warning the file
system in advance..."

Because, there were many callers that were doing this:

     get_user_pages*()
     ...use the pages...

     for_each_page() {
             set_page_dirty*()
             put_page()
     }

anyway, moving on smartly...

>      is not yet in the page cache).  This was noted by Jan Kara in 2018[1]
>      and more recently has resulted in bug reports by Syzbot in various
>      Android kernels[2].
>      
>      Fixing this for real is non-trivial, and will never be backportable
>      into stable kernels.  So this is a simple workaround that stops the
>      kernel from BUG()'ing.  The changed pages will not be properly written
>      back, but given that the current gup code is missing the "read" in
>      "read-modify-write", the dirty page in the page cache might contain
>      corrupted data anyway.
>      
>      [1] https://www.spinics.net/lists/linux-mm/msg142700.html

(Sorry my earlier response mangled this link. I've manually fixed it
here, and am working with our IT to get the root cause addressed.)

>      [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com
>      
>      Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com
>      Reported-by: Lee Jones <lee.jones@linaro.org>
>      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 01c9e4f743ba..3b2f336a90d1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1993,6 +1993,15 @@ static int ext4_writepage(struct page *page,
>   	else
>   		len = PAGE_SIZE;
>   
> +	/* Should never happen but for buggy gup code */
> +	if (!page_has_buffers(page)) {
> +		ext4_warning_inode(inode,
> +		   "page %lu does not have buffers attached", page->index);

I see that ext4_warning_inode() has rate limiting, but it doesn't look
like it's really intended for a per-page rate. It looks like it is
per-superblock (yes?), so I suspect one instance of this problem, with
many pages involved, could hit the limit.

Often, WARN_ON_ONCE() is used with per-page assertions. That's not great
either, but it might be better here. OTOH, I have minimal experience
with ext4_warning_inode() and maybe it really is just fine with per-page
failure rates.

thanks,
-- 
John Hubbard
NVIDIA

> +		ClearPageDirty(page);
> +		unlock_page(page);
> +		return 0;
> +	}
> +
>   	page_bufs = page_buffers(page);
>   	/*
>   	 * We cannot do block allocation or other extent handling in this
> @@ -2594,6 +2603,14 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>   			wait_on_page_writeback(page);
>   			BUG_ON(PageWriteback(page));
>   
> +			/* Should never happen but for buggy gup code */
> +			if (!page_has_buffers(page)) {
> +				ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
> +				ClearPageDirty(page);
> +				unlock_page(page);
> +				continue;
> +			}
> +
>   			if (mpd->map.m_len == 0)
>   				mpd->first_page = page->index;
>   			mpd->next_page = page->index + 1;


  reply	other threads:[~2022-02-18  6:33 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16 16:31 [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers() Lee Jones
2022-02-18  1:06 ` John Hubbard
2022-02-18  4:08   ` Theodore Ts'o
2022-02-18  6:33     ` John Hubbard [this message]
2022-02-23 23:31       ` Theodore Ts'o
2022-02-24  0:44         ` John Hubbard
2022-02-24  4:04           ` Theodore Ts'o
2022-02-18  7:51     ` Greg Kroah-Hartman
2022-02-23 23:35       ` Theodore Ts'o
2022-02-24  1:48         ` Dave Chinner
2022-02-24  3:50           ` Theodore Ts'o
2022-02-24 10:29             ` Dave Chinner
2022-02-18  2:54 ` Theodore Ts'o
2022-02-18  4:24   ` Matthew Wilcox
2022-02-18  6:03     ` Theodore Ts'o
2022-02-25 19:24 ` [PATCH -v2] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first Theodore Ts'o
2022-02-25 20:51   ` Eric Biggers
2022-02-25 21:08     ` Theodore Ts'o
2022-02-25 21:23       ` [PATCH -v3] " Theodore Ts'o
2022-02-25 21:33         ` John Hubbard
2022-02-25 23:21           ` Theodore Ts'o
2022-02-26  0:41             ` John Hubbard
2022-02-26  1:40               ` Theodore Ts'o
2022-02-26  2:00                 ` Theodore Ts'o
2022-02-26  2:55                 ` John Hubbard
2022-03-03  4:26         ` [PATCH -v4] " Theodore Ts'o
2022-03-03  8:21           ` Christoph Hellwig
2022-03-03  9:21           ` Lee Jones
2022-03-03 14:38           ` [PATCH -v5] ext4: don't BUG if someone " Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7bd88058-2a9a-92a6-2280-43c805b516c3@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=damien.lemoal@wdc.com \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jth@kernel.org \
    --cc=lee.jones@linaro.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    --cc=riteshh@linux.ibm.com \
    --cc=rpeterso@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).