From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Михаил Гаврилов" <mikhail.v.gavrilov@gmail.com>,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: kernel BUG at fs/xfs/xfs_aops.c:853! in kernel 4.13 rc6
Date: Mon, 4 Sep 2017 11:43:53 +1000
Message-ID: <20170904014353.GG10621@dastard>
In-Reply-To: <20170903074306.GA8351@infradead.org>

On Sun, Sep 03, 2017 at 12:43:06AM -0700, Christoph Hellwig wrote:
> On Sun, Sep 03, 2017 at 09:22:17AM +0500, Михаил Гаврилов wrote:
> > [281502.961248] ------------[ cut here ]------------
> > [281502.961257] kernel BUG at fs/xfs/xfs_aops.c:853!
> 
> This is:
> 
> 	bh = head = page_buffers(page);
> 
> Which looks odd and like some sort of VM/writeback change might
> have triggered that we get a page without buffers, despite always
> creating buffers in iomap_begin/end and page_mkwrite.

Pretty sure this can still happen when buffer_heads_over_limit becomes
true. In that case, shrink_active_list() will attempt to strip
the bufferheads off the page even if it's a dirty page, i.e. this
code:

                if (unlikely(buffer_heads_over_limit)) {
                        if (page_has_private(page) && trylock_page(page)) {
                                if (page_has_private(page))
                                        try_to_release_page(page, 0);
                                unlock_page(page);
                        }
                }
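
To make the failure mode concrete, here is a minimal userspace sketch of
the sequence described above. It is a toy model, not kernel code: struct
model_page and every model_* function are hypothetical names invented for
this illustration, and the only kernel facts assumed are that
page_buffers() BUGs on a page without attached buffers and that the
quoted snippet calls into ->releasepage() regardless of the dirty bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the bits that matter here. */
struct model_page {
	bool dirty;       /* models PageDirty() */
	bool has_buffers; /* models page_has_private() with buffer_heads attached */
	bool locked;      /* models the page lock */
};

/* Models a ->releasepage() with no dirty check: it always drops the buffers. */
static bool model_release_unconditional(struct model_page *page)
{
	assert(page != NULL);
	page->has_buffers = false;
	return true;
}

/* Models the shrink_active_list() snippet quoted above. */
static void model_shrink_active(struct model_page *page, bool over_limit)
{
	assert(page != NULL);
	if (!over_limit)
		return;
	if (page->has_buffers && !page->locked) { /* trylock_page() succeeded */
		page->locked = true;
		if (page->has_buffers)
			model_release_unconditional(page);
		page->locked = false;
	}
}

/* Writeback of a dirty page calls page_buffers(), which BUGs without buffers. */
static bool model_writeback_would_bug(const struct model_page *page)
{
	assert(page != NULL);
	return page->dirty && !page->has_buffers;
}
```

Starting from a dirty page with buffers attached, calling
model_shrink_active() with over_limit set leaves the page dirty but
bufferless, which is the state model_writeback_would_bug() flags.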


There was some discussion about this a while back. The consensus was
that it is an mm bug, but nobody wanted to add a PageDirty check
to try_to_release_page(), and so nothing ended up being done about
it in the mm/ subsystem. Instead, filesystems needed to avoid it
if it was a problem for them. Indeed, we fixed it in the filesystem
in 4.8:

99579ccec4e2 xfs: skip dirty pages in ->releasepage()

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3ba0809e0be8..6135787500fc 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1040,6 +1040,20 @@ xfs_vm_releasepage(
 
        trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
+       /*
+        * mm accommodates an old ext3 case where clean pages might not have had
+        * the dirty bit cleared. Thus, it can send actual dirty pages to
+        * ->releasepage() via shrink_active_list(). Conversely,
+        * block_invalidatepage() can send pages that are still marked dirty
+        * but otherwise have invalidated buffers.
+        *
+        * We've historically freed buffers on the latter. Instead, quietly
+        * filter out all dirty pages to avoid spurious buffer state warnings.
+        * This can likely be removed once shrink_active_list() is fixed.
+        */
+       if (PageDirty(page))
+               return 0;
+
        xfs_count_page_state(page, &delalloc, &unwritten);

But looking at the current code, the comment is still mostly there
but the PageDirty() check isn't.

<sigh>

In 4.10, this was done:

commit 0a417b8dc1f10b03e8f558b8a831f07ec4c23795
Author: Jan Kara <jack@suse.cz>
Date:   Wed Jan 11 10:20:04 2017 -0800

    xfs: Timely free truncated dirty pages
    
    Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
    to skip dirty pages in xfs_vm_releasepage() which also has the effect
    that if a dirty page is truncated, it does not get freed by
    block_invalidatepage() and is lingering in LRU list waiting for reclaim.
    So a simple loop like:
    
    while true; do
            dd if=/dev/zero of=file bs=1M count=100
            rm file
    done
    
    will keep using more and more memory until we hit low watermarks and
    start pagecache reclaim which will eventually reclaim also the truncate
    pages. Keeping these truncated (and thus never usable) pages in memory
    is just a waste of memory, is unnecessarily stressing page cache
    reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
    prematurely.
    
    So instead of just skipping dirty pages in xfs_vm_releasepage(), return
    to old behavior of skipping them only if they have delalloc or unwritten
    buffers and fix the spurious warnings by warning only if the page is
    clean.
    
    CC: stable@vger.kernel.org
    CC: Brian Foster <bfoster@redhat.com>
    CC: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
    Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
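
The difference between the two release policies can be sketched in a
small userspace model. This is only an approximation built from the diff
and the commit message quoted above, not the actual xfs_vm_releasepage()
code: struct model_page, its fields and both model_release_* functions
are hypothetical names, and the `warned` out-parameter stands in for the
kernel's warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page and the summary of its buffer state. */
struct model_page {
	bool dirty;       /* models PageDirty() */
	bool has_buffers; /* buffer_heads still attached */
	bool delalloc;    /* some buffer is still delayed-allocation */
	bool unwritten;   /* some buffer is still unwritten */
};

/* 4.8 policy (99579ccec4e2): refuse every dirty page up front. */
static bool model_release_4_8(struct model_page *page, bool *warned)
{
	assert(page != NULL && warned != NULL);
	if (page->dirty)
		return false;
	if (page->delalloc || page->unwritten) {
		*warned = true; /* clean page with unconverted buffers */
		return false;
	}
	page->has_buffers = false;
	return true;
}

/*
 * 4.10 policy as the commit message describes it: refuse only pages with
 * delalloc or unwritten buffers, and warn only if such a page is clean.
 */
static bool model_release_4_10(struct model_page *page, bool *warned)
{
	assert(page != NULL && warned != NULL);
	if (page->delalloc || page->unwritten) {
		if (!page->dirty)
			*warned = true;
		return false;
	}
	page->has_buffers = false;
	return true;
}
```

In this model a truncated dirty page (dirty, no delalloc or unwritten
buffers) is kept by the 4.8 policy and freed by the 4.10 policy. A dirty
page with ordinary mapped buffers arriving from shrink_active_list() looks
identical to the 4.10 policy, so it also loses its buffers.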


So, yeah, we reverted the fix for a crash rather than trying to fix
the adverse behaviour caused by invalidation of a dirty page.

For example, why didn't we simply clear the PageDirty flag in
xfs_vm_invalidatepage()?  The page is being invalidated - its
contents will never get written back - so having delalloc or
unwritten extents over that page at the time it is invalidated is a
bug and the original fix would have triggered warnings about
this....
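
A rough userspace sketch of that idea, under stated assumptions: the
dirty filter from the 4.8 fix stays in the release path, and full-page
invalidation clears the dirty bit before trying to release. The names
struct model_page, model_releasepage() and model_invalidatepage() are
hypothetical, and this models only the suggestion in this message, not
any code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool dirty;       /* models PageDirty() */
	bool has_buffers; /* buffer_heads still attached */
};

/* Release path with the 4.8-style filter: never strip a dirty page. */
static bool model_releasepage(struct model_page *page)
{
	assert(page != NULL);
	if (page->dirty)
		return false;
	page->has_buffers = false;
	return true;
}

/* Full-page invalidation with the suggested extra step. */
static bool model_invalidatepage(struct model_page *page)
{
	assert(page != NULL);
	/* Assumed step: the contents will never be written back. */
	page->dirty = false;
	return model_releasepage(page);
}
```

Under this model a truncated page is freed immediately because it is no
longer dirty when it reaches the release path, while a dirty page that
reclaim hands to model_releasepage() directly keeps its buffers.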

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

