linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>,
	linux-fsdevel@vger.kernel.org,  linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/2] Use multi-index entries in the page cache
Date: Mon, 6 Jul 2020 20:21:54 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.2007061946180.2346@eggly.anvils> (raw)
In-Reply-To: <20200706144320.GB25523@casper.infradead.org>

On Mon, 6 Jul 2020, Matthew Wilcox wrote:
> On Sat, Jul 04, 2020 at 01:20:19PM -0700, Hugh Dickins wrote:
> 
> > The original non-THP machine ran the same load for
> > ten hours yesterday, but hit no problem. The only significant
> > difference in what ran successfully, is that I've been surprised
> > by all the non-zero entries I saw in xarray nodes, exceeding
> > total entry "count" (I've also been bothered by non-zero "offset"
> > at root, but imagine that's just noise that never gets used).
> > So I've changed the kmem_cache_alloc()s in lib/radix-tree.c to
> > kmem_cache_zalloc()s, as in the diff below: not suggesting that
> > as necessary, just a temporary precaution in case something is
> > not being initialized as intended.
> 
> Umm.  ->count should always be accurate and match the number of non-NULL
> entries in a node.  the zalloc shouldn't be necessary, and will probably
> break the workingset code.  Actually, it should BUG because we have both
> a constructor and an instruction to zero the allocation, and they can't
> both be right.

Those kmem_cache_zalloc()s did not cause BUGs because I did them in
lib/radix-tree.c: it occurred to me yesterday morning that that was
an odd choice to have made! so I then extended them to lib/xarray.c,
immediately hit the BUGs you point out, so also added INIT_LIST_HEADs.

And very interestingly, the little huge tmpfs xfstests script I gave
last time then usually succeeds without any crash; but not always.

To me that suggests that there's somewhere you omit to clear a slot;
and that usually those xfstests don't encounter that stale slot again,
before the node is recycled for use elsewhere (with the zalloc making
it right again); but occasionally the tests do hit the bogus entry
before the node is recycled.

Before the zallocs, I had noticed a node with count 4, but filled
with non-zero entries (most of them looking like huge head pointers).

> 
> You're right that ->offset is never used at root.  I had plans to
> repurpose that to support smaller files more efficiently, but never
> got round to implementing those plans.
> 
> > These problems were either mm/filemap.c:1565 find_lock_entry()
> > VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page); or hangs, which
> > (at least the ones that I went on to investigate) turned out also to be
> > find_lock_entry(), circling around with page_mapping(page) != mapping.
> > It seems that find_get_entry() is sometimes supplying the wrong page,
> > and you will work out why much quicker than I shall.  (One tantalizing
> > detail of the bad offset crashes: very often page pgoff is exactly one
> > less than the requested offset.)
> 
> I added this:
> 
> @@ -1535,6 +1535,11 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
>                 goto repeat;
>         }
>         page = find_subpage(page, offset);
> +       if (page_to_index(page) != offset) {
> +               printk("offset %ld xas index %ld offset %d\n", offset, xas.xa_index, xas.xa_offset);

(It's much easier to spot huge page issues with %x than with %d.)

> +               dump_page(page, "index mismatch");
> +               printk("xa_load %p\n", xa_load(&mapping->i_pages, offset));
> +       }
>  out:
>         rcu_read_unlock();
>  
> and I have a good clue now:
> 
> 1322 offset 631 xas index 631 offset 48
> 1322 page:000000008c9a9bc3 refcount:4 mapcount:0 mapping:00000000d8615d47 index:0x276

(You appear to have more patience with all this ghastly hashing of
kernel addresses in debug messages than I have: it really gets in our way.)

> 1322 flags: 0x4000000000002026(referenced|uptodate|active|private)
> 1322 mapping->aops:0xffffffff88a2ebc0 ino 1800b82 dentry name:"f1141"
> 1322 raw: 4000000000002026 dead000000000100 dead000000000122 ffff98ff2a8b8a20
> 1322 raw: 0000000000000276 ffff98ff1ac271a0 00000004ffffffff 0000000000000000
> 1322 page dumped because: index mismatch
> 1322 xa_load 000000008c9a9bc3
> 
> 0x276 is decimal 630.  So we're looking up a tail page and getting its

Index 630 for offset 631, yes, that's exactly the kind of off-by-one
I saw so frequently.

> erstwhile head page.  I'll dig in and figure out exactly how that's
> happening.

Very pleasing to see the word "erstwhile" in use (and I'm serious about
that); but I don't get your head for tail point - I can't make heads or
tails of what you're saying there :) Neither 631 nor 630 is close to 512.

Certainly it's finding a bogus page, and that's related to the multi-order
splitting (I saw no problem with your 1-7 series), I think it's from a
stale entry being left behind. (But that does not at all explain the
off-by-one pattern.)

Hugh


  parent reply	other threads:[~2020-07-07  3:22 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-29 15:20 [PATCH 0/2] Use multi-index entries in the page cache Matthew Wilcox (Oracle)
2020-06-29 15:20 ` [PATCH 1/2] XArray: Add xas_split Matthew Wilcox (Oracle)
2020-06-29 17:18   ` William Kucharski
2020-06-29 15:20 ` [PATCH 2/2] mm: Use multi-index entries in the page cache Matthew Wilcox (Oracle)
2020-06-30  3:16 ` [PATCH 0/2] " William Kucharski
2020-07-04 20:20 ` Hugh Dickins
2020-07-06 14:43   ` Matthew Wilcox
2020-07-06 18:50     ` Matthew Wilcox
2020-07-07  3:43       ` Hugh Dickins
2020-07-07  3:21     ` Hugh Dickins [this message]
2020-07-07  3:49       ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2007061946180.2346@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).