linux-mm.kvack.org archive mirror
From: Alistair Popple <apopple@nvidia.com>
To: linux-mm@kvack.org
Cc: jhubbard@nvidia.com, rcampbell@nvidia.com, willy@infradead.org,
	jgg@nvidia.com, dan.j.williams@intel.com, david@fromorbit.com,
	linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org,
	hch@lst.de, david@redhat.com
Subject: ZONE_DEVICE refcounting
Date: Fri, 08 Mar 2024 15:24:35 +1100
Message-ID: <87ttlhmj9p.fsf@nvdebian.thelocal>

Hi,

I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
have been looking at fixing the 1-based refcounts that are currently used for
FS DAX pages (and p2pdma pages, but that's trivial).

This started with the simple idea of "just subtract one from the
refcounts everywhere and that will fix the off by one". Unfortunately
it's not that simple. For starters, doing a simple conversion like that
requires allowing pages to be mapped with zero refcounts. That seems
wrong. It also leads to problems detecting idle IO vs. page map pages.
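To make the off-by-one concrete, here is a toy userspace model (not kernel
code). The toy_* names and the struct layout are made up for illustration, and
the model assumes that under the current scheme a mapped FS DAX page with no
other users rests at a refcount of one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel definition. */
struct toy_page {
	int refcount;	/* references held by pins, IO, etc. */
	int mapcount;	/* number of page table mappings */
};

/* Current 1-based scheme: a page nobody is using rests at refcount 1. */
static bool toy_idle_one_based(const struct toy_page *p)
{
	return p->refcount == 1;
}

/* After "subtract one everywhere": the same page rests at refcount 0. */
static bool toy_idle_minus_one(const struct toy_page *p)
{
	return p->refcount == 0;
}

/* What we expect of normal pages: a mapped page always holds a reference. */
static bool toy_mapped_implies_ref(const struct toy_page *p)
{
	return p->mapcount == 0 || p->refcount > 0;
}

/* Walk one mapped, otherwise unused page through the naive conversion. */
static void toy_demo(void)
{
	struct toy_page p = { .refcount = 1, .mapcount = 1 };

	assert(toy_idle_one_based(&p));
	assert(toy_mapped_implies_ref(&p));

	p.refcount -= 1;	/* the naive "subtract one" conversion */

	assert(toy_idle_minus_one(&p));
	assert(!toy_mapped_implies_ref(&p));	/* mapped with zero refcount */
}
```

In this model the converted page is still mapped but its refcount is zero, which
is exactly the state that looks wrong for a normally refcounted page.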

So instead I'm thinking of doing something along the lines of the following:

1. Refcount FS DAX pages normally. I.e. map them with vm_insert_page() and
   increment the refcount inline with mapcount and decrement it when pages are
   unmapped.

2. As with normal pages, they are considered free when the refcount drops
   to zero.

3. Because these are treated as normal pages for refcounting we no longer map
   them as pte_devmap() (possibly freeing up a PTE bit).

4. PMD sized FS DAX pages get treated the same as normal compound pages.

5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
   the page->pgmap field with page->compound_head, but this isn't a problem
   because the LSB of page->pgmap is free and we can still get pgmap from
   compound_head(page)->pgmap.

6. When FS DAX pages are freed they notify filesystem drivers. This can be done
   from the pgmap->ops->page_free() callback.

7. We could probably get rid of the pgmap refcounting because we can just scan
   pages and look for any pages with non-zero references and wait for them to be
   freed whilst ensuring no new mappings can be created (some drivers do a
   similar thing for private pages today). This might be a follow-up change.
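As a rough illustration of points 2, 5, 6 and 7, here is a toy userspace sketch
(not the actual series). All toy_* names are hypothetical; only the ideas of
tagging bit 0 of the shared field, calling a page_free() callback when the
refcount hits zero, and scanning for pages with non-zero references come from
the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_page;

struct toy_pgmap_ops {
	void (*page_free)(struct toy_page *page);
};

struct toy_pgmap {
	const struct toy_pgmap_ops *ops;
	int freed;	/* number of page_free() notifications seen */
};

struct toy_page {
	int refcount;
	/* Head page: pgmap pointer (bit 0 clear). Tail page: head pointer | 1. */
	uintptr_t pgmap_or_head;
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	if (page->pgmap_or_head & 1)
		return (struct toy_page *)(page->pgmap_or_head - 1);
	return page;
}

/* Point 5: the pgmap is always reachable through the head page. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *page)
{
	return (struct toy_pgmap *)toy_compound_head(page)->pgmap_or_head;
}

/* Make pages[0] the head of an n-page compound page owned by pgmap. */
static void toy_init_compound(struct toy_pgmap *pgmap,
			      struct toy_page *pages, size_t n)
{
	pages[0].refcount = 0;
	pages[0].pgmap_or_head = (uintptr_t)pgmap;
	for (size_t i = 1; i < n; i++) {
		pages[i].refcount = 0;
		pages[i].pgmap_or_head = (uintptr_t)&pages[0] | 1;
	}
}

static void toy_get_page(struct toy_page *page)
{
	toy_compound_head(page)->refcount++;
}

/* Points 2 and 6: the last put frees the page and notifies the owner. */
static void toy_put_page(struct toy_page *page)
{
	struct toy_page *head = toy_compound_head(page);

	if (--head->refcount == 0)
		toy_page_pgmap(head)->ops->page_free(head);
}

/* Stand-in for a filesystem's page_free() handler. */
static void toy_fs_page_free(struct toy_page *page)
{
	toy_page_pgmap(page)->freed++;
}

/* Point 7: teardown can scan for head pages that still hold references. */
static size_t toy_count_busy(struct toy_page *pages, size_t n)
{
	size_t busy = 0;

	for (size_t i = 0; i < n; i++)
		if (!(pages[i].pgmap_or_head & 1) && pages[i].refcount != 0)
			busy++;
	return busy;
}
```

Taking a reference through any tail page bumps the head's refcount, and
dropping the last reference calls page_free() once with the head page. A
teardown path in this model would wait until toy_count_busy() returns zero.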

I have made good progress implementing the above, and am reasonably confident I
can make it work (I have some tests that exercise these code paths working).

However my knowledge of the filesystem layer is a bit thin, so before going too
much further down this path I was hoping to get some feedback on the overall
direction to see if there are any corner cases or other potential problems I
have missed that may prevent the above being practical.

If not, I will clean my series up and post it as an RFC. Thanks.

 - Alistair

