From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>,
Pasha Tatashin <tatashin@google.com>,
Sourav Panda <souravpanda@google.com>,
lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, linux-ide@vger.kernel.org,
linux-scsi@vger.kernel.org, linux-nvme@lists.infradead.org,
bpf@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] State Of The Page
Date: Sun, 21 Jan 2024 19:18:09 -0500
Message-ID: <CA+CK2bC8-f2hWqnK4feRYBtuwqjdRoN8=sdaipJOiHFSNos=mg@mail.gmail.com>
In-Reply-To: <Za2uq2L7_IU8RQWU@casper.infradead.org>
On Sun, Jan 21, 2024 at 6:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Jan 21, 2024 at 06:31:48PM -0500, Pasha Tatashin wrote:
> > On Sun, Jan 21, 2024 at 6:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > I can add a proposal for a topic on both the PCP and Buddy allocators
> > > (I have a series of Thoughts on how the PCP allocator works in a memdesc
> > > world that I haven't written down & sent out yet).
> >
> > Interesting. Given that per-cpu areas are mostly allocated by
> > kmalloc, with vmalloc used for large allocations, how would a
> > memdesc differ for them compared to regular kmalloc allocations,
> > given that they are sub-page?
>
> Oh! I don't mean the mm/percpu.c allocator. I mean the pcp allocator
> in mm/page_alloc.c.
Never mind, this makes perfect sense now :-)
> I don't have any Thoughts on mm/percpu.c at this time. I'm vaguely
> aware that it exists ;-)
>
> > > There's so much work to be done! And it's mostly parallelisable and almost
> > > trivial. It's just largely on the filesystem-page cache interaction, so
> > > it's not terribly interesting. See, for example, the ext2, ext4, gfs2,
> > > nilfs2, ufs and ubifs patchsets I've done over the past few releases.
> > > I have about half of an ntfs3 patchset ready to send.
> >
> > > There's a bunch of work to be done in DRM to switch from pages to folios
> > > due to their use of shmem. You can also grep for 'page->mapping' (because
> > > fortunately we aren't too imaginative when it comes to naming variables)
> > > and find 270 places that need to be changed. Some are comments, but
> > > those still need to be updated!
> > >
> > > Anything using lock_page(), get_page(), set_page_dirty(), using
> > > &folio->page, any of the functions in mm/folio-compat.c needs auditing.
> > > We can make the first three of those work, but they're good indicators
> > > that the code needs to be looked at.
> > >
> > > There is some interesting work to be done, and one of the things I'm
> > > thinking hard about right now is how we're doing folio conversions
> > > that make sense with today's code, and stop making sense when we get
> > > to memdescs. That doesn't apply to anything interacting with the page
> > > cache (because those are folios now and in the future), but it does apply
> > > to one spot in ext4 where it allocates memory from slab and attaches a
> > > buffer_head to it ...
> >
> > There are many more drivers that would need the conversion. For
> > example, IOMMU page tables can occupy gigabytes of space and have
> > different implementations for AMD, x86, and several ARM variants.
> > Converting them to memdescs and unifying the IO page table
> > management across these platforms would be beneficial.
>
> Understood; there's a lot of code that can benefit from larger
> allocations. I was listing the impediments to shrinking struct page
> rather than the places which would most benefit from switching to larger
> allocations. They're complementary to a large extent; you can switch
> to compound allocations today and get the benefit later. And unifying
> implementations is always a worthy project.
Thread overview: 26+ messages
2024-01-19 16:24 [LSF/MM/BPF TOPIC] State Of The Page Matthew Wilcox
2024-01-19 20:31 ` Keith Busch
2024-01-20 14:11 ` Chuck Lever III
2024-01-21 21:00 ` David Rientjes
2024-01-21 23:14 ` Matthew Wilcox
2024-01-21 23:31 ` Pasha Tatashin
2024-01-21 23:54 ` Matthew Wilcox
2024-01-22 0:18 ` Pasha Tatashin [this message]
2024-01-24 17:51 ` Christoph Lameter (Ampere)
2024-01-24 17:55 ` Matthew Wilcox
2024-01-24 19:05 ` Christoph Lameter (Ampere)
2024-01-27 10:10 ` Amir Goldstein
2024-01-27 16:18 ` Matthew Wilcox
2024-01-27 17:57 ` Kent Overstreet
2024-01-27 18:43 ` Matthew Wilcox
-- strict thread matches above, loose matches on Subject: below --
2023-01-26 16:40 Matthew Wilcox
2023-02-21 16:57 ` David Howells
2023-02-21 18:08 ` Gao Xiang
2023-02-21 19:09 ` Yang Shi
2023-02-22 2:40 ` Gao Xiang
2023-02-21 19:58 ` Matthew Wilcox
2023-02-22 2:38 ` Gao Xiang
2023-03-02 3:17 ` David Rientjes
2023-03-02 3:50 ` Pasha Tatashin
2023-03-02 4:03 ` Matthew Wilcox
2023-03-02 4:16 ` Pasha Tatashin