From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-nvme@lists.infradead.org, bpf@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] State Of The Page
Date: Sat, 27 Jan 2024 18:43:37 +0000
Message-ID: <ZbVO2RKhw-dLUMvf@casper.infradead.org>
In-Reply-To: <yrswihigbp46vlyxqvi3io5pfngcivfwfb3gdlnjs6tzntldbx@mbnrycaujxb3>

On Sat, Jan 27, 2024 at 12:57:45PM -0500, Kent Overstreet wrote:
> On Fri, Jan 19, 2024 at 04:24:29PM +0000, Matthew Wilcox wrote:
> >  - What are we going to do about bio_vecs?
> 
> For bios and biovecs, I think it's important to keep in mind the
> distinction between the code that owns and submits the bio, and the
> consumer underneath.
> 
> The code underneath could just as easily work with pfns, and the code
> above got those pages from somewhere else, so it doesn't _need_ the bio
> for access to those pages/folios (it would be a lot of refactoring
> though).
> 
> But I've been thinking about going in a different direction - what if we
> unified iov_iter and bio? We've got ~3 different scatter-gather types
> that an IO passes through down the stack, and it would be lovely if we
> could get it down to just one; e.g. for DIO, pinning pages right at the
> copy_from_user boundary.

Yes, but ...

One of the things that Xen can do and Linux can't is I/O to/from memory
that doesn't have an associated struct page.  We have all kinds of hacks
in place to get around that right now, and I'd like to remove those.

Since we want that kind of memory (let's take, e.g., GPU memory as an
example) to be mappable to userspace, and we want to be able to do DIO
to that memory, that points us to using a non-page-based structure right
from the start.  Yes, if it happens to be backed by pages we need to 'pin'
them in some way (I'd like to get away from per-page or even per-folio
pinning, but we'll see about that), but the data structure that we use
to represent that memory as it moves through the I/O subsystem needs to
be physical address based.
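
As a strawman (nothing here is settled, and the field names are purely
illustrative), the phyr itself could be as simple as:

	struct phyr {
		phys_addr_t	addr;	/* physical address of the first byte */
		size_t		len;	/* length of the contiguous range */
	};

No struct page anywhere, so it describes GPU memory, PCI BAR space and
ordinary RAM equally well.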

So my 40,000-foot view is that we do something like get_user_phyrs()
at the start of DIO and pass the phyr to the filesystem; the filesystem
then passes one or more phyrs to the block layer, and the block layer
gives the phyrs to the driver, which DMA-maps them.
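
To make the flow concrete, here's a sketch of the submission path
(every function name below is hypothetical; this is the shape of the
thing, not a real API):

	/*
	 * get_user_phyrs(), blk_rq_add_phyrs() and dma_map_phyrs() are
	 * all made up for illustration; only the types are real today.
	 */
	static int dio_submit_sketch(struct iov_iter *iter,
				     struct request *rq, struct device *dev)
	{
		struct phyr *phyrs;
		int nr;

		/* DIO entry: pin the user buffer once, as physical ranges */
		nr = get_user_phyrs(iter, &phyrs);
		if (nr < 0)
			return nr;

		/* filesystem / block layer: attach the phyrs to the request */
		blk_rq_add_phyrs(rq, phyrs, nr);

		/* driver: DMA-map the physical ranges directly */
		return dma_map_phyrs(dev, phyrs, nr, DMA_FROM_DEVICE);
	}

Note there's no page->phys conversion anywhere on that path.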

Yes, the IO completion path (for buffered IO) needs to figure out which
folios are described by this phyr, but that's a phys_to_folio() call away.
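
Assuming the range is page-backed, phys_to_folio() is a one-liner in
terms of existing helpers (a sketch; no such helper exists today):

	static inline struct folio *phys_to_folio(phys_addr_t phys)
	{
		return pfn_folio(PHYS_PFN(phys));
	}

and the completion path can then walk the phyr in folio-sized steps.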
