io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Keith Busch <kbusch@fb.com>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	axboe@kernel.dk, hch@lst.de
Subject: Re: [PATCH 4/5] io_uring: add support for dma pre-mapping
Date: Wed, 27 Jul 2022 17:00:09 -0600	[thread overview]
Message-ID: <YuHDeRImQPuuV2Mr@kbusch-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20220727223232.GV3600936@dread.disaster.area>

On Thu, Jul 28, 2022 at 08:32:32AM +1000, Dave Chinner wrote:
> On Wed, Jul 27, 2022 at 09:04:25AM -0600, Keith Busch wrote:
> > On Wed, Jul 27, 2022 at 03:04:56PM +0100, Al Viro wrote:
> > > On Wed, Jul 27, 2022 at 07:58:29AM -0600, Keith Busch wrote:
> > > > On Wed, Jul 27, 2022 at 12:12:53AM +0100, Al Viro wrote:
> > > > > On Tue, Jul 26, 2022 at 10:38:13AM -0700, Keith Busch wrote:
> > > > > 
> > > > > > +	if (S_ISBLK(file_inode(file)->i_mode))
> > > > > > +		bdev = I_BDEV(file->f_mapping->host);
> > > > > > +	else if (S_ISREG(file_inode(file)->i_mode))
> > > > > > +		bdev = file->f_inode->i_sb->s_bdev;
> > > > > 
> > > > > *blink*
> > > > > 
> > > > > Just what's the intended use of the second case here?
> > > > 
> > > > ??
> > > > 
> > > > The use case is same as the first's: dma map the user addresses to the backing
> > > > storage. There's two cases here because getting the block_device for a regular
> > > > filesystem file is different than a raw block device.
> > > 
> > > Excuse me, but "file on some filesystem + block number on underlying device"
> > > makes no sense as an API...
> > 
> > Sorry if I'm misunderstanding your concern here.
> > 
> > The API is a file descriptor + index range of registered buffers (which is a
> > pre-existing io_uring API). The file descriptor can come from opening either a
> > raw block device (ex: /dev/nvme0n1), or any regular file on a mounted
> > filesystem using nvme as a backing store.
> 
> That's fundamentally flawed. Filesystems can have multiple block
> devices backing them that the VFS doesn't actually know about (e.g.
> btrfs, XFS, etc). Further, some of these filesystems can spread
> indiivdual file data across mutliple block devices i.e. the backing
> bdev changes as file offset changes....
> 
> Filesystems might not even have a block device (NFS, CIFS, etc) -
> what happens if you call this function on a file belonging to such a
> filesystem?

The block_device driver has to opt-in to this feature. If a multi-device block
driver wants to opt-in to this, then it would be responsible to handle
translating that driver's specific cookie to whatever representation the
drivers it stacks atop require. Otherwise, the cookie threaded through the bio
is an opque value: nothing between io_uring and the block_device driver need to
decode it.

If the block_device doesn't support providing this cookie, then io_uring just
falls back to the existing less optimal methond, and all will continue to work
as it does today.

  reply	other threads:[~2022-07-27 23:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-26 17:38 [PATCH 0/5] dma mapping optimisations Keith Busch
2022-07-26 17:38 ` [PATCH 1/5] blk-mq: add ops to dma map bvec Keith Busch
2022-07-26 17:38 ` [PATCH 2/5] iov_iter: introduce type for preregistered dma tags Keith Busch
2022-07-26 23:10   ` Al Viro
2022-07-27 13:52     ` Keith Busch
2022-07-26 17:38 ` [PATCH 3/5] block: add dma tag bio type Keith Busch
2022-07-26 17:38 ` [PATCH 4/5] io_uring: add support for dma pre-mapping Keith Busch
2022-07-26 23:12   ` Al Viro
2022-07-27 13:58     ` Keith Busch
2022-07-27 14:04       ` Al Viro
2022-07-27 15:04         ` Keith Busch
2022-07-27 22:32           ` Dave Chinner
2022-07-27 23:00             ` Keith Busch [this message]
2022-07-28  2:35               ` Dave Chinner
2022-07-28 13:25                 ` Keith Busch
2022-07-27 14:11   ` Al Viro
2022-07-27 14:48     ` Keith Busch
2022-07-27 15:26       ` Al Viro
2022-07-26 17:38 ` [PATCH 5/5] nvme-pci: implement dma_map support Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YuHDeRImQPuuV2Mr@kbusch-mbp.dhcp.thefacebook.com \
    --to=kbusch@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=kbusch@fb.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).