linux-kernel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device
Date: Fri, 26 Apr 2019 16:28:16 +1000
Message-ID: <20190426062816.GG1454@dread.disaster.area>
In-Reply-To: <20190426013814.GB3350@redhat.com>

On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I would
> like to have a discussion on allowing direct block mapping of files for
> devices (NIC, GPU, FPGA, ...). This is an mm, fs and block discussion, though
> the mm side is pretty light, i.e. only adding 2 callbacks to vm_operations_struct:

The filesystem already has infrastructure for the bits it needs to
provide. They are called file layout leases (how many times do I
have to keep telling people this!), and what you do with the lease
for the LBA range the filesystem maps for you is then something you
can negotiate with the underlying block device.

i.e. go look at how xfs_pnfs.c works to hand out block mappings to
remote pNFS clients so they can directly access the underlying
storage. Basically, anyone wanting to map blocks needs a file layout
lease and then to manage the filesystem state over that range via
these methods in the struct export_operations:

        int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
        int (*map_blocks)(struct inode *inode, loff_t offset,
                          u64 len, struct iomap *iomap,
                          bool write, u32 *device_generation);
        int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
                             int nr_iomaps, struct iattr *iattr);

Basically, before you read/write data, you map the blocks. If you've
written data, then you need to commit the blocks (i.e. tell the fs
they've been written to).

The iomap will give you a contiguous LBA range and the block device
it belongs to, and you can then use that to do whatever smart DMA
stuff you need to do through the block device directly.
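
The map-then-commit ordering described above can be sketched in userspace
C. This is only a toy model under stated assumptions: `demo_extent`,
`demo_file`, `demo_map_blocks` and `demo_commit_blocks` are hypothetical
names invented for illustration, not kernel APIs, and the "filesystem" is
just an array of per-block device addresses. It shows that a map call may
return a shorter contiguous extent than was asked for, and that the commit
is what tells the filesystem the blocks have been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 4096u
#define DEMO_NBLOCKS    8u

/* Hypothetical stand-in for the few struct iomap fields a client needs. */
struct demo_extent {
	uint64_t file_offset; /* byte offset in the file */
	uint64_t lba_bytes;   /* byte address on the block device */
	uint64_t length;      /* bytes covered by this contiguous extent */
};

/* Toy per-file "filesystem" state: one device block per file block. */
struct demo_file {
	uint64_t block_lba[DEMO_NBLOCKS]; /* device block number per file block */
	bool     written[DEMO_NBLOCKS];   /* set once the range is committed */
	uint64_t size;                    /* committed file size in bytes */
};

/* Map [offset, offset + len) and return the first contiguous extent. */
static int demo_map_blocks(const struct demo_file *f, uint64_t offset,
			   uint64_t len, struct demo_extent *out)
{
	uint64_t first = offset / DEMO_BLOCK_SIZE;
	uint64_t want = len / DEMO_BLOCK_SIZE;
	uint64_t n = 1;

	assert(f != NULL && out != NULL);
	if (len == 0 || offset % DEMO_BLOCK_SIZE || len % DEMO_BLOCK_SIZE)
		return -1;
	if (first >= DEMO_NBLOCKS || want > DEMO_NBLOCKS - first)
		return -1;

	/* Extend the extent while the device blocks stay contiguous. */
	while (n < want && f->block_lba[first + n] == f->block_lba[first] + n)
		n++;

	out->file_offset = offset;
	out->lba_bytes = f->block_lba[first] * DEMO_BLOCK_SIZE;
	out->length = n * DEMO_BLOCK_SIZE;
	return 0;
}

/* Tell the toy filesystem that the given extents have been written. */
static int demo_commit_blocks(struct demo_file *f,
			      const struct demo_extent *exts, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t first = exts[i].file_offset / DEMO_BLOCK_SIZE;
		uint64_t n = exts[i].length / DEMO_BLOCK_SIZE;
		uint64_t end = exts[i].file_offset + exts[i].length;

		if (first >= DEMO_NBLOCKS || n > DEMO_NBLOCKS - first)
			return -1;
		for (uint64_t b = 0; b < n; b++)
			f->written[first + b] = true;
		if (end > f->size)
			f->size = end;
	}
	return 0;
}
```

A caller would map a range, do its I/O against `lba_bytes` for `length`
bytes, then pass the extent to `demo_commit_blocks()`; if the returned
extent is shorter than requested, it maps again from where the extent ended.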

If the filesystem wants the space back (e.g. because of a truncate) then
the lease will be revoked. The client then must finish off its
outstanding operations, commit them and release the lease. To access
the file range again, it must renew the lease and remap the file
through ->map_blocks....
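
The revoke/release/renew cycle above can be pictured as a small state
machine. The sketch below is a userspace toy, not kernel code: every
`demo_*` name and the three lease states are assumptions made up for
illustration. It only encodes the ordering rules stated above: after a
revoke no new I/O may start, the lease cannot be released while operations
are outstanding, finished writes are committed on release, and a fresh
lease plus a fresh mapping are needed before touching the range again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lease states; these are not kernel definitions. */
enum demo_lease_state {
	DEMO_LEASE_NONE,     /* client holds no lease */
	DEMO_LEASE_HELD,     /* lease valid, mappings may be used */
	DEMO_LEASE_REVOKING, /* filesystem has asked for the range back */
};

struct demo_client {
	enum demo_lease_state lease;
	bool     mapping_valid; /* last mapping may still be used */
	unsigned inflight;      /* writes issued but not yet completed */
	unsigned uncommitted;   /* completed writes not yet committed */
	unsigned committed;     /* writes the toy fs has been told about */
};

static int demo_lease_acquire(struct demo_client *c)
{
	assert(c != NULL);
	if (c->lease != DEMO_LEASE_NONE)
		return -1;
	c->lease = DEMO_LEASE_HELD;
	return 0;
}

/* Mapping is only allowed while the lease is held. */
static int demo_map(struct demo_client *c)
{
	if (c->lease != DEMO_LEASE_HELD)
		return -1;
	c->mapping_valid = true;
	return 0;
}

static int demo_write_start(struct demo_client *c)
{
	if (c->lease != DEMO_LEASE_HELD || !c->mapping_valid)
		return -1;
	c->inflight++;
	return 0;
}

static void demo_write_done(struct demo_client *c)
{
	assert(c->inflight > 0);
	c->inflight--;
	c->uncommitted++;
}

/* Filesystem side: e.g. a truncate needs the space back. */
static void demo_lease_revoke(struct demo_client *c)
{
	if (c->lease == DEMO_LEASE_HELD) {
		c->lease = DEMO_LEASE_REVOKING;
		c->mapping_valid = false;
	}
}

/* Client side: commit finished writes, then drop the lease. */
static int demo_lease_release(struct demo_client *c)
{
	if (c->lease == DEMO_LEASE_NONE)
		return -1;
	if (c->inflight > 0)
		return -1; /* outstanding operations must finish first */
	c->committed += c->uncommitted;
	c->uncommitted = 0;
	c->mapping_valid = false;
	c->lease = DEMO_LEASE_NONE;
	return 0;
}
```

In this model a client that sees the revoke waits for `inflight` to reach
zero, calls `demo_lease_release()`, and only then goes back through
`demo_lease_acquire()` and `demo_map()`.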

> So I would like to gather people's feedback on the general approach and a
> few things like:
>     - Do block devices need to be able to invalidate such mappings too?
> 
>       It is easy for the fs to invalidate as it can walk file mappings,
>       but block devices do not know about files.

If you need the block device to invalidate filesystem-level
information, then your model is all wrong.

>     - Do we want to provide some generic implementation to share across
>       filesystems?

We already have a generic interface; filesystems other than XFS will
need to implement it.

>     - Maybe some shared helpers for block devices that could track the
>       file corresponding to a peer mapping?

If the application hasn't supplied the peer with the file it needs
to access, get a lease from and then map an LBA range out of, then
you are doing it all wrong.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

