Re: [LSF/MM TOPIC] Direct block mapping through fs for device

From: Dave Chinner <david@fromorbit.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device
Date: Fri, 26 Apr 2019 16:28:16 +1000
Message-ID: <20190426062816.GG1454@dread.disaster.area>
In-Reply-To: <20190426013814.GB3350@redhat.com>

On Thu, Apr 25, 2019 at 09:38:14PM -0400, Jerome Glisse wrote:
> I see that they are still empty spot in LSF/MM schedule so i would like to
> have a discussion on allowing direct block mapping of file for devices (nic,
> gpu, fpga, ...). This is mm, fs and block discussion, thought the mm side
> is pretty light ie only adding 2 callback to vm_operations_struct:

The filesystem already has infrastructure for the bits it needs to
provide. They are called file layout leases (how many times do I
have to keep telling people this!), and what you do with the lease
for the LBA range the filesystem maps for you is then something you
can negotiate with the underlying block device.

i.e. go look at how xfs_pnfs.c works to hand out block mappings to
remote pNFS clients so they can directly access the underlying
storage. Basically, anyone wanting to map blocks needs a file layout
lease and then to manage the filesystem state over that range via
these methods in the struct export_operations:

        int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
        int (*map_blocks)(struct inode *inode, loff_t offset,
                          u64 len, struct iomap *iomap,
                          bool write, u32 *device_generation);
        int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
                             int nr_iomaps, struct iattr *iattr);

Basically, before you read/write data, you map the blocks. If you've
written data, then you need to commit the blocks (i.e. tell the fs
they've been written to).

The iomap will give you a contiguous LBA range and the block device
it belongs to, and you can then use that to do whatever smart DMA
stuff you need to do through the block device directly.
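
To make the map/commit sequence concrete, here is a rough userspace
model of it. Everything in it is made up for illustration: struct
mock_iomap, struct mock_file, mock_map_blocks(), mock_commit_blocks()
and the 4k block size are stand-ins, not the kernel's struct iomap or
the real export_operations methods, and a real filesystem may return
several extents rather than one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_BLOCK_SIZE  4096u	/* assumed filesystem block size */
#define MOCK_SECTOR_SIZE 512u	/* assumed LBA unit */

/* Hypothetical stand-in for struct iomap: one contiguous extent. */
struct mock_iomap {
	uint64_t offset;	/* file offset of the extent, in bytes */
	uint64_t length;	/* length of the extent, in bytes */
	uint64_t lba;		/* first sector of the extent on the device */
};

/* Hypothetical file that is laid out contiguously on the device. */
struct mock_file {
	uint64_t dev_base;	/* device byte address of file offset 0 */
	uint64_t size;		/* file size the "filesystem" knows about */
};

/*
 * Model of ->map_blocks: round the requested range out to block
 * boundaries and hand back the LBA range that backs it. Reads that
 * start at or beyond EOF are rejected; writes may extend the file.
 */
static int mock_map_blocks(const struct mock_file *f, uint64_t offset,
			   uint64_t len, bool write,
			   struct mock_iomap *iomap)
{
	uint64_t start, end;

	assert(f != NULL && iomap != NULL);
	if (len == 0 || (!write && offset >= f->size))
		return -1;

	start = offset - offset % MOCK_BLOCK_SIZE;
	end = offset + len;
	end += (MOCK_BLOCK_SIZE - end % MOCK_BLOCK_SIZE) % MOCK_BLOCK_SIZE;

	iomap->offset = start;
	iomap->length = end - start;
	iomap->lba = (f->dev_base + start) / MOCK_SECTOR_SIZE;
	return 0;
}

/*
 * Model of ->commit_blocks: tell the "filesystem" the data has been
 * written so it can update the file size. A size beyond the mapped
 * extent is refused because those blocks were never mapped.
 */
static int mock_commit_blocks(struct mock_file *f,
			      const struct mock_iomap *iomap,
			      uint64_t new_size)
{
	assert(f != NULL && iomap != NULL);
	if (new_size > iomap->offset + iomap->length)
		return -1;
	if (new_size > f->size)
		f->size = new_size;
	return 0;
}
```

A writer would call mock_map_blocks() with write = true, do its DMA
against the returned LBA range, and only then call
mock_commit_blocks() with the new file size.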

If the filesystem wants the space back (e.g. because of a truncate)
then the lease will be revoked. The client then must finish off its
outstanding operations, commit them and release the lease. To access
the file range again, it must renew the lease and remap the file
through ->map_blocks....
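
The revoke/release/remap cycle can be sketched as a small state
machine. This is a toy userspace model under stated assumptions: the
names (struct mock_client, client_map(), fs_recall_lease() and so on)
are hypothetical and do not correspond to the kernel's lease code or
to anything in xfs_pnfs.c.

```c
#include <stdbool.h>

/* Hypothetical lease states as seen by the client. */
enum lease_state { LEASE_NONE, LEASE_HELD, LEASE_RECALLED };

/* Hypothetical client-side bookkeeping for one mapped file range. */
struct mock_client {
	enum lease_state state;
	bool mapped;			/* block mapping currently valid */
	unsigned int inflight;		/* I/Os issued but not completed */
	unsigned int uncommitted;	/* completed writes not yet committed */
};

/* Take the lease and map the range; only legal with no lease held. */
static bool client_map(struct mock_client *c)
{
	if (c->state != LEASE_NONE)
		return false;
	c->state = LEASE_HELD;
	c->mapped = true;
	return true;
}

/* New I/O may only start while the lease is held and not recalled. */
static bool client_start_write(struct mock_client *c)
{
	if (c->state != LEASE_HELD || !c->mapped)
		return false;
	c->inflight++;
	return true;
}

/* An outstanding write finished; it still needs to be committed. */
static void client_write_done(struct mock_client *c)
{
	if (c->inflight > 0) {
		c->inflight--;
		c->uncommitted++;
	}
}

/* Filesystem side: it wants the space back, e.g. for a truncate. */
static void fs_recall_lease(struct mock_client *c)
{
	if (c->state == LEASE_HELD)
		c->state = LEASE_RECALLED;
}

/*
 * After a recall: once nothing is in flight, commit what was written,
 * drop the mapping and release the lease. Returns false while I/O is
 * still outstanding or if no recall is pending.
 */
static bool client_try_release(struct mock_client *c)
{
	if (c->state != LEASE_RECALLED || c->inflight > 0)
		return false;
	c->uncommitted = 0;	/* stands in for the commit step */
	c->mapped = false;
	c->state = LEASE_NONE;
	return true;
}
```

After client_try_release() succeeds, any further access has to go
back through client_map(), which models renewing the lease and
remapping the range.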

> So i would like to gather people feedback on general approach and few things
> like:
>     - Do block device need to be able to invalidate such mapping too ?
> 
>       It is easy for fs the to invalidate as it can walk file mappings
>       but block device do not know about file.

If you are needing the block device to invalidate filesystem level
information, then your model is all wrong.

>     - Do we want to provide some generic implementation to share accross
>       fs ?

We already have a generic interface; filesystems other than XFS will
need to implement it.

>     - Maybe some share helpers for block devices that could track file
>       corresponding to peer mapping ?

If the application hasn't supplied the peer with the file it needs
to access, get a lease from and then map an LBA range out of, then
you are doing it all wrong.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com