Linux-Fsdevel Archive on lore.kernel.org
* Extending FIEMAP ioctl to report device id
@ 2019-02-11  9:43 Carlos Maiolino
  2019-02-11 11:29 ` Nikolay Borisov
  2019-02-11 15:23 ` Matthew Wilcox
  0 siblings, 2 replies; 6+ messages in thread
From: Carlos Maiolino @ 2019-02-11  9:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: hch, adilger, darrick.wong

Hi.

A discussion started on another thread [1] about extending the FIEMAP ioctl
interface to also report the id of the device where the reported extents are
physically located. I've started to work on the extension, but before I spend
more time implementing it, I'd rather start a discussion to find out whether
it's really feasible or just a waste of time.

The whole context can be found in thread [1], more specifically in the
discussion that started on patch 9, here [2].

About the proposal:

- The general idea is to provide a way for FIEMAP ioctls to return the id of
  the device where each extent is physically located.
- This is particularly useful for filesystems where file extents can be
  located on a block device other than the one associated with the
  superblock, for example btrfs using multiple devices, or XFS using a
  real-time device.

Achieving this is relatively easy: use one of the __u32 fe_reserved fields in
struct fiemap_extent to create a new field (__u32 fe_device), which can serve
two purposes, selected by two new FIEMAP_EXTENT_ flags:

- FIEMAP_EXTENT_DEVICE, which indicates that fiemap_extent.fe_device contains
  the major/minor numbers of the block device where the specific extent is
  located.

- FIEMAP_EXTENT_COOKIE (or _EXTENT_PRIVATE), which indicates that
  fiemap_extent.fe_device has a filesystem-specific meaning. Such a flag
  sounded interesting for distributed filesystems, which could use this
  field, for example, to identify the cluster node (or whatever other name
  the specific fs defines) where that specific extent is located.
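
The layout change described above can be sketched roughly as follows. This is
an illustration only, not a patch: the struct name fiemap_extent_proposed and
both flag values are assumptions picked for the example (the thread does not
assign values), and fixed-width stdint types stand in for the kernel's
__u32/__u64 so the snippet builds outside the kernel tree. The existing fields
follow the layout of struct fiemap_extent in include/uapi/linux/fiemap.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for illustration. */
#define FIEMAP_EXTENT_DEVICE 0x00004000u /* fe_device holds a dev_t */
#define FIEMAP_EXTENT_COOKIE 0x00008000u /* fe_device is fs-private */

/* Sketch of struct fiemap_extent with one reserved __u32 repurposed. */
struct fiemap_extent_proposed {
	uint64_t fe_logical;       /* logical offset in bytes */
	uint64_t fe_physical;      /* physical offset in bytes */
	uint64_t fe_length;        /* length in bytes */
	uint64_t fe_reserved64[2];
	uint32_t fe_flags;         /* FIEMAP_EXTENT_* flags */
	uint32_t fe_device;        /* was fe_reserved[0] */
	uint32_t fe_reserved[2];   /* was fe_reserved[3] */
};

/* The record must keep the size of today's struct (56 bytes). */
static_assert(sizeof(struct fiemap_extent_proposed) == 56,
	      "extent record size must not change");
```

Because the new field occupies what used to be fe_reserved[0], the size of the
record and the offsets of all existing fields stay the same, which is what
keeps current userspace working.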


As mentioned before, implementing it does not look that difficult: such
reserved fields are not to be touched by userspace, so using one of them for a
new field won't break any current userspace application that doesn't
understand the new data.
What is still worth discussing, though, is whether such information (the
physical location of the extents) is something that should be exported to
userspace at all.

Any comments on whether this is worth implementing are welcome.

Cheers

[1] https://www.spinics.net/lists/linux-fsdevel/msg136559.html
[2] https://www.spinics.net/lists/linux-fsdevel/msg136568.html

-- 
Carlos


* Re: Extending FIEMAP ioctl to report device id
  2019-02-11  9:43 Extending FIEMAP ioctl to report device id Carlos Maiolino
@ 2019-02-11 11:29 ` Nikolay Borisov
  2019-02-11 14:56   ` Carlos Maiolino
  2019-02-11 15:23 ` Matthew Wilcox
  1 sibling, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2019-02-11 11:29 UTC (permalink / raw)
  To: Carlos Maiolino, linux-fsdevel; +Cc: hch, adilger, darrick.wong



On 11.02.19 11:43, Carlos Maiolino wrote:
> Hi.
> 
> A discussion has been started on another thread [1], with the idea of extending
> FIEMAP ioctl interface, to also report the device id where the extents being
> reported are physically located. I've started to work on the extension, but,
> before I spend time implementing it, I'd rather start a discussion to ensure
> it's really feasible or just a waste of time in pursuing it.
> 
> The whole context, can be found in the thread [1], more specifically in the
> discussion started on patch 9, here [2].
> 
> About the proposal:
> 
> - The general idea, is to provide a way for FIEMAP ioctls to return the device
>   id where each extent is physically located.
> - This is particularly useful for those filesystems where the file extents are
>   located on a different block device other than that associated with the
>   superblock , for example, btrfs using multiple devices, and XFS when using a
>   real-time device.
> 
> Achieving this is relatively easy, using one of the __u32 fe_reserved fields in
> struct fiemap_extent, to create a new field (__u32 fe_device), which can be used
> for two purposes, based on two new FIEMAP_EXTENT_ flags : 
> 
> - FIEMAP_EXTENT_DEVICE: which will indicate the fiemap_extent.fe_device contains
>   the major/minor numbers of the block device where the specific extent is
>   located
> 
> - FIEMAP_EXTENT_COOKIE (of _EXTENT_PRIVATE), which indicates the
>   fiemap_extent.fe_device will contain a special meaning depending on the fs.
>   Such flag sounded interesting for distributed filesystems, which could use
>   this field for example, to specify each node of the cluster (or whatever other
>   name is defined by the specific fs) that specific extent is located.

Who decides which flag is set? Do you intend for the default behavior to
be FIEMAP_EXTENT_DEVICE, which could be overridden by
FIEMAP_EXTENT_COOKIE? IMHO a more fitting name would be
FIEMAP_EXTENT_DEV_PRIVATE or PRIVATE_DEV.



> 
> 
> As mentioned before, implementing it, looks not that difficult, considering such
> reserved fields are not to be touched by userspace, and using one of the new
> fields won't break any current userspace application which doesn't understand
> the new data.
> But still, things which are worth to discuss is if such information (the
> physical location of the extents) is something that should be exported to
> userspace or not.
> 
> Any comments if this is something worth to implement or not, are welcome.
> 
> Cheers
> 
> [1] https://www.spinics.net/lists/linux-fsdevel/msg136559.html
> [2] https://www.spinics.net/lists/linux-fsdevel/msg136568.html
> 


* Re: Extending FIEMAP ioctl to report device id
  2019-02-11 11:29 ` Nikolay Borisov
@ 2019-02-11 14:56   ` Carlos Maiolino
  0 siblings, 0 replies; 6+ messages in thread
From: Carlos Maiolino @ 2019-02-11 14:56 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-fsdevel, hch, adilger, darrick.wong

Hi,

On Mon, Feb 11, 2019 at 01:29:57PM +0200, Nikolay Borisov wrote:
> 
> 
> On 11.02.19 11:43, Carlos Maiolino wrote:
> > Hi.
> > 
> > A discussion has been started on another thread [1], with the idea of extending
> > FIEMAP ioctl interface, to also report the device id where the extents being
> > reported are physically located. I've started to work on the extension, but,
> > before I spend time implementing it, I'd rather start a discussion to ensure
> > it's really feasible or just a waste of time in pursuing it.
> > 
> > The whole context, can be found in the thread [1], more specifically in the
> > discussion started on patch 9, here [2].
> > 
> > About the proposal:
> > 
> > - The general idea, is to provide a way for FIEMAP ioctls to return the device
> >   id where each extent is physically located.
> > - This is particularly useful for those filesystems where the file extents are
> >   located on a different block device other than that associated with the
> >   superblock , for example, btrfs using multiple devices, and XFS when using a
> >   real-time device.
> > 
> > Achieving this is relatively easy, using one of the __u32 fe_reserved fields in
> > struct fiemap_extent, to create a new field (__u32 fe_device), which can be used
> > for two purposes, based on two new FIEMAP_EXTENT_ flags : 
> > 
> > - FIEMAP_EXTENT_DEVICE: which will indicate the fiemap_extent.fe_device contains
> >   the major/minor numbers of the block device where the specific extent is
> >   located
> > 
> > - FIEMAP_EXTENT_COOKIE (of _EXTENT_PRIVATE), which indicates the
> >   fiemap_extent.fe_device will contain a special meaning depending on the fs.
> >   Such flag sounded interesting for distributed filesystems, which could use
> >   this field for example, to specify each node of the cluster (or whatever other
> >   name is defined by the specific fs) that specific extent is located.
> 
> Who decides which flag is set? Do you intend for the default behavior to
> be FIEMAP_EXTENT_DEVICE which could be overridden by
> FIEMAP_EXTENT_COOKIE? IMHO a more becoming name could be
> FIEMAP_EXTENT_DEV_PRIVATE or PRIVATE_DEV.
> 

The idea is:

- If neither flag is set, the fe_device field is ignored by the FIEMAP
  infrastructure.
- If FIEMAP_EXTENT_DEVICE is set and _COOKIE (or _PRIVATE) IS NOT, then
  fe_device is the major/minor dev_t of the block device holding the extent.
- If both flags ARE SET, fe_device holds a value that only the specific
  filesystem (and maybe its users) can properly interpret.
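
The three rules above could be expressed in userspace along these lines. This
is a minimal sketch under stated assumptions: the flag values, the enum and
the function name classify_fe_device() are all hypothetical, and the handling
of "_COOKIE set without _DEVICE" is a guess, since that combination is not
covered by the rules.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for illustration. */
#define FIEMAP_EXTENT_DEVICE 0x00004000u
#define FIEMAP_EXTENT_COOKIE 0x00008000u

static_assert((FIEMAP_EXTENT_DEVICE & FIEMAP_EXTENT_COOKIE) == 0,
	      "the two flags must be distinct bits");

enum fe_device_kind {
	FE_DEVICE_IGNORED,    /* fe_device carries no information */
	FE_DEVICE_BLOCK_DEV,  /* fe_device is a major/minor dev_t */
	FE_DEVICE_FS_PRIVATE, /* fe_device is a filesystem-specific cookie */
};

/* Decide how fe_device should be read, given an extent's fe_flags. */
enum fe_device_kind classify_fe_device(uint32_t fe_flags)
{
	int has_device = (fe_flags & FIEMAP_EXTENT_DEVICE) != 0;
	int has_cookie = (fe_flags & FIEMAP_EXTENT_COOKIE) != 0;

	if (has_device && has_cookie)
		return FE_DEVICE_FS_PRIVATE;
	if (has_device)
		return FE_DEVICE_BLOCK_DEV;
	/* Neither flag, or _COOKIE alone (assumed to be ignored as well). */
	return FE_DEVICE_IGNORED;
}
```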


> 
> 
> > 
> > 
> > As mentioned before, implementing it, looks not that difficult, considering such
> > reserved fields are not to be touched by userspace, and using one of the new
> > fields won't break any current userspace application which doesn't understand
> > the new data.
> > But still, things which are worth to discuss is if such information (the
> > physical location of the extents) is something that should be exported to
> > userspace or not.
> > 
> > Any comments if this is something worth to implement or not, are welcome.
> > 
> > Cheers
> > 
> > [1] https://www.spinics.net/lists/linux-fsdevel/msg136559.html
> > [2] https://www.spinics.net/lists/linux-fsdevel/msg136568.html
> > 

-- 
Carlos


* Re: Extending FIEMAP ioctl to report device id
  2019-02-11  9:43 Extending FIEMAP ioctl to report device id Carlos Maiolino
  2019-02-11 11:29 ` Nikolay Borisov
@ 2019-02-11 15:23 ` Matthew Wilcox
  2019-02-11 20:52   ` Andreas Dilger
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2019-02-11 15:23 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: linux-fsdevel, hch, adilger, darrick.wong

On Mon, Feb 11, 2019 at 10:43:06AM +0100, Carlos Maiolino wrote:
> - The general idea, is to provide a way for FIEMAP ioctls to return the device
>   id where each extent is physically located.

How does userspace get to use this information?  If I call fiemap() and
it tells me extent 1 is on device 0x12345678 and extent 2 is on device
0x34567812, what can I do with that information?

Bear in mind that glibc uses a different dev_t from the kernel.

> - This is particularly useful for those filesystems where the file extents are
>   located on a different block device other than that associated with the
>   superblock , for example, btrfs using multiple devices, and XFS when using a
>   real-time device.

Darrick said it was useful for _inside_ the kernel.  How is it useful
for outside the kernel?



* Re: Extending FIEMAP ioctl to report device id
  2019-02-11 15:23 ` Matthew Wilcox
@ 2019-02-11 20:52   ` Andreas Dilger
  2019-02-11 21:34     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2019-02-11 20:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Carlos Maiolino, linux-fsdevel, Christoph Hellwig, darrick.wong

On Feb 11, 2019, at 8:23 AM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Mon, Feb 11, 2019 at 10:43:06AM +0100, Carlos Maiolino wrote:
>> - The general idea, is to provide a way for FIEMAP ioctls to return the device
>>  id where each extent is physically located.
> 
> How does userspace get to use this information?  If I call fiemap() and
> it tells me extent 1 is on device 0x12345678 and extent 2 is on device
> 0x34567812, what can I do with that information?

For filesystems that may store a file on different devices, filefrag will
print out which device the file is located on, so that users can see where
the file is located.

Programs (e.g. a mythical LILO that used FIEMAP instead of FIBMAP) could
check fe_device to see whether the whole file is located on the same block
device or not, and refuse to boot from a file that spans several devices.
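
Such a check might look roughly like the sketch below. Everything in it is
hypothetical: the flag value, the trimmed-down struct extent_dev (only the two
fields the check needs) and the function name. It also assumes that an extent
without the device flag should make the check fail, because its location is
unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIEMAP_EXTENT_DEVICE 0x00004000u /* hypothetical value */

/* Only the fields of the proposed extent record that this check needs. */
struct extent_dev {
	uint32_t fe_flags;
	uint32_t fe_device;
};

/* True if every extent reports a device and all of them report the same one. */
bool file_on_single_device(const struct extent_dev *ext, size_t n)
{
	assert(ext != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!(ext[i].fe_flags & FIEMAP_EXTENT_DEVICE))
			return false; /* device unknown: be conservative */
		if (ext[i].fe_device != ext[0].fe_device)
			return false; /* extent lives on another device */
	}
	return true;
}
```

Note that the check only compares values for equality, so it does not depend
on how the device number is encoded.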

> Bear in mind that glibc uses a different dev_t from the kernel.

That is glibc's problem.  The kernel would return fe_device using the same
dev_t that it uses for stat.st_dev and friends.  Even so, the majority of
users will care about "these blocks/files are on a different device than
those other blocks/files" and not the exact meaning of the bits.
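
For the cases where the bits do matter, a decoder could look like the sketch
below. It assumes fe_device would carry the 32-bit userspace encoding produced
by the kernel's new_encode_dev() helper (include/linux/kdev_t.h); that choice
is an assumption, nothing in this thread fixes the encoding. The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct dev_numbers {
	unsigned int major;
	unsigned int minor;
};

/* Split a 32-bit encoded device number: 12-bit major, 20-bit minor. */
struct dev_numbers decode_fe_device(uint32_t dev)
{
	struct dev_numbers d;

	d.major = (dev & 0xfff00u) >> 8;
	d.minor = (dev & 0xffu) | ((dev >> 12) & 0xfff00u);
	return d;
}

/* Inverse of decode_fe_device(), same bit layout. */
uint32_t encode_fe_device(unsigned int major, unsigned int minor)
{
	assert(major < (1u << 12)); /* major must fit in 12 bits */
	assert(minor < (1u << 20)); /* minor must fit in 20 bits */

	return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}
```

Under that assumption, the 0x12345678 from the question above would decode to
major 0x456, minor 0x12378.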

>> - This is particularly useful for those filesystems where the file extents are
>>  located on a different block device other than that associated with the
>>  superblock , for example, btrfs using multiple devices, and XFS when using a
>>  real-time device.
> 
> Darrick said it was useful for _inside_ the kernel.  How is it useful
> for outside the kernel?

In my experience, this can be very useful for users to understand how their
file is allocated if there are performance or other issues with a particular
device.  Also, in some respects, it is _required_ for multi-device filesystems,
since it makes it clear that block 123 on one device is not related to the same
block number on a different device.

It may well be that ext4 will get some kind of multi-device capability in the
future (e.g. with the existing ext4 SMR patch using a separate flash journal
device and file data being permanently kept in the journal instead of the HDD,
or storing all the metadata on a flash device and all data on an HDD).

Cheers, Andreas







* Re: Extending FIEMAP ioctl to report device id
  2019-02-11 20:52   ` Andreas Dilger
@ 2019-02-11 21:34     ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2019-02-11 21:34 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Matthew Wilcox, Carlos Maiolino, linux-fsdevel,
	Christoph Hellwig, darrick.wong

On Mon, Feb 11, 2019 at 01:52:25PM -0700, Andreas Dilger wrote:
> On Feb 11, 2019, at 8:23 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Mon, Feb 11, 2019 at 10:43:06AM +0100, Carlos Maiolino wrote:
> >> - The general idea, is to provide a way for FIEMAP ioctls to return the device
> >>  id where each extent is physically located.
> > 
> > How does userspace get to use this information?  If I call fiemap() and
> > it tells me extent 1 is on device 0x12345678 and extent 2 is on device
> > 0x34567812, what can I do with that information?
> 
> For filesystems that may store a file on different devices, filefrag will
> print out which device the file is located on, so that users can see where
> the file is located.

I suspect that even for XFS, we'd return a special cookie to say "On
data device #X", "on real time device #Y" or "on subvolume #Z"
rather than an actual block device. That will have a lot more
meaning to the XFS filesystem utilities that might use this
information than a raw block device (which may or may not exist!)
because then they don't have to jump through hoops to convert it to
something meaningful....

> > Darrick said it was useful for _inside_ the kernel.  How is it useful
> > for outside the kernel?
> 
> In my experience, this can be very useful for users to understand how their
> file is allocated if there are performance or other issues with a particular
> device.

*nod*

And when you have allocation policies that select different
devices for different files it makes it possible to easily verify
the policy is working correctly.

https://patchwork.kernel.org/patch/10081163/


> Also, in some respects, it is _required_ for multi-device filesystems,
> since it makes it clear that block 123 on one device is not related to the same
> block number on a different device.

*nod*

On xfs, we have to do 'xfs_io -c stat -c "fiemap -v" <file>' to
get the RT dev attribute in addition to the extent list right now.
It would be good to get them in just the fiemap call.
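
For reference, the FIEMAP call as it exists today can be issued from C roughly
like this. It is a minimal sketch that only asks the kernel how many extents a
file has (fm_extent_count = 0) and does not touch any of the proposed fields;
the function name count_mapped_extents() is made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the number of extents mapped for path, or -1 on any error. */
long count_mapped_extents(const char *path)
{
	struct fiemap fm;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&fm, 0, sizeof(fm));
	fm.fm_start = 0;
	fm.fm_length = FIEMAP_MAX_OFFSET; /* map the whole file */
	fm.fm_flags = FIEMAP_FLAG_SYNC;   /* flush dirty data before mapping */
	fm.fm_extent_count = 0;           /* count only, return no records */

	if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
		close(fd);
		return -1; /* e.g. the filesystem does not support FIEMAP */
	}

	close(fd);
	return (long)fm.fm_mapped_extents;
}
```

A caller that wants the extent records themselves would allocate room for
fm_mapped_extents entries after the header and repeat the call with
fm_extent_count set accordingly.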

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
