* [PATCH v8 0/3] man-pages: fix reflink/dedupe ioctl manpages
@ 2016-08-25 23:26 ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:26 UTC (permalink / raw)
  To: mtk.manpages, darrick.wong; +Cc: linux-fsdevel, linux-api, linux-man

Hi all,

This is the eighth revision of a patchset that adds to XFS kernel
support for mapping multiple file logical blocks to the same physical
block (reflink/deduplication), implements the beginnings of online
metadata scrubbing and preening, and implements reverse mapping for
the realtime device.  There shouldn't be any incompatible on-disk
format changes, pending a thorough review of the patches within.

The patches in this series fix some errors and further clarify the
behavior of the clone and dedupe ioctls.  The third patch is an RFC
manpage for the proposed GETFSMAP ioctl, though the interface is not
yet upstream.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/djwong-devel
[2] https://github.com/djwong/xfsprogs/tree/djwong-devel
[3] https://github.com/djwong/xfstests/tree/djwong-devel
[4] https://github.com/djwong/xfs-documentation/tree/djwong-devel
[5] https://github.com/djwong/man-pages/tree/djwong-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] man2/fallocate.2: document behavior with shared blocks
@ 2016-08-25 23:26   ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:26 UTC (permalink / raw)
  To: mtk.manpages, darrick.wong; +Cc: linux-fsdevel, linux-api, linux-man

Add a blurb to the fallocate manpage explaining that the fallocate
call may use CoW to unshare blocks to guarantee that a disk write
won't fail with ENOSPC.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man2/fallocate.2 |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/man2/fallocate.2 b/man2/fallocate.2
index 54d6340..e050536 100644
--- a/man2/fallocate.2
+++ b/man2/fallocate.2
@@ -83,6 +83,12 @@ is useful for optimizing append workloads.
 Because allocation is done in block size chunks,
 .BR fallocate ()
 may allocate a larger range of disk space than was specified.
+.PP
+Filesystems which allow files to share the same physical storage may
+employ copy on write to unshare the physical blocks to guarantee that
+subsequent writes will not fail due to lack of disk space.
+If the disk blocks are then re-shared, a subsequent write may still
+fail due to lack of space.
 .SS Deallocating file space
 Specifying the
 .BR FALLOC_FL_PUNCH_HOLE
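
For illustration only (not part of the patch): a minimal sketch of how a
caller might use the default allocation mode before overwriting a range that
could be backed by shared blocks.  The helper name reserve_before_write() is
hypothetical, and whether the call actually unshares anything depends on the
filesystem and kernel in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the filesystem to back the byte range
 * [offset, offset + len) of fd with allocated space before writing to it.
 * Mode 0 is the default allocation mode; it may grow the file if the
 * range ends past EOF.  On filesystems that share blocks this may also
 * unshare them, as the text above describes.
 * Returns 0 on success, or -1 with errno set (for example ENOSPC).
 */
int reserve_before_write(int fd, off_t offset, off_t len)
{
	int ret;

	/* A negative offset or a non-positive length is a caller bug. */
	assert(offset >= 0 && len > 0);

	/* Retry if a signal interrupted the call. */
	do {
		ret = fallocate(fd, 0, offset, len);
	} while (ret == -1 && errno == EINTR);

	return ret;
}
```

A caller would check the return value before issuing the write; as the new
paragraph notes, the guarantee is lost if the blocks are shared again in the
meantime.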


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] man2/ioctl_fideduperange.2: clarify operation some more
@ 2016-08-25 23:26   ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:26 UTC (permalink / raw)
  To: mtk.manpages, darrick.wong; +Cc: linux-fsdevel, linux-api, linux-man

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man2/ioctl_ficlonerange.2  |    2 +-
 man2/ioctl_fideduperange.2 |   26 ++++++++++++++++++++++----
 2 files changed, 23 insertions(+), 5 deletions(-)


diff --git a/man2/ioctl_ficlonerange.2 b/man2/ioctl_ficlonerange.2
index ac0738a..0e3ae0e 100644
--- a/man2/ioctl_ficlonerange.2
+++ b/man2/ioctl_ficlonerange.2
@@ -114,7 +114,7 @@ regions in directories.
 .TP
 .B EOPNOTSUPP
 This can appear if the filesystem does not support reflinking either file
-descriptor.
+descriptor, or if either file descriptor refers to special inodes.
 .TP
 .B EPERM
 .IR dest_fd
diff --git a/man2/ioctl_fideduperange.2 b/man2/ioctl_fideduperange.2
index c52fa2a..2112d10 100644
--- a/man2/ioctl_fideduperange.2
+++ b/man2/ioctl_fideduperange.2
@@ -99,21 +99,39 @@ Each deduplication operation targets
 bytes in file descriptor
 .IR dest_fd
 at offset
-.IR logical_offset ".
+.IR dest_offset ".
 The field
 .IR reserved
 must be zero.
+During the call,
+.IR src_fd
+must be open for reading and
+.IR dest_fd
+must be open for writing.
+For any call to this ioctl, there may not be more than 65,536
+requests attached; each request may not exceed 16MiB.
+By convention, the storage used by
+.IR src_fd
+is mapped into
+.IR dest_fd
+and the previous contents in
+.IR dest_fd
+are freed.
 
 Upon successful completion of this ioctl, the number of bytes successfully
 deduplicated is returned in
 .IR bytes_deduped
 and a status code for the deduplication operation is returned in
 .IR status ".
-
+If even a single byte in the range does not match, the deduplication
+request will be ignored and
+.IR status
+set to
+.BR FILE_DEDUPE_RANGE_DIFFERS .
 The
 .IR status
 code is set to
-.B 0
+.B FILE_DEDUPE_RANGE_SAME
 for success, a negative error code in case of error, or
 .B FILE_DEDUPE_RANGE_DIFFERS
 if the data did not match.
@@ -150,7 +168,7 @@ regions in directories.
 .TP
 .B EOPNOTSUPP
 This can appear if the filesystem does not support deduplicating either file
-descriptor.
+descriptor, or if either file descriptor refers to special inodes.
 .TP
 .B EPERM
 .IR dest_fd
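
For illustration only (not part of the patch): a minimal sketch of a single
dedupe request and of how the status values described above could be
interpreted.  The helper name dedupe_one_range() and its return convention
are hypothetical; the structures and constants come from <linux/fs.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: try to dedupe len bytes at src_off in src_fd
 * against dest_off in dest_fd using one request.
 * Returns 0 if the ranges matched (FILE_DEDUPE_RANGE_SAME), 1 if they
 * differed (FILE_DEDUPE_RANGE_DIFFERS), or -1 with errno set.
 * *deduped receives bytes_deduped when the ranges matched.
 */
int dedupe_one_range(int src_fd, uint64_t src_off,
		     int dest_fd, uint64_t dest_off,
		     uint64_t len, uint64_t *deduped)
{
	struct file_dedupe_range *range;
	int ret, saved_errno;

	assert(deduped != NULL);
	*deduped = 0;

	/* Header plus one trailing request; calloc zeroes the reserved fields. */
	range = calloc(1, sizeof(*range) + sizeof(range->info[0]));
	if (!range)
		return -1;

	range->src_offset = src_off;
	range->src_length = len;
	range->dest_count = 1;
	range->info[0].dest_fd = dest_fd;
	range->info[0].dest_offset = dest_off;

	ret = ioctl(src_fd, FIDEDUPERANGE, range);
	if (ret == 0) {
		int32_t status = range->info[0].status;

		if (status == FILE_DEDUPE_RANGE_SAME) {
			*deduped = range->info[0].bytes_deduped;
		} else if (status == FILE_DEDUPE_RANGE_DIFFERS) {
			ret = 1;
		} else {
			/* Per-request failure: status holds a negative error code. */
			errno = -status;
			ret = -1;
		}
	}

	/* Keep errno intact across free(). */
	saved_errno = errno;
	free(range);
	errno = saved_errno;
	return ret;
}
```

Folding the per-request status into errno is a choice made for this sketch;
a caller that submits several requests at once would need to inspect each
info[] entry separately.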


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
@ 2016-08-25 23:26   ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:26 UTC (permalink / raw)
  To: mtk.manpages, darrick.wong; +Cc: linux-fsdevel, linux-api, linux-man

Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
layout of a (disk-based) filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man2/ioctl_xfs_ioc_getfsmap.2 |  294 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2


diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
new file mode 100644
index 0000000..0d9ed47
--- /dev/null
+++ b/man2/ioctl_xfs_ioc_getfsmap.2
@@ -0,0 +1,294 @@
+.\" Copyright (c) 2016, Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
+.SH SYNOPSIS
+.br
+.B #include <sys/ioctl.h>
+.br
+.B #include <linux/fs.h>
+.sp
+.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
+.SH DESCRIPTION
+This
+.BR ioctl (2)
+retrieves physical extent mappings for a filesystem.
+This information can be used to discover which files are mapped to a physical
+block, examine free space, or find known bad blocks, among other things.
+
+The sole argument to this ioctl should be an array of the following
+structure:
+.in +4n
+.nf
+
+struct getfsmap {
+	__u32		fmv_device;	/* device id */
+	__u32		fmv_unused1;	/* future use, must be zero */
+	__u64		fmv_block;	/* starting block */
+	__u64		fmv_owner;	/* owner id */
+	__u64		fmv_offset;	/* file offset of segment */
+	__u64		fmv_length;	/* length of segment, blocks */
+	__u32		fmv_oflags;	/* mapping flags */
+	__u32		fmv_iflags;	/* control flags (1st structure) */
+	__u32		fmv_count;	/* # of entries in array incl. input */
+	__u32		fmv_entries;	/* # of entries filled in (output). */
+	__u64		fmv_unused2;	/* future use, must be zero */
+};
+
+.fi
+.in
+The array must contain at least two elements.
+The first two array elements specify the lowest and highest reverse-mapping
+keys, respectively, for which userspace would like physical mapping
+information.
+A reverse mapping key consists of the tuple (device, block, owner, offset).
+The owner and offset fields are part of the key because some filesystems
+support sharing physical blocks between multiple files and
+therefore may return multiple mappings for a given physical block.
+
+.SS Fields of struct getfsmap
+.PP
+The
+.I fmv_device
+field contains a 32-bit cookie to uniquely identify the underlying storage
+device.
+If the
+.B FMV_HOF_DEV_T
+flag is set in the header's
+.I fmv_oflags
+field, this field contains a dev_t from which major and minor numbers can
+be extracted.
+If the flag is not set, this field contains a value that must be unique
+for each storage device.
+
+.PP
+The
+.I fmv_unused1
+field must be zero in the first two array elements.
+
+.PP
+The
+.I fmv_block
+field contains the 512-byte sector address of the extent.
+
+.PP
+The
+.I fmv_owner
+field contains the owner of the extent.
+This is generally an inode number, though if
+.B FMV_OF_SPECIAL_OWNER
+is set in the
+.I fmv_oflags
+field, then the owner value is one of the following special values:
+.TP
+.B FMV_OWN_FREE
+Free space.
+.TP
+.B FMV_OWN_UNKNOWN
+This extent has an unknown owner.
+.TP
+.B FMV_OWN_FS
+Static filesystem metadata.
+.TP
+.B FMV_OWN_LOG
+The filesystem journal.
+.TP
+.B FMV_OWN_AG
+Allocation group metadata.
+.TP
+.B FMV_OWN_INOBT
+The inode index, if one is provided.
+.TP
+.B FMV_OWN_INODES
+Inodes.
+.TP
+.B FMV_OWN_REFC
+Reference counting indexes.
+.TP
+.B FMV_OWN_COW
+This extent is being used to stage a copy-on-write.
+.TP
+.B FMV_OWN_DEFECTIVE
+This extent has been marked defective either by the filesystem or the
+underlying device.
+
+.PP
+The
+.I fmv_offset
+field contains the logical address of the reverse mapping record, in units
+of 512-byte blocks.
+This field has no meaning if the
+.BR FMV_OF_SPECIAL_OWNER " or " FMV_OF_EXTENT_MAP
+flags are set in
+.IR fmv_oflags "."
+
+.PP
+The
+.I fmv_length
+field contains the length of the extent, in units of 512-byte blocks.
+This field must be zero in the second array element.
+
+.PP
+The
+.I fmv_oflags
+field is a bitmask of extent state flags.
+In the header, the bits are:
+.TP
+.B FMV_HOF_DEV_T
+All
+.I fmv_device
+values will be in dev_t format.
+If this flag is not set, the value is merely a 32-bit cookie that will be
+unique for each physical device.
+.PP
+In a non-header, the bits are:
+.TP
+.B FMV_OF_PREALLOC
+The extent is allocated but not yet written.
+.TP
+.B FMV_OF_ATTR_FORK
+This extent contains extended attribute data.
+.TP
+.B FMV_OF_EXTENT_MAP
+This extent contains extent map information for the owner.
+.TP
+.B FMV_OF_SHARED
+Parts of this extent may be shared.
+.TP
+.B FMV_OF_SPECIAL_OWNER
+The
+.I fmv_owner
+field contains a special value instead of an inode number.
+.TP
+.B FMV_OF_LAST
+This is the last record in the filesystem.
+
+.PP
+The
+.I fmv_iflags
+field is a bitmask passed to the kernel to alter the output.
+There are no flags defined, so this value must be zero in the first
+two array elements.
+
+.PP
+The
+.I fmv_count
+field contains the number of elements in the array being passed to the
+kernel.
+This count must include the two control elements at the start of the
+array.
+The value must be specified in the first array element; in the second
+element this field must be zero.
+
+If this value is 2,
+.I fmv_entries
+will be set to the number of records that would have been returned had
+the array been large enough;
+no extent information will be returned.
+
+.PP
+The
+.I fmv_entries
+field contains the number of elements in the array that contain useful
+information if the ioctl returns a non-error value.
+This value does not include the two control elements at the start of the array.
+This value is only set in the first array element;
+in the second element, this field must be zero.
+
+.PP
+The
+.I fmv_unused2
+field must be zero in the first two array elements.
+
+.SS Array Elements
+.PP
+The key fields (fmv_device, fmv_block, fmv_owner, fmv_offset) of the first
+element of the array specify the lowest extent record in the keyspace that
+the caller wants returned.
+For example, if the key is set to (0, 36, 0, 0), the filesystem will
+only return records for extents starting at or above sector 36 on
+disk.
+For convenience, the
+.I fmv_length
+field will be added to the
+.IR fmv_block " and " fmv_offset
+fields as appropriate so that the (fmv_device, fmv_block, fmv_owner,
+fmv_offset, fmv_length) fields in the last array element can be copied
+into the first element to seed the next ioctl call.
+
+The key fields of the second element of the array specify the highest
+extent record in the keyspace that the caller wants returned.
+Returning to our example above, if that example key were instead
+passed in via the second array element, the filesystem will not return
+records for extents going past sector 36 on disk.
+For convenience, the four key fields can be set to ~0 (all ones) to
+signify "end of filesystem".
+
+If
+.I fmv_count
+in the first element of the array is 2, then
+.I fmv_entries
+in the first element of the array will be set to the number of extent
+records found in the filesystem.
+Otherwise,
+.I fmv_entries
+will be set to the number of extents actually returned, and the subsequent
+array elements will be filled out with extent information.
+In these
+subsequent array elements, the fields
+.IR fmv_iflags ", " fmv_count ", " fmv_entries ", and " fmv_unused1
+will be set to zero by the filesystem.
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+The array is not long enough, or a non-zero value was passed in one of the
+fields that must be zero.
+.TP
+.B EFAULT
+The pointer passed in was not mapped to a valid memory address.
+.TP
+.B EBADF
+.IR fd
+is not open for reading.
+.TP
+.B EPERM
+This query is not allowed.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support this command.
+
+.SH CONFORMING TO
+This API is Linux-specific.
+Not all filesystems support it.
+.fi
+.in
+.SH SEE ALSO
+.BR ioctl (2)
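
For illustration only (not part of the patch): a minimal sketch of how the
two control elements might be prepared and how a follow-up call could be
seeded from the last returned record, following the "Array Elements" text
above.  The structure is copied locally from the proposed manpage because the
interface is not upstream; the helper names getfsmap_init_query() and
getfsmap_seed_next() are hypothetical.  The ioctl call itself is left out
because the patch does not give the value of XFS_IOC_GETFSMAP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the structure proposed in the manpage; an assumption. */
struct getfsmap {
	uint32_t fmv_device;	/* device id */
	uint32_t fmv_unused1;	/* future use, must be zero */
	uint64_t fmv_block;	/* starting block */
	uint64_t fmv_owner;	/* owner id */
	uint64_t fmv_offset;	/* file offset of segment */
	uint64_t fmv_length;	/* length of segment, blocks */
	uint32_t fmv_oflags;	/* mapping flags */
	uint32_t fmv_iflags;	/* control flags (1st structure) */
	uint32_t fmv_count;	/* # of entries in array incl. input */
	uint32_t fmv_entries;	/* # of entries filled in (output) */
	uint64_t fmv_unused2;	/* future use, must be zero */
};

/*
 * Prepare an array of nr elements for a query over the whole keyspace:
 * low key all zeroes, high key all ones, count in the first element only.
 */
void getfsmap_init_query(struct getfsmap *map, uint32_t nr)
{
	assert(map != NULL && nr >= 2);

	memset(map, 0, (size_t)nr * sizeof(*map));
	map[0].fmv_count = nr;

	/* ~0 in the four key fields signifies "end of filesystem". */
	map[1].fmv_device = UINT32_MAX;
	map[1].fmv_block = UINT64_MAX;
	map[1].fmv_owner = UINT64_MAX;
	map[1].fmv_offset = UINT64_MAX;
}

/*
 * After a successful call, copy the key fields and length of the last
 * returned record into the first element to seed the next call.
 * Returns 1 if the low key was advanced, 0 if no records were returned
 * (or the call was a count-only query with fmv_count == 2).
 */
int getfsmap_seed_next(struct getfsmap *map)
{
	uint32_t n = map[0].fmv_entries;
	const struct getfsmap *last;

	if (map[0].fmv_count <= 2 || n == 0)
		return 0;
	assert(n <= map[0].fmv_count - 2);

	/* Records start after the two control elements. */
	last = &map[n + 1];
	map[0].fmv_device = last->fmv_device;
	map[0].fmv_block = last->fmv_block;
	map[0].fmv_owner = last->fmv_owner;
	map[0].fmv_offset = last->fmv_offset;
	map[0].fmv_length = last->fmv_length;
	return 1;
}
```

A caller would presumably loop: initialise once, issue the ioctl, process
elements 2 through fmv_entries + 1, then seed the next call until no records
come back or a record carries FMV_OF_LAST.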


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
  2016-08-25 23:26   ` Darrick J. Wong
  (?)
@ 2016-08-29 21:34   ` Andreas Dilger
  2016-08-30 19:09       ` Darrick J. Wong
  -1 siblings, 1 reply; 26+ messages in thread
From: Andreas Dilger @ 2016-08-29 21:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: mtk.manpages, linux-fsdevel, linux-api, linux-man

On Aug 25, 2016, at 5:26 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
> layout of a (disk-based) filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> man2/ioctl_xfs_ioc_getfsmap.2 |  294 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 294 insertions(+)
> create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2
> 
> 
> diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
> new file mode 100644
> index 0000000..0d9ed47
> --- /dev/null
> +++ b/man2/ioctl_xfs_ioc_getfsmap.2
> @@ -0,0 +1,294 @@
> +.\" Copyright (c) 2016, Oracle.  All rights reserved.
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" <http://www.gnu.org/licenses/>.
> +.\" %%%LICENSE_END
> +.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
> +.SH SYNOPSIS
> +.br
> +.B #include <sys/ioctl.h>
> +.br
> +.B #include <linux/fs.h>
> +.sp
> +.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
> +.SH DESCRIPTION
> +This
> +.BR ioctl (2)
> +retrieves physical extent mappings for a filesystem.
> +This information can be used to discover which files are mapped to a physical
> +block, examine free space, or find known bad blocks, among other things.
> +
> +The sole argument to this ioctl should be an array of the following
> +structure:
> +.in +4n
> +.nf
> +
> +struct getfsmap {
> +	__u32		fmv_device;	/* device id */
> +	__u32		fmv_unused1;	/* future use, must be zero */
> +	__u64		fmv_block;	/* starting block */
> +	__u64		fmv_owner;	/* owner id */
> +	__u64		fmv_offset;	/* file offset of segment */
> +	__u64		fmv_length;	/* length of segment, blocks */
> +	__u32		fmv_oflags;	/* mapping flags */
> +	__u32		fmv_iflags;	/* control flags (1st structure) */
> +	__u32		fmv_count;	/* # of entries in array incl. input */
> +	__u32		fmv_entries;	/* # of entries filled in (output). */
> +	__u64		fmv_unused2;	/* future use, must be zero */
> +};
> +
> +.fi
> +.in
> +The array must contain at least two elements.
> +The first two array elements specify the lowest and highest reverse-mapping
> +keys, respectively, for which userspace would like physical mapping
> +information.
> +A reverse mapping key consists of the tuple (device, block, owner, offset).
> +The owner and offset fields are part of the key because some filesystems
> +support sharing physical blocks between multiple files and
> +therefore may return multiple mappings for a given physical block.
> +
> +.SS Fields of struct getfsmap
> +.PP
> +The
> +.I fmv_device
> +field contains a 32-bit cookie to uniquely identify the underlying storage
> +device.
> +If the
> +.B FMV_HOF_DEV_T
> +flag is set in the header's
> +.I fmv_oflags
> +field, this field contains a dev_t from which major and minor numbers can
> +be extracted.
> +If the flag is not set, this field contains a value that must be unique
> +for each storage device.
> +
> +.PP
> +The
> +.I fmv_unused1
> +field must be zero in the first two array elements.
> +
> +.PP
> +The
> +.I fmv_block
> +field contains the 512-byte sector address of the extent.

Why would you use 512-byte sectors in a new interface?  I recall for FIEMAP
that some filesystems may not have files aligned to sector offsets, and we
just used byte offsets.  Storage like NVDIMMs are cacheline granular, so I
don't think it makes sense to tie this to old disk sector sizes.  Alternately,
the units could be in terms of fs blocks as returned by statvfs.f_bsize,
but mixing units for fmv_block, fmv_offset, fmv_length is unneeded complexity.

> +
> +.PP
> +The
> +.I fmv_owner
> +field contains the owner of the extent.
> +This is generally an inode number, though if
> +.B FMV_OF_SPECIAL_OWNER
> +is set in the
> +.I fmv_oflags
> +field, then the owner value is one of the following special values:
> +.TP
> +.B FMV_OWN_FREE
> +Free space.
> +.TP
> +.B FMV_OWN_UNKNOWN
> +This extent has an unknown owner.
> +.TP
> +.B FMV_OWN_FS
> +Static filesystem metadata.
> +.TP
> +.B FMV_OWN_LOG
> +The filesystem journal.
> +.TP
> +.B FMV_OWN_AG
> +Allocation group metadata.
> +.TP
> +.B FMV_OWN_INODES
> +Inodes.
> +.TP
> +.B FMV_OWN_DEFECTIVE:
> +This extent has been marked defective either by the filesystem or the
> +underlying device.

These above ones are relatively clear what they are.  The next items are
not very clear what they are, and whether they need to be exported as
specific items, or could they just be lumped under "FMV_OWN_FS"?  If they
serve some specific purpose, at a minimum they need better descriptions.

> +.TP
> +.B FMV_OWN_INOBT
> +The inode index, if one is provided.
> +.TP
> +.B FMV_OWN_REFC
> +Reference counting indexes.
> +.TP
> +.B FMV_OWN_COW
> +This extent is being used to stage a copy-on-write.
> 
> +
> +.PP
> +The
> +.I fmv_offset
> +field contains the logical address of the reverse mapping record, in units
> +of 512-byte blocks.
> +This field has no meaning if the
> +.BR FMV_OF_SPECIAL_OWNER " or " FMV_OF_EXTENT_MAP
> +flags are set in
> +.IR fmv_oflags "."
> +
> +.PP
> +The
> +.I fmv_length
> +field contains the length of the extent, in units of 512-byte blocks.
> +This field must be zero in the second array element.
> +
> +.PP
> +The
> +.I fmv_oflags
> +field is a bitmask of extent state flags.
> +In the header, the bits are:
> +.TP
> +.B FMV_HOF_DEV_T
> +All
> +.I fmv_device
> +values will be in dev_t format.
> +If this flag is not set, the value is merely a 32-bit cookie that will be
> +unique for each physical device.
> +.TP
> +In a non-header, the bits are:
> +.TP
> +.B FMV_OF_PREALLOC
> +The extent is allocated but not yet written.
> +.TP
> +.B FMV_OF_ATTR_FORK
> +This extent contains extended attribute data.
> +.TP
> +.B FMV_OF_EXTENT_MAP
> +This extent contains extent map information for the owner.
> +.TP
> +.B FMV_OF_SHARED
> +Parts of this extent may be shared.
> +.TP
> +.B FMV_OF_SPECIAL_OWNER
> +The
> +.I fmv_owner
> +field contains a special value instead of an inode number.
> +.TP
> +.B FMV_OF_LAST
> +This is the last record in the filesystem.
> +
> +.PP
> +The
> +.I fmv_iflags
> +field is a bitmask passed to the kernel to alter the output.
> +There are no flags defined, so this value must be zero in the first
> +two array elements.

It seems like there are several fields in the structure that are used for
only input or only output?  Does it make more sense to have one structure
used only for the input request, and then the array of values returned be
in a different structure?  I'm not necessarily requesting that it be changed,
but it definitely is something I noticed a few times while reading this doc.

Cheers, Andreas

> +.PP
> +The
> +.I fmv_count
> +field contains the number of elements in the array being passed to the
> +kernel.
> +This count must include the two control elements at the start of the
> +array.
> +The value must be specified in the first array element; in the second
> +element this field must be zero.
> +
> +If this value is 2,
> +.I fmv_entries
> +will be set to the number of records that would have been returned had
> +the array been large enough;
> +no extent information will be returned.
> +
> +.PP
> +The
> +.I fmv_entries
> +field contains the number of elements in the array that contain useful
> +information if the ioctl returns a non-error value.
> +This value does not include the two control elements at the start of the array.
> +This value is only set in the first array element;
> +in the second element, this field must be zero.
> +
> +.PP
> +The
> +.I fmv_unused2
> +field must be zero in the first two array elements.
> +
> +.SS Array Elements
> +.PP
> +The key fields (fmv_device, fmv_block, fmv_owner, fmv_offset) of the first
> +element of the array specify the lowest extent record in the keyspace that
> +the caller wants returned.
> +For example, if the key is set to (0, 36, 0, 0), the filesystem will
> +only return records for extents starting at or above sector 36 on
> +disk.
> +For convenience, the
> +.I fmv_length
> +field will be added to the
> +.IR fmv_block " and " fmv_offset
> +fields as appropriate so that the (fmv_device, fmv_block, fmv_owner,
> +fmv_offset, fmv_length) fields in the last array element can be copied
> +into the first element to seed the next ioctl call.
> +
> +The key fields of the second element of the array specify the highest
> +extent record in the keyspace that the caller wants returned.
> +Returning to our example above, if that example key were instead
> +passed in via the second array element, the filesystem will not return
> +records for extents going past sector 36 on disk.
> +For convenience, the four key fields can be set to ~0 (all ones) to
> +signify "end of filesystem".
> +
> +If
> +.I fmv_count
> +in the first element of the array is 2, then
> +.I fmv_entries
> +in the first element of the array will be set to the number of extent
> +records found in the filesystem.
> +Otherwise,
> +.I fmv_entries
> +will be set to the number of extents actually returned, and the subsequent
> +array elements will be filled out with extent information.
> +In these
> +subsequent array elements, the fields
> +.IR fmv_iflags ", " fmv_count ", " fmv_entries ", and " fmv_unused1
> +will be set to zero by the filesystem.
> +
> +.SH RETURN VALUE
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.PP
> +.SH ERRORS
> +Error codes can be one of, but are not limited to, the following:
> +.TP
> +.B EINVAL
> +The array is not long enough, or a non-zero value was passed in one of the
> +fields that must be zero.
> +.TP
> +.B EFAULT
> +The pointer passed in was not mapped to a valid memory address.
> +.TP
> +.B EBADF
> +.IR fd
> +is not open for reading.
> +.TP
> +.B EPERM
> +This query is not allowed.
> +.TP
> +.B EOPNOTSUPP
> +The filesystem does not support this command.
> +
> +.SH CONFORMING TO
> +This API is Linux-specific.
> +Not all filesystems support it.
> +.fi
> +.in
> +.SH SEE ALSO
> +.BR ioctl (2)
> 


Cheers, Andreas







* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
From: Darrick J. Wong @ 2016-08-30 19:09 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: mtk.manpages, linux-fsdevel, linux-api, linux-man, xfs,
	linux-xfs, linux-btrfs

[add a few more relevant lists to cc]

On Mon, Aug 29, 2016 at 03:34:11PM -0600, Andreas Dilger wrote:
> On Aug 25, 2016, at 5:26 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
> > layout of a (disk-based) filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > man2/ioctl_xfs_ioc_getfsmap.2 |  294 +++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 294 insertions(+)
> > create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2
> > 
> > 
> > diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
> > new file mode 100644
> > index 0000000..0d9ed47
> > --- /dev/null
> > +++ b/man2/ioctl_xfs_ioc_getfsmap.2
> > @@ -0,0 +1,294 @@
> > +.\" Copyright (c) 2016, Oracle.  All rights reserved.
> > +.\"
> > +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> > +.\" This is free documentation; you can redistribute it and/or
> > +.\" modify it under the terms of the GNU General Public License as
> > +.\" published by the Free Software Foundation; either version 2 of
> > +.\" the License, or (at your option) any later version.
> > +.\"
> > +.\" The GNU General Public License's references to "object code"
> > +.\" and "executables" are to be interpreted as the output of any
> > +.\" document formatting or typesetting system, including
> > +.\" intermediate and printed output.
> > +.\"
> > +.\" This manual is distributed in the hope that it will be useful,
> > +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +.\" GNU General Public License for more details.
> > +.\"
> > +.\" You should have received a copy of the GNU General Public
> > +.\" License along with this manual; if not, see
> > +.\" <http://www.gnu.org/licenses/>.
> > +.\" %%%LICENSE_END
> > +.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
> > +.SH NAME
> > +ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
> > +.SH SYNOPSIS
> > +.br
> > +.B #include <sys/ioctl.h>
> > +.br
> > +.B #include <linux/fs.h>
> > +.sp
> > +.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
> > +.SH DESCRIPTION
> > +This
> > +.BR ioctl (2)
> > +retrieves physical extent mappings for a filesystem.
> > +This information can be used to discover which files are mapped to a physical
> > +block, examine free space, or find known bad blocks, among other things.
> > +
> > +The sole argument to this ioctl should be an array of the following
> > +structure:
> > +.in +4n
> > +.nf
> > +
> > +struct getfsmap {
> > +	__u32		fmv_device;	/* device id */
> > +	__u32		fmv_unused1;	/* future use, must be zero */
> > +	__u64		fmv_block;	/* starting block */
> > +	__u64		fmv_owner;	/* owner id */
> > +	__u64		fmv_offset;	/* file offset of segment */
> > +	__u64		fmv_length;	/* length of segment, blocks */
> > +	__u32		fmv_oflags;	/* mapping flags */
> > +	__u32		fmv_iflags;	/* control flags (1st structure) */
> > +	__u32		fmv_count;	/* # of entries in array incl. input */
> > +	__u32		fmv_entries;	/* # of entries filled in (output). */
> > +	__u64		fmv_unused2;	/* future use, must be zero */
> > +};
> > +
> > +.fi
> > +.in
> > +The array must contain at least two elements.
> > +The first two array elements specify the lowest and highest reverse-mapping
> > +keys, respectively, for which userspace would like physical mapping
> > +information.
> > +A reverse mapping key consists of the tuple (device, block, owner, offset).
> > +The owner and offset fields are part of the key because some filesystems
> > +support sharing physical blocks between multiple files and
> > +therefore may return multiple mappings for a given physical block.
> > +
> > +.SS Fields of struct getfsmap
> > +.PP
> > +The
> > +.I fmv_device
> > +field contains a 32-bit cookie to uniquely identify the underlying storage
> > +device.
> > +If the
> > +.B FMV_HOF_DEV_T
> > +flag is set in the header's
> > +.I fmv_oflags
> > +field, this field contains a dev_t from which major and minor numbers can
> > +be extracted.
> > +If the flag is not set, this field contains a value that must be unique
> > +for each storage device.
> > +
> > +.PP
> > +The
> > +.I fmv_unused1
> > +field must be zero in the first two array elements.
> > +
> > +.PP
> > +The
> > +.I fmv_block
> > +field contains the 512-byte sector address of the extent.
> 
> Why would you use 512-byte sectors in a new interface?

I started designing XFS GETFSMAP with the intent of making it feel
familiar to anyone who'd already used the XFS GETBMAP interface.
Hence you pass in an array of struct getfsmap[N] where the first
elements hold the key fields and the rest are filled out by the kernel,
and the units are 512-byte blocks.  As a result, some things (special
owners in particular) are strongly influenced by XFS.
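
For illustration only, here is a minimal userspace sketch of the array
layout described in the patch as posted: element 0 carries the low key
and fmv_count, element 1 carries the high key, and everything else is
zeroed for the kernel to fill in.  The struct mirrors the one quoted
above, with __u32/__u64 spelled as stdint types.  The helper name
getfsmap_init_query is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the struct quoted in the patch. */
struct getfsmap {
	uint32_t fmv_device;	/* device id */
	uint32_t fmv_unused1;	/* future use, must be zero */
	uint64_t fmv_block;	/* starting block */
	uint64_t fmv_owner;	/* owner id */
	uint64_t fmv_offset;	/* file offset of segment */
	uint64_t fmv_length;	/* length of segment, blocks */
	uint32_t fmv_oflags;	/* mapping flags */
	uint32_t fmv_iflags;	/* control flags (1st structure) */
	uint32_t fmv_count;	/* # of entries in array incl. input */
	uint32_t fmv_entries;	/* # of entries filled in (output) */
	uint64_t fmv_unused2;	/* future use, must be zero */
};

/*
 * Hypothetical helper: set up the two control elements of an array of
 * `count` elements so that the query runs from `low_block` on device 0
 * to the end of the filesystem.
 */
static void getfsmap_init_query(struct getfsmap *map, uint32_t count,
				uint64_t low_block)
{
	assert(count >= 2);	/* the array must hold both control elements */

	/* Every "must be zero" field starts out zeroed. */
	memset(map, 0, (size_t)count * sizeof(*map));

	/* Element 0: low key, plus the total array length. */
	map[0].fmv_block = low_block;
	map[0].fmv_count = count;

	/* Element 1: high key, all ones in the four key fields. */
	map[1].fmv_device = UINT32_MAX;
	map[1].fmv_block = UINT64_MAX;
	map[1].fmv_owner = UINT64_MAX;
	map[1].fmv_offset = UINT64_MAX;
}
```

With an 8-element array and low_block = 36, this reproduces the
(0, 36, 0, 0) low key from the man page example and leaves six elements
for returned records.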

Of course, then LSF happened and the btrfs developers expressed a desire to
have a similar call, so now it's out for review on fsdevel.  Now
there's a question of whether or not we can create a generic enough
interface to fit the major filesystems so as not to expose a bunch of
balkanized fsmap ioctls to userspace.

I also haven't heard much from the btrfs list in previous review cycles.

(I say that more in reference to the 'special owners' below than any
other part of GETFSMAP.)

> I recall for FIEMAP that some filesystems may not have files aligned
> to sector offsets, and we just used byte offsets.  Storage like
> NVDIMMs are cacheline granular, so I don't think it makes sense to
> tie this to old disk sector sizes.  Alternately, the units could be
> in terms of fs blocks as returned by statvfs.f_bsize, but mixing
> units for fmv_block, fmv_offset, fmv_length is unneeded complexity.

Ugh.  I'd rather just change the units to bytes than force all
the users to multiply things. :)
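
As a small sketch of the multiplication callers would otherwise carry
around: with 512-byte units, every fmv_block, fmv_offset and fmv_length
value has to be scaled before it can be used as a byte address.  The
helper name and the overflow check below are assumptions made for
illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FSMAP_SECTOR_SHIFT 9	/* 512-byte units, as in the v8 draft */

/*
 * Hypothetical helper: convert a value in 512-byte units to bytes.
 * Returns 0 on success, or -1 if the result would not fit in 64 bits.
 */
static int fsmap_sectors_to_bytes(uint64_t sectors, uint64_t *bytes)
{
	assert(bytes != NULL);

	if (sectors > (UINT64_MAX >> FSMAP_SECTOR_SHIFT))
		return -1;

	*bytes = sectors << FSMAP_SECTOR_SHIFT;
	return 0;
}
```

Sector 36 from the man page example corresponds to byte offset 18432.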

> > +
> > +.PP
> > +The
> > +.I fmv_owner
> > +field contains the owner of the extent.
> > +This is generally an inode number, though if
> > +.B FMV_OF_SPECIAL_OWNER
> > +is set in the
> > +.I fmv_oflags
> > +field, then the owner value is one of the following special values:
> > +.TP
> > +.B FMV_OWN_FREE
> > +Free space.
> > +.TP
> > +.B FMV_OWN_UNKNOWN
> > +This extent has an unknown owner.
> > +.TP
> > +.B FMV_OWN_FS
> > +Static filesystem metadata.

"Static filesystem metadata.  This information must exist at this disk
address; on XFS, this is the AG superblock, AGF, AGI, and AGFL
headers."

> > +.TP
> > +.B FMV_OWN_LOG
> > +The filesystem journal.
> > +.TP
> > +.B FMV_OWN_AG
> > +Allocation group metadata.

"Allocation group metadata.  On XFS these are the free space btrees
and the reverse mapping btree."

> > +.TP
> > +.B FMV_OWN_INODES
> > +Inodes.
> > +.TP
> > +.B FMV_OWN_DEFECTIVE:
> > +This extent has been marked defective either by the filesystem or the
> > +underlying device.
> 
> These above ones are relatively clear what they are.  The next items
> are not very clear what they are,

These all are very XFS-specific special owner codes; most of them
correspond directly to the special owners in the XFS reverse-mapping
structure.

OWN_FS = AG superblock
OWN_AG = free space and rmap btrees
OWN_INODES = inode records
OWN_INOBT = inode btree pointing to inode record blocks
OWN_REFC = reference count btree
OWN_COW = extent being used for a copy-on-write
OWN_LOG = internal log

For ext4, we could probably reuse the owner codes:

OWN_FS = superblock + group descriptors
OWN_AG = block/inode bitmaps
OWN_INODES = inode table
OWN_LOG = journal

Granted, we could also just smush everything into OWN_METADATA such
that the only special owners would be FREE, METADATA, COW, and
DEFECTIVE.  I don't like that, because then the kernel throws away
information that userspace might be able to use, and I prefer more
expressive APIs.  Though I do see the counter-argument that userspace
should not have direct access to metadata and therefore needn't know
more than that it's metadata.

I'd much rather just add more special owner codes for any other
filesystem that has distinguishable metadata types that are not
covered by the existing OWN_ codes.  We /do/ have 2^64 possible
values, so it's not like we're going to run out.
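
To make the trade-off concrete, here is a rough sketch of how a
userspace tool might classify owners when fine-grained codes are
exported.  The numeric values are placeholders: the thread does not give
the real encodings of FMV_OF_SPECIAL_OWNER or the FMV_OWN_* codes, so
every HYP_-prefixed name is hypothetical.  The point is the default
case, where a tool can still report "metadata of some kind" for codes
added later by other filesystems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* PLACEHOLDER values only; the real encodings are not given in this thread. */
#define HYP_FMV_OF_SPECIAL_OWNER	(1u << 0)

enum hyp_fmv_owner {
	HYP_FMV_OWN_FREE = 1,
	HYP_FMV_OWN_UNKNOWN,
	HYP_FMV_OWN_FS,
	HYP_FMV_OWN_LOG,
	HYP_FMV_OWN_AG,
	HYP_FMV_OWN_INODES,
	HYP_FMV_OWN_DEFECTIVE,
	HYP_FMV_OWN_INOBT,
	HYP_FMV_OWN_REFC,
	HYP_FMV_OWN_COW,
};

/* Describe the owner of one returned record. */
static const char *fsmap_owner_class(uint32_t oflags, uint64_t owner)
{
	/* Without the special-owner flag, the owner is an inode number. */
	if (!(oflags & HYP_FMV_OF_SPECIAL_OWNER))
		return "inode";

	switch (owner) {
	case HYP_FMV_OWN_FREE:		return "free space";
	case HYP_FMV_OWN_UNKNOWN:	return "unknown owner";
	case HYP_FMV_OWN_FS:		return "static fs metadata";
	case HYP_FMV_OWN_LOG:		return "journal";
	case HYP_FMV_OWN_AG:		return "allocation group metadata";
	case HYP_FMV_OWN_INODES:	return "inodes";
	case HYP_FMV_OWN_DEFECTIVE:	return "defective";
	case HYP_FMV_OWN_INOBT:		return "inode index";
	case HYP_FMV_OWN_REFC:		return "refcount index";
	case HYP_FMV_OWN_COW:		return "copy-on-write staging";
	default:
		/* A code this tool predates; treat it as opaque metadata. */
		return "unrecognized special owner";
	}
}
```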

> and whether they need to be exported as specific items, or could
> they just be lumped under "FMV_OWN_FS"?  If they serve some specific
> purpose, at a minimum they need better descriptions.
> 
> > +.TP
> > +.B FMV_OWN_INOBT
> > +The inode index, if one is provided.

"Inode indexing information.  On XFS this is the inode btree and free
inode btree." ?

> > +.TP
> > +.B FMV_OWN_REFC
> > +Reference counting indexes.

"Reference count information.  On XFS this is the refcount btree." ?

> > +.TP
> > +.B FMV_OWN_COW
> > +This extent is being used to stage a copy-on-write.

I'm not sure if you found this description to be lacking; I think it's
fine.

> > +
> > +.PP
> > +The
> > +.I fmv_offset
> > +field contains the logical address of the reverse mapping record, in units
> > +of 512-byte blocks.
> > +This field has no meaning if the
> > +.BR FMV_OF_SPECIAL_OWNER " or " FMV_OF_EXTENT_MAP
> > +flags are set in
> > +.IR fmv_oflags "."
> > +
> > +.PP
> > +The
> > +.I fmv_length
> > +field contains the length of the extent, in units of 512-byte blocks.
> > +This field must be zero in the second array element.
> > +
> > +.PP
> > +The
> > +.I fmv_oflags
> > +field is a bitmask of extent state flags.
> > +In the header, the bits are:
> > +.TP
> > +.B FMV_HOF_DEV_T
> > +All
> > +.I fmv_device
> > +values will be in dev_t format.
> > +If this flag is not set, the value is merely a 32-bit cookie that will be
> > +unique for each physical device.
> > +.TP
> > +In a non-header, the bits are:
> > +.TP
> > +.B FMV_OF_PREALLOC
> > +The extent is allocated but not yet written.
> > +.TP
> > +.B FMV_OF_ATTR_FORK
> > +This extent contains extended attribute data.
> > +.TP
> > +.B FMV_OF_EXTENT_MAP
> > +This extent contains extent map information for the owner.
> > +.TP
> > +.B FMV_OF_SHARED
> > +Parts of this extent may be shared.
> > +.TP
> > +.B FMV_OF_SPECIAL_OWNER
> > +The
> > +.I fmv_owner
> > +field contains a special value instead of an inode number.
> > +.TP
> > +.B FMV_OF_LAST
> > +This is the last record in the filesystem.
> > +
> > +.PP
> > +The
> > +.I fmv_iflags
> > +field is a bitmask passed to the kernel to alter the output.
> > +There are no flags defined, so this value must be zero in the first
> > +two array elements.
> 
> It seems like there are several fields in the structure that are used for
> only input or only output?  Does it make more sense to have one structure
> used only for the input request, and then the array of values returned be
> in a different structure?  I'm not necessarily requesting that it be changed,
> but it definitely is something I noticed a few times while reading this doc.

I've been thinking about rearranging this a bit, since the flags
handling is very awkward with the current array structure.  Each
rmap has its own flags; we may someday want to pass operation flags
into the ioctl; and we currently have one operation flag to pass back
to userspace.  Each of those flags can be a separate field.  I think
people will get confused about FMV_OF_* and FMV_HOF_* being referenced
in oflags, and iflags has no meaning for returned records.

So, this instead?

struct getfsmap_rec {
	u32 device;		/* device id */
	u32 flags;		/* mapping flags */
	u64 block;		/* physical addr, bytes */
	u64 owner;		/* inode or special owner code */
	u64 offset;		/* file offset of mapping, bytes */
	u64 length;		/* length of segment, bytes */
	u64 reserved;		/* will be set to zero */
}; /* 48 bytes */

struct getfsmap_head {
	u32 iflags;		/* none defined yet */
	u32 oflags;		/* FMV_HOF_DEV_T */
	u32 count;		/* # entries in recs array */
	u32 entries;		/* # entries filled in (output) */
	u64 reserved[2]; 	/* must be zero */

	struct getfsmap_rec keys[2]; /* low and high keys for the mapping search */
	struct getfsmap_rec recs[0];
}; /* 32 bytes + 2*48 = 128 bytes */

#define XFS_IOC_GETFSMAP	_IOWR('X', 59, struct getfsmap_head)

This also means that userspace can set up for the next ioctl
invocation with
memcpy(&head->keys[0], &head->recs[head->entries - 1], sizeof(head->keys[0])).
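
A minimal userspace sketch of the proposed layout, assuming the two
structs exactly as written above with u32/u64 spelled as stdint types
and recs[0] written as a C99 flexible array member.  The helpers
getfsmap_sizeof, getfsmap_alloc and getfsmap_advance are hypothetical
names used only for illustration.  The ioctl call itself is left out,
since the interface is still a proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct getfsmap_rec {
	uint32_t device;	/* device id */
	uint32_t flags;		/* mapping flags */
	uint64_t block;		/* physical addr, bytes */
	uint64_t owner;		/* inode or special owner code */
	uint64_t offset;	/* file offset of mapping, bytes */
	uint64_t length;	/* length of segment, bytes */
	uint64_t reserved;	/* will be set to zero */
};

struct getfsmap_head {
	uint32_t iflags;	/* none defined yet */
	uint32_t oflags;	/* FMV_HOF_DEV_T */
	uint32_t count;		/* # entries in recs array */
	uint32_t entries;	/* # entries filled in (output) */
	uint64_t reserved[2];	/* must be zero */
	struct getfsmap_rec keys[2];	/* low and high keys */
	struct getfsmap_rec recs[];	/* filled in by the kernel */
};

/* The sizes quoted in the struct comments above. */
_Static_assert(sizeof(struct getfsmap_rec) == 48, "rec should be 48 bytes");
_Static_assert(sizeof(struct getfsmap_head) == 128, "head should be 128 bytes");

/* Bytes needed for a header followed by `count` record slots. */
static size_t getfsmap_sizeof(uint32_t count)
{
	return sizeof(struct getfsmap_head) +
	       (size_t)count * sizeof(struct getfsmap_rec);
}

/* Allocate a zeroed request with room for `count` records. */
static struct getfsmap_head *getfsmap_alloc(uint32_t count)
{
	struct getfsmap_head *head = calloc(1, getfsmap_sizeof(count));

	if (head)
		head->count = count;
	return head;
}

/*
 * Seed the low key for the next call from the last record returned.
 * Returns 1 if the key was advanced, 0 if there was nothing to copy.
 */
static int getfsmap_advance(struct getfsmap_head *head)
{
	if (head->entries == 0 || head->entries > head->count)
		return 0;

	memcpy(&head->keys[0], &head->recs[head->entries - 1],
	       sizeof(head->keys[0]));
	return 1;
}
```

A caller would allocate once with getfsmap_alloc(n), issue the ioctl,
consume head->recs[0..entries-1], and call getfsmap_advance(head) before
the next ioctl until no more records come back.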

Yes, I think I like this better.  Everyone else, please chime in. :)

--D

> Cheers, Andreas
> 
> > +.PP
> > +The
> > +.I fmv_count
> > +field contains the number of elements in the array being passed to the
> > +kernel.
> > +This count must include the two control elements at the start of the
> > +array.
> > +The value must be specified in the first array element; in the second
> > +element this field must be zero.
> > +
> > +If this value is 2,
> > +.I fmv_entries
> > +will be set to the number of records that would have been returned had
> > +the array been large enough;
> > +no extent information will be returned.
> > +
> > +.PP
> > +The
> > +.I fmv_entries
> > +field contains the number of elements in the array that contain useful
> > +information if the ioctl returns a non-error value.
> > +This value does not include the two control elements at the start of the array.
> > +This value is only set in the first array element;
> > +in the second element, this field must be zero.
> > +
> > +.PP
> > +The
> > +.I fmv_unused2
> > +field must be zero in the first two array elements.
> > +
> > +.SS Array Elements
> > +.PP
> > +The key fields (fmv_device, fmv_block, fmv_owner, fmv_offset) of the first
> > +element of the array specify the lowest extent record in the keyspace that
> > +the caller wants returned.
> > +For example, if the key is set to (0, 36, 0, 0), the filesystem will
> > +only return records for extents starting at or above sector 36 on
> > +disk.
> > +For convenience, the
> > +.I fmv_length
> > +field will be added to the
> > +.IR fmv_block " and " fmv_offset
> > +fields as appropriate so that the (fmv_device, fmv_block, fmv_owner,
> > +fmv_offset, fmv_length) fields in the last array element can be copied
> > +into the first element to seed the next ioctl call.
> > +
> > +The key fields of the second element of the array specify the highest
> > +extent record in the keyspace that the caller wants returned.
> > +Returning to our example above, if that example key were instead
> > +passed in via the second array element, the filesystem will not return
> > +records for extents going past sector 36 on disk.
> > +For convenience, the four key fields can be set to ~0 (all ones) to
> > +signify "end of filesystem".
> > +
> > +If
> > +.I fmv_count
> > +in the first element of the array is 2, then
> > +.I fmv_entries
> > +in the first element of the array will be set to the number of extent
> > +records found in the filesystem.
> > +Otherwise,
> > +.I fmv_entries
> > +will be set to the number of extents actually returned, and the subsequent
> > +array elements will be filled out with extent information.
> > +In these
> > +subsequent array elements, the fields
> > +.IR fmv_iflags ", " fmv_count ", " fmv_entries ", and " fmv_unused1
> > +will be set to zero by the filesystem.
> > +
> > +.SH RETURN VALUE
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.PP
> > +.SH ERRORS
> > +Error codes can be one of, but are not limited to, the following:
> > +.TP
> > +.B EINVAL
> > +The array is not long enough, or a non-zero value was passed in one of the
> > +fields that must be zero.
> > +.TP
> > +.B EFAULT
> > +The pointer passed in was not mapped to a valid memory address.
> > +.TP
> > +.B EBADF
> > +.IR fd
> > +is not open for reading.
> > +.TP
> > +.B EPERM
> > +This query is not allowed.
> > +.TP
> > +.B EOPNOTSUPP
> > +The filesystem does not support this command.
> > +
> > +.SH CONFORMING TO
> > +This API is Linux-specific.
> > +Not all filesystems support it.
> > +.fi
> > +.in
> > +.SH SEE ALSO
> > +.BR ioctl (2)
> > 
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 




* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
@ 2016-08-30 19:09       ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-08-30 19:09 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: linux-man, linux-api, xfs, linux-xfs, mtk.manpages,
	linux-fsdevel, linux-btrfs

[add a few more relevant lists to cc]

On Mon, Aug 29, 2016 at 03:34:11PM -0600, Andreas Dilger wrote:
> On Aug 25, 2016, at 5:26 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
> > layout of a (disk-based) filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > man2/ioctl_xfs_ioc_getfsmap.2 |  294 +++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 294 insertions(+)
> > create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2
> > 
> > 
> > diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
> > new file mode 100644
> > index 0000000..0d9ed47
> > --- /dev/null
> > +++ b/man2/ioctl_xfs_ioc_getfsmap.2
> > @@ -0,0 +1,294 @@
> > +.\" Copyright (c) 2016, Oracle.  All rights reserved.
> > +.\"
> > +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> > +.\" This is free documentation; you can redistribute it and/or
> > +.\" modify it under the terms of the GNU General Public License as
> > +.\" published by the Free Software Foundation; either version 2 of
> > +.\" the License, or (at your option) any later version.
> > +.\"
> > +.\" The GNU General Public License's references to "object code"
> > +.\" and "executables" are to be interpreted as the output of any
> > +.\" document formatting or typesetting system, including
> > +.\" intermediate and printed output.
> > +.\"
> > +.\" This manual is distributed in the hope that it will be useful,
> > +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +.\" GNU General Public License for more details.
> > +.\"
> > +.\" You should have received a copy of the GNU General Public
> > +.\" License along with this manual; if not, see
> > +.\" <http://www.gnu.org/licenses/>.
> > +.\" %%%LICENSE_END
> > +.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
> > +.SH NAME
> > +ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
> > +.SH SYNOPSIS
> > +.br
> > +.B #include <sys/ioctl.h>
> > +.br
> > +.B #include <linux/fs.h>
> > +.sp
> > +.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
> > +.SH DESCRIPTION
> > +This
> > +.BR ioctl (2)
> > +retrieves physical extent mappings for a filesystem.
> > +This information can be used to discover which files are mapped to a physical
> > +block, examine free space, or find known bad blocks, among other things.
> > +
> > +The sole argument to this ioctl should be an array of the following
> > +structure:
> > +.in +4n
> > +.nf
> > +
> > +struct getfsmap {
> > +	__u32		fmv_device;	/* device id */
> > +	__u32		fmv_unused1;	/* future use, must be zero */
> > +	__u64		fmv_block;	/* starting block */
> > +	__u64		fmv_owner;	/* owner id */
> > +	__u64		fmv_offset;	/* file offset of segment */
> > +	__u64		fmv_length;	/* length of segment, blocks */
> > +	__u32		fmv_oflags;	/* mapping flags */
> > +	__u32		fmv_iflags;	/* control flags (1st structure) */
> > +	__u32		fmv_count;	/* # of entries in array incl. input */
> > +	__u32		fmv_entries;	/* # of entries filled in (output). */
> > +	__u64		fmv_unused2;	/* future use, must be zero */
> > +};
> > +
> > +.fi
> > +.in
> > +The array must contain at least two elements.
> > +The first two array elements specify the lowest and highest reverse-mapping
> > +keys, respectively, for which userspace would like physical mapping
> > +information.
> > +A reverse mapping key consists of the tuple (device, block, owner, offset).
> > +The owner and offset fields are part of the key because some filesystems
> > +support sharing physical blocks between multiple files and
> > +therefore may return multiple mappings for a given physical block.
> > +
> > +.SS Fields of struct getfsmap
> > +.PP
> > +The
> > +.I fmv_device
> > +field contains a 32-bit cookie to uniquely identify the underlying storage
> > +device.
> > +If the
> > +.B FMV_HOF_DEV_T
> > +flag is set in the header's
> > +.I fmv_oflags
> > +field, this field contains a dev_t from which major and minor numbers can
> > +be extracted.
> > +If the flag is not set, this field contains a value that must be unique
> > +for each storage device.
> > +
> > +.PP
> > +The
> > +.I fmv_unused1
> > +field must be zero in the first two array elements.
> > +
> > +.PP
> > +The
> > +.I fmv_block
> > +field contains the 512-byte sector address of the extent.
> 
> Why would you use 512-byte sectors in a new interface?

I started designing XFS GETFSMAP with the intent of making it feel
familiar to anyone who'd already used the XFS GETBMAP interface.
Hence you pass in an array of struct getfsmap[N] where the first
elements hold the key fields and the rest are filled out by the kernel,
and the units are 512-byte blocks.  As a result, some things (special
owners in particular) are strongly influenced by XFS.

Of course, then LSF happened and the btrfs developers expressed a desire to
have a similar call, so now it's out for review on fsdevel.  Now
there's a question of whether or not we can create a generic enough
interface to fit the major filesystems so as not to expose a bunch of
balkanized fsmap ioctls to userspace.

I also haven't heard much from the btrfs list in previous review cycles.

(I say that more in reference to the 'special owners' below than any
other part of GETFSMAP.)

> I recall for FIEMAP that some filesystems may not have files aligned
> to sector offsets, and we just used byte offsets.  Storage like
> NVDIMMs are cacheline granular, so I don't think it makes sense to
> tie this to old disk sector sizes.  Alternately, the units could be
> in terms of fs blocks as returned by statvfs.st_bsize, but mixing
> units for fmv_block, fmv_offset, fmv_length is unneeded complexity.

Ugh.  I'd rather just change the units to bytes than force all
the users to multiply things. :)

> > +
> > +.PP
> > +The
> > +.I fmv_owner
> > +field contains the owner of the extent.
> > +This is generally an inode number, though if
> > +.B FMV_OF_SPECIAL_OWNER
> > +is set in the
> > +.I fmv_oflags
> > +field, then the owner value is one of the following special values:
> > +.TP
> > +.B FMV_OWN_FREE
> > +Free space.
> > +.TP
> > +.B FMV_OWN_UNKNOWN
> > +This extent has an unknown owner.
> > +.TP
> > +.B FMV_OWN_FS
> > +Static filesystem metadata.

"Static filesystem metadata.  This information must exist at this disk
address; on XFS, this is the AG superblock, AGF, AGI, and AGFL
headers."

> > +.TP
> > +.B FMV_OWN_LOG
> > +The filesystem journal.
> > +.TP
> > +.B FMV_OWN_AG
> > +Allocation group metadata.

"Allocation group metadata.  On XFS these are the free space btrees
and the reverse mapping btree."

> > +.TP
> > +.B FMV_OWN_INODES
> > +Inodes.
> > +.TP
> > +.B FMV_OWN_DEFECTIVE:
> > +This extent has been marked defective either by the filesystem or the
> > +underlying device.
> 
> These above ones are relatively clear what they are.  The next items
> are not very clear what they are,

These all are very XFS-specific special owner codes; most of them
correspond directly to the special owners in the XFS reverse-mapping
structure.

OWN_FS = AG superblock
OWN_AG = free space and rmap btrees
OWN_INODES = inode records
OWN_INOBT = inode btree pointing to inode record blocks
OWN_REFC = reference count btree
OWN_COW = extent being used for a copy-on-write
OWN_LOG = internal log

For ext4, we could probably reuse the owner codes:

OWN_FS = superblock + group descriptors
OWN_AG = block/inode bitmaps
OWN_INODES = inode table
OWN_LOG = journal

Granted, we could also just smush everything into OWN_METADATA such
that the only special owners would be FREE, METADATA, COW, and
DEFECTIVE.  I don't like that because the kernel would then throw
away information that userspace might be able to use, and I prefer
more expressive APIs.  Though I do see the counter-argument that
userspace should not have direct access to metadata and therefore
needn't know more than that it's metadata.

I'd much rather just add more special owner codes for any other
filesystem that has distinguishable metadata types that are not
covered by the existing OWN_ codes.  We /do/ have 2^64 possible
values, so it's not like we're going to run out.
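
To make the mapping above concrete, here is a rough sketch of how the
two lists could sit side by side in a lookup table.  The numeric values
are made up for illustration (nothing in this thread assigns numbers to
the OWN_ codes), and struct own_desc and own_lookup() are hypothetical
names, not part of the proposed interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the codes are named in this mail, the numbers are not. */
enum {
	OWN_FS = 1,
	OWN_AG,
	OWN_INODES,
	OWN_INOBT,
	OWN_REFC,
	OWN_COW,
	OWN_LOG,
};

struct own_desc {
	uint64_t	code;
	const char	*xfs;	/* meaning on XFS, per the list above */
	const char	*ext4;	/* suggested ext4 reuse, NULL if none was listed */
};

static const struct own_desc own_table[] = {
	{ OWN_FS,	"AG superblock",		"superblock + group descriptors" },
	{ OWN_AG,	"free space and rmap btrees",	"block/inode bitmaps" },
	{ OWN_INODES,	"inode records",		"inode table" },
	{ OWN_INOBT,	"inode btree",			NULL },
	{ OWN_REFC,	"reference count btree",	NULL },
	{ OWN_COW,	"copy-on-write staging extent",	NULL },
	{ OWN_LOG,	"internal log",			"journal" },
};

/* One row per OWN_ code listed above. */
static_assert(sizeof(own_table) / sizeof(own_table[0]) == 7,
	      "own_table must cover every listed owner code");

/* Return the table row for a special owner code, or NULL if it is unknown. */
static const struct own_desc *own_lookup(uint64_t code)
{
	size_t i;

	for (i = 0; i < sizeof(own_table) / sizeof(own_table[0]); i++)
		if (own_table[i].code == code)
			return &own_table[i];
	return NULL;
}
```

A filesystem with metadata types not covered here would simply add rows
(and codes) rather than fold everything into a single catch-all owner.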

> and whether they need to be exported as specific items, or could
> they just be lumped under "FMV_OWN_FS"?  If they serve some specific
> purpose, at a minimum they need better descriptions.
> 
> > +.TP
> > +.B FMV_OWN_INOBT
> > +The inode index, if one is provided.

"Inode indexing information.  On XFS this is the inode btree and free
inode btree." ?

> > +.TP
> > +.B FMV_OWN_REFC
> > +Reference counting indexes.

"Reference count information.  On XFS this is the refcount btree." ?

> > +.TP
> > +.B FMV_OWN_COW
> > +This extent is being used to stage a copy-on-write.

I'm not sure if you found this description to be lacking; I think it's
fine.

> > +
> > +.PP
> > +The
> > +.I fmv_offset
> > +field contains the logical address of the reverse mapping record, in units
> > +of 512-byte blocks.
> > +This field has no meaning if the
> > +.BR FMV_OF_SPECIAL_OWNER " or " FMV_OF_EXTENT_MAP
> > +flags are set in
> > +.IR fmv_oflags "."
> > +
> > +.PP
> > +The
> > +.I fmv_length
> > +field contains the length of the extent, in units of 512-byte blocks.
> > +This field must be zero in the second array element.
> > +
> > +.PP
> > +The
> > +.I fmv_oflags
> > +field is a bitmask of extent state flags.
> > +In the header, the bits are:
> > +.TP
> > +.B FMV_HOF_DEV_T
> > +All
> > +.I fmv_device
> > +values will be in dev_t format.
> > +If this flag is not set, the value is merely a 32-bit cookie that will be
> > +unique for each physical device.
> > +.TP
> > +In a non-header, the bits are:
> > +.TP
> > +.B FMV_OF_PREALLOC
> > +The extent is allocated but not yet written.
> > +.TP
> > +.B FMV_OF_ATTR_FORK
> > +This extent contains extended attribute data.
> > +.TP
> > +.B FMV_OF_EXTENT_MAP
> > +This extent contains extent map information for the owner.
> > +.TP
> > +.B FMV_OF_SHARED
> > +Parts of this extent may be shared.
> > +.TP
> > +.B FMV_OF_SPECIAL_OWNER
> > +The
> > +.I fmv_owner
> > +field contains a special value instead of an inode number.
> > +.TP
> > +.B FMV_OF_LAST
> > +This is the last record in the filesystem.
> > +
> > +.PP
> > +The
> > +.I fmv_iflags
> > +field is a bitmask passed to the kernel to alter the output.
> > +There are no flags defined, so this value must be zero in the first
> > +two array elements.
> 
> It seems like there are several fields in the structure that are used for
> only input or only output?  Does it make more sense to have one structure
> used only for the input request, and then the array of values returned be
> in a different structure?  I'm not necessarily requesting that it be changed,
> but it definitely is something I noticed a few times while reading this doc.

I've been thinking about rearranging this a bit, since the flags
handling is very awkward with the current array structure.  Each
rmap has its own flags; we may someday want to pass operation flags
into the ioctl; and we currently have one operation flag to pass back
to userspace.  Each of those flags can be a separate field.  I think
people will get confused about FMV_OF_* and FMV_HOF_* being referenced
in oflags, and iflags has no meaning for returned records.

So, this instead?

struct getfsmap_rec {
	u32 device;		/* device id */
	u32 flags;		/* mapping flags */
	u64 block;		/* physical addr, bytes */
	u64 owner;		/* inode or special owner code */
	u64 offset;		/* file offset of mapping, bytes */
	u64 length;		/* length of segment, bytes */
	u64 reserved;		/* will be set to zero */
}; /* 48 bytes */

struct getfsmap_head {
	u32 iflags;		/* none defined yet */
	u32 oflags;		/* FMV_HOF_DEV_T */
	u32 count;		/* # entries in recs array */
	u32 entries;		/* # entries filled in (output) */
	u64 reserved[2]; 	/* must be zero */

	struct getfsmap_rec keys[2]; /* low and high keys for the mapping search */
	struct getfsmap_rec recs[0];
}; /* 32 bytes + 2*48 = 128 bytes */

#define XFS_IOC_GETFSMAP	_IOWR('X', 59, struct getfsmap_head)

This also means that userspace can set up for the next ioctl
invocation with memcpy(&head->keys[0], &head->recs[head->entries - 1],
sizeof(struct getfsmap_rec)).
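
A minimal userspace sketch of that pattern, assuming the layout proposed
above.  The u32/u64 types are spelled with <stdint.h>, recs[0] is written
as a C99 flexible array member, and fsmap_alloc() and fsmap_advance() are
hypothetical helper names.  The ioctl call itself is left out since the
interface is not upstream yet:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct getfsmap_rec {
	uint32_t device;	/* device id */
	uint32_t flags;		/* mapping flags */
	uint64_t block;		/* physical addr, bytes */
	uint64_t owner;		/* inode or special owner code */
	uint64_t offset;	/* file offset of mapping, bytes */
	uint64_t length;	/* length of segment, bytes */
	uint64_t reserved;	/* will be set to zero */
};

struct getfsmap_head {
	uint32_t iflags;	/* none defined yet */
	uint32_t oflags;	/* FMV_HOF_DEV_T */
	uint32_t count;		/* # entries in recs array */
	uint32_t entries;	/* # entries filled in (output) */
	uint64_t reserved[2];	/* must be zero */

	struct getfsmap_rec keys[2];	/* low and high keys */
	struct getfsmap_rec recs[];	/* flexible array instead of recs[0] */
};

/* Hypothetical helper: zeroed header with room for nr returned records. */
static struct getfsmap_head *fsmap_alloc(uint32_t nr)
{
	struct getfsmap_head *head;

	head = calloc(1, sizeof(*head) + (size_t)nr * sizeof(head->recs[0]));
	if (!head)
		return NULL;
	head->count = nr;

	/* Low key stays all zeroes; high key of all ones means "end of fs". */
	head->keys[1].device = UINT32_MAX;
	head->keys[1].block = UINT64_MAX;
	head->keys[1].owner = UINT64_MAX;
	head->keys[1].offset = UINT64_MAX;
	return head;
}

/* Hypothetical helper: seed the low key from the last returned record. */
static int fsmap_advance(struct getfsmap_head *head)
{
	if (head->entries == 0 || head->entries > head->count)
		return -1;	/* nothing was returned, or bogus count */
	memcpy(&head->keys[0], &head->recs[head->entries - 1],
	       sizeof(head->keys[0]));
	return 0;
}
```

A caller would loop: issue the ioctl, walk recs[0..entries-1], call
fsmap_advance(), and repeat until no more records come back.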

Yes, I think I like this better.  Everyone else, please chime in. :)

--D

> Cheers, Andreas
> 
> > +.PP
> > +The
> > +.I fmv_count
> > +field contains the number of elements in the array being passed to the
> > +kernel.
> > +This count must include the two control elements at the start of the
> > +array.
> > +The value must be specified in the first array element; in the second
> > +element this field must be zero.
> > +
> > +If this value is 2,
> > +.I fmv_entries
> > +will be set to the number of records that would have been returned had
> > +the array been large enough;
> > +no extent information will be returned.
> > +
> > +.PP
> > +The
> > +.I fmv_entries
> > +field contains the number of elements in the array that contain useful
> > +information if the ioctl returns a non-error value.
> > +This value does not include the two control elements at the start of the array.
> > +This value is only set in the first array element;
> > +in the second element, this field must be zero.
> > +
> > +.PP
> > +The
> > +.I fmv_unused2
> > +field must be zero in the first two array elements.
> > +
> > +.SS Array Elements
> > +.PP
> > +The key fields (fmv_device, fmv_block, fmv_owner, fmv_offset) of the first
> > +element of the array specify the lowest extent record in the keyspace that
> > +the caller wants returned.
> > +For example, if the key is set to (0, 36, 0, 0), the filesystem will
> > +only return records for extents starting at or above sector 36 on
> > +disk.
> > +For convenience, the
> > +.I fmv_length
> > +field will be added to the
> > +.IR fmv_block " and " fmv_offset
> > +fields as appropriate so that the (fmv_device, fmv_block, fmv_owner,
> > +fmv_offset, fmv_length) fields in the last array element can be copied
> > +into the first element to seed the next ioctl call.
> > +
> > +The key fields of the second element of the array specify the highest
> > +extent record in the keyspace that the caller wants returned.
> > +Returning to our example above, if that example key were instead
> > +passed in via the second array element, the filesystem will not return
> > +records for extents going past sector 36 on disk.
> > +For convenience, the four key fields can be set to ~0 (all ones) to
> > +signify "end of filesystem".
> > +
> > +If
> > +.I fmv_count
> > +in the first element of the array is 2, then
> > +.I fmv_entries
> > +in the first element of the array will be set to the number of extent
> > +records found in the filesystem.
> > +Otherwise,
> > +.I fmv_entries
> > +will be set to the number of extents actually returned, and the subsequent
> > +array elements will be filled out with extent information.
> > +In these
> > +subsequent array elements, the fields
> > +.IR fmv_iflags ", " fmv_count ", " fmv_entries ", and " fmv_unused1
> > +will be set to zero by the filesystem.
> > +
> > +.SH RETURN VALUE
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.PP
> > +.SH ERRORS
> > +Error codes can be one of, but are not limited to, the following:
> > +.TP
> > +.B EINVAL
> > +The array is not long enough, or a non-zero value was passed in one of the
> > +fields that must be zero.
> > +.TP
> > +.B EFAULT
> > +The pointer passed in was not mapped to a valid memory address.
> > +.TP
> > +.B EBADF
> > +.IR fd
> > +is not open for reading.
> > +.TP
> > +.B EPERM
> > +This query is not allowed.
> > +.TP
> > +.B EOPNOTSUPP
> > +The filesystem does not support this command.
> > +
> > +.SH CONFORMING TO
> > +This API is Linux-specific.
> > +Not all filesystems support it.
> > +.fi
> > +.in
> > +.SH SEE ALSO
> > +.BR ioctl (2)
> > 
> 
> 
> Cheers, Andreas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 3/3] ioctl_getfsmap.2: document the GETFSMAP ioctl
  2016-08-25 23:26   ` Darrick J. Wong
@ 2016-09-04  5:36     ` Darrick J. Wong
  -1 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-09-04  5:36 UTC (permalink / raw)
  To: mtk.manpages
  Cc: linux-fsdevel, linux-api, linux-man, Andreas Dilger, linux-btrfs,
	xfs, linux-xfs

Document the new GETFSMAP ioctl that returns the physical layout of a
(disk-based) filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 man2/ioctl_getfsmap.2 |  313 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 man2/ioctl_getfsmap.2

diff --git a/man2/ioctl_getfsmap.2 b/man2/ioctl_getfsmap.2
new file mode 100644
index 0000000..99de417
--- /dev/null
+++ b/man2/ioctl_getfsmap.2
@@ -0,0 +1,313 @@
+.\" Copyright (c) 2016, Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-GETFSMAP 2 2016-09-03 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_getfsmap \- retrieve the physical layout of the filesystem
+.SH SYNOPSIS
+.br
+.B #include <sys/ioctl.h>
+.br
+.B #include <linux/fs.h>
+.sp
+.BI "int ioctl(int " fd ", GETFSMAP, struct fsmap_head * " arg );
+.SH DESCRIPTION
+This
+.BR ioctl (2)
+retrieves physical extent mappings for a filesystem.
+This information can be used to discover which files are mapped to a physical
+block, examine free space, or find known bad blocks, among other things.
+
+The sole argument to this ioctl should be a pointer to a single
+.BR "struct fsmap_head" ":"
+.in +4n
+.nf
+
+struct fsmap {
+	__u32		fmr_device;	/* device id */
+	__u32		fmr_flags;	/* mapping flags */
+	__u64		fmr_physical;	/* device offset of segment */
+	__u64		fmr_owner;	/* owner id */
+	__u64		fmr_offset;	/* file offset of segment */
+	__u64		fmr_length;	/* length of segment */
+	__u64		fmr_reserved;	/* future use, must be zero */
+};
+
+struct fsmap_head {
+	__u32		fmh_iflags;	/* control flags */
+	__u32		fmh_oflags;	/* output flags */
+	__u32		fmh_count;	/* # of entries in array incl. input */
+	__u32		fmh_entries;	/* # of entries filled in (output). */
+	__u64		fmh_reserved[2];	/* must be zero */
+
+	struct fsmap	fmh_keys[2];	/* low and high keys for the mapping search */
+	struct fsmap	fmh_recs[0];	/* returned records */
+};
+
+.fi
+.in
+The two
+.I fmh_keys
+array elements specify the lowest and highest reverse-mapping
+keys, respectively, for which userspace would like physical mapping
+information.
+A reverse mapping key consists of the tuple (device, block, owner, offset).
+The owner and offset fields are part of the key because some filesystems
+support sharing physical blocks between multiple files and
+therefore may return multiple mappings for a given physical block.
+.PP
+Filesystem mappings are copied into the
+.I fmh_recs
+array, which immediately follows the header data.
+.SS Fields of struct fsmap_head
+.PP
+The
+.I fmh_iflags
+field is a bitmask passed to the kernel to alter the output.
+There are no flags defined, so this value must be zero.
+
+.PP
+The
+.I fmh_oflags
+field is a bitmask of flags that concern all output mappings.
+If
+.B FMH_OF_DEV_T
+is set, then the
+.I fmr_device
+field represents a
+.B dev_t
+structure containing the major and minor numbers of the block device.
+
+.PP
+The
+.I fmh_count
+field contains the number of elements in the array being passed to the
+kernel.
+If this value is 0,
+.I fmh_entries
+will be set to the number of records that would have been returned had
+the array been large enough;
+no mapping information will be returned.
+
+.PP
+The
+.I fmh_entries
+field contains the number of elements in the
+.I fmh_recs
+array that contain useful information.
+
+.PP
+The
+.I fmh_reserved
+fields must be set to zero.
+
+.SS Keys
+.PP
+The two key records in
+.B fsmap_head.fmh_keys
+specify the lowest and highest extent records in the keyspace that the caller
+wants returned.
+A filesystem that can share blocks between files likely requires the tuple
+.RI "(" "device" ", " "physical" ", " "owner" ", " "offset" ", " "flags" ")"
+to uniquely index any filesystem mapping record.
+Classic non-sharing filesystems might be able to identify any record with only
+.RI "(" "device" ", " "physical" ", " "flags" ")."
+For example, if the low key is set to (0, 36864, 0, 0, 0), the filesystem will
+only return records for extents starting at or above 36KiB on disk.
+If the high key is set to (0, 1048576, 0, 0, 0), only records below 1MiB will
+be returned.
+By convention, the field
+.B fsmap_head.fmh_keys[0]
+must contain the low key and
+.B fsmap_head.fmh_keys[1]
+must contain the high key for the request.
+.PP
+For convenience, if
+.B fmr_length
+is set in the low key, it will be added to
+.IR fmr_physical " or " fmr_offset
+as appropriate.
+The caller can take advantage of this subtlety to set up subsequent calls
+by copying
+.B fsmap_head.fmh_recs[fsmap_head.fmh_entries - 1]
+into the low key.
+
+.SS Fields of struct fsmap
+.PP
+The
+.I fmr_device
+field contains a 32-bit cookie to uniquely identify the underlying storage
+device.
+If the
+.B FMH_OF_DEV_T
+flag is set in the header's
+.I fmh_oflags
+field, this field contains a
+.B dev_t
+from which major and minor numbers can be extracted.
+If the flag is not set, this field contains a value that must be unique
+for each unique storage device.
+
+.PP
+The
+.I fmr_physical
+field contains the disk address of the extent in bytes.
+
+.PP
+The
+.I fmr_owner
+field contains the owner of the extent.
+This is an inode number unless
+.B FMR_OF_SPECIAL_OWNER
+is set in the
+.I fmr_flags
+field, in which case the owner value is one of the following special values:
+.RS 0.4i
+.TP
+.B FMR_OWN_FREE
+Free space.
+.TP
+.B FMR_OWN_UNKNOWN
+This extent has an unknown owner.
+.TP
+.B FMR_OWN_FS
+Static filesystem metadata which exists at a fixed address.
+On XFS these are the AG superblock, AGF, AGFL, and AGI headers.
+.TP
+.B FMR_OWN_LOG
+The filesystem journal.
+.TP
+.B FMR_OWN_AG
+Allocation group metadata.
+On XFS these are the free space btrees or the reverse mapping btrees.
+.TP
+.B FMR_OWN_INOBT
+Inode indexes, if any are provided.
+On XFS these are the inode and free inode btrees.
+.TP
+.B FMR_OWN_INODES
+Inode records.
+.TP
+.B FMR_OWN_REFC
+Reference count information.
+On XFS this is the reference count btree.
+.TP
+.B FMR_OWN_COW
+This extent is being used to stage a copy-on-write.
+.TP
+.B FMR_OWN_DEFECTIVE
+This extent has been marked defective either by the filesystem or the
+underlying device.
+.RE
+
+.PP
+The
+.I fmr_offset
+field contains the logical address in the mapping record in bytes.
+This field has no meaning if the
+.BR FMR_OF_SPECIAL_OWNER " or " FMR_OF_EXTENT_MAP
+flags are set in
+.IR fmr_flags "."
+
+.PP
+The
+.I fmr_length
+field contains the length of the extent in bytes.
+
+.PP
+The
+.I fmr_flags
+field is a bitmask of extent state flags.
+The bits are:
+.RS 0.4i
+.TP
+.B FMR_OF_PREALLOC
+The extent is allocated but not yet written.
+.TP
+.B FMR_OF_ATTR_FORK
+This extent contains extended attribute data.
+.TP
+.B FMR_OF_EXTENT_MAP
+This extent contains extent map information for the owner.
+.TP
+.B FMR_OF_SHARED
+Parts of this extent may be shared.
+.TP
+.B FMR_OF_SPECIAL_OWNER
+The
+.I fmr_owner
+field contains a special value instead of an inode number.
+.TP
+.B FMR_OF_LAST
+This is the last record in the filesystem.
+.RE
+
+.PP
+The
+.I fmr_reserved
+field will be set to zero.
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+The array is not long enough, or a non-zero value was passed in one of the
+fields that must be zero.
+.TP
+.B EFAULT
+The pointer passed in was not mapped to a valid memory address.
+.TP
+.B EBADF
+.IR fd
+is not open for reading.
+.TP
+.B EPERM
+This query is not allowed.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support this command.
+.TP
+.B EUCLEAN
+The filesystem metadata is corrupt and needs repair.
+.TP
+.B EBADMSG
+The filesystem has detected a checksum error in the metadata.
+.TP
+.B ENOMEM
+Insufficient memory to process the request.
+
+.SH EXAMPLE
+.PP
+Please see io/fsmap.c in the xfsprogs distribution for a sample program.
+
+.SH CONFORMING TO
+This API is Linux-specific.
+Not all filesystems support it.
+.fi
+.in
+.SH SEE ALSO
+.BR ioctl (2)
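
(Not part of the patch.)  The "copy the last record into the low key"
convention from the Keys section can be sketched roughly as below.  This is
only an illustration under stated assumptions: GETFSMAP is not upstream, so
fake_getfsmap() is a hypothetical stand-in for the real ioctl that serves a
made-up extent list from memory, compares only fmr_physical, and ignores
block sharing.  The struct layouts are copied from the manpage with
<stdint.h> types in place of __u32/__u64, and filling the high key with
all-ones values is an assumption about how "everything" would be requested.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Layouts as shown in the manpage; stdint types replace __u32/__u64. */
struct fsmap {
	uint32_t fmr_device;
	uint32_t fmr_flags;
	uint64_t fmr_physical;
	uint64_t fmr_owner;
	uint64_t fmr_offset;
	uint64_t fmr_length;
	uint64_t fmr_reserved;
};

struct fsmap_head {
	uint32_t fmh_iflags;
	uint32_t fmh_oflags;
	uint32_t fmh_count;
	uint32_t fmh_entries;
	uint64_t fmh_reserved[2];
	struct fsmap fmh_keys[2];
	struct fsmap fmh_recs[];
};

/* Made-up extents: device, flags, physical, owner, offset, length, rsvd. */
static const struct fsmap fake_disk[] = {
	{ 0, 0,     0, 128, 0,  4096, 0 },
	{ 0, 0,  4096, 129, 0,  8192, 0 },
	{ 0, 0, 12288, 130, 0,  4096, 0 },
	{ 0, 0, 16384, 131, 0, 16384, 0 },
	{ 0, 0, 32768, 132, 0,  4096, 0 },
};

/* HYPOTHETICAL stand-in for ioctl(fd, GETFSMAP, h); not a real interface. */
int fake_getfsmap(struct fsmap_head *h)
{
	/* fmr_length in the low key is added to the starting address. */
	uint64_t low = h->fmh_keys[0].fmr_physical + h->fmh_keys[0].fmr_length;
	uint64_t high = h->fmh_keys[1].fmr_physical;
	size_t i, n = sizeof(fake_disk) / sizeof(fake_disk[0]);

	h->fmh_entries = 0;
	for (i = 0; i < n; i++) {
		if (fake_disk[i].fmr_physical < low ||
		    fake_disk[i].fmr_physical > high)
			continue;
		if (h->fmh_count == 0) {
			h->fmh_entries++;	/* count-only request */
			continue;
		}
		if (h->fmh_entries == h->fmh_count)
			break;			/* caller's array is full */
		h->fmh_recs[h->fmh_entries++] = fake_disk[i];
	}
	return 0;
}

/* Walk every mapping in batches; returns the number of non-empty calls. */
unsigned int walk_all(uint32_t batch, uint64_t *total_bytes)
{
	struct fsmap_head *h;
	unsigned int calls = 0;
	uint32_t i;

	assert(batch > 0);
	h = calloc(1, sizeof(*h) + batch * sizeof(struct fsmap));
	if (!h)
		return 0;
	h->fmh_count = batch;
	/* Low key stays all zeroes; high key covers the whole keyspace. */
	h->fmh_keys[1].fmr_device = UINT32_MAX;
	h->fmh_keys[1].fmr_flags = UINT32_MAX;
	h->fmh_keys[1].fmr_physical = UINT64_MAX;
	h->fmh_keys[1].fmr_owner = UINT64_MAX;
	h->fmh_keys[1].fmr_offset = UINT64_MAX;

	*total_bytes = 0;
	while (fake_getfsmap(h) == 0 && h->fmh_entries > 0) {
		calls++;
		for (i = 0; i < h->fmh_entries; i++)
			*total_bytes += h->fmh_recs[i].fmr_length;
		/* Set up the next call: last record becomes the low key. */
		h->fmh_keys[0] = h->fmh_recs[h->fmh_entries - 1];
	}
	free(h);
	return calls;
}
```

A caller would invoke walk_all(2, &total) and get the same byte total no
matter which batch size it picks; only the number of calls changes.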

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
  2016-08-30 19:09       ` Darrick J. Wong
  (?)
@ 2016-09-08 23:38         ` Dave Chinner
  -1 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2016-09-08 23:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Dilger, linux-man, linux-api, xfs, linux-xfs,
	mtk.manpages, linux-fsdevel, linux-btrfs

On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > I recall for FIEMAP that some filesystems may not have files aligned
> > to sector offsets, and we just used byte offsets.  Storage like
> > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > tie this to old disk sector sizes.  Alternately, the units could be
> > in terms of fs blocks as returned by statvfs.f_bsize, but mixing
> > units for fmv_block, fmv_offset, fmv_length is unneeded complexity.
> 
> Ugh.  I'd rather just change the units to bytes rather than force all
> the users to multiply things. :)

Yup, units need to be either in disk addresses (i.e. 512 byte units)
or bytes. If people can't handle disk addresses (seems to be the
case), then bytes it should be.

> I'd much rather just add more special owner codes for any other
> filesystem that has distinguishable metadata types that are not
> covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> values, so it's not like we're going to run out.

This is diagnostic information as much as anything, just like
fiemap is diagnostic information. So if we have specific type
information, it needs to be reported accurately to be useful.

Hence I really don't care if the users and developers of other fs
types don't understand what the special owner codes that a specific
filesystem returns mean. i.e. it's not useful user information -
only a tool that groks the specific filesystem is going to be able
to do anything useful with special owner codes. So, IMO, there's little
point trying to make them generic or even trying to define and
explain them in the man page....

> > It seems like there are several fields in the structure that are used for
> > only input or only output?  Does it make more sense to have one structure
> > used only for the input request, and then the array of values returned be
> > in a different structure?  I'm not necessarily requesting that it be changed,
> > but it definitely is something I noticed a few times while reading this doc.
> 
> I've been thinking about rearranging this a bit, since the flags
> handling is very awkward with the current array structure.  Each
> rmap has its own flags; we may someday want to pass operation flags
> into the ioctl; and we currently have one operation flag to pass back
> to userspace.  Each of those flags can be a separate field.  I think
> people will get confused about FMV_OF_* and FMV_HOF_* being referenced
> in oflags, and iflags has no meaning for returned records.

Yup, that's what I initially noticed when I glanced at this. The XFS
getbmap interface is just plain nasty, and we shouldn't be copying
that API pattern if we can help it.

> So, this instead?
> 
> struct getfsmap_rec {
> 	u32 device;		/* device id */
> 	u32 flags;		/* mapping flags */
> 	u64 block;		/* physical addr, bytes */
> 	u64 owner;		/* inode or special owner code */
> 	u64 offset;		/* file offset of mapping, bytes */
> 	u64 length;		/* length of segment, bytes */
> 	u64 reserved;		/* will be set to zero */
> }; /* 48 bytes */
> 
> struct getfsmap_head {
> 	u32 iflags;		/* none defined yet */
> 	u32 oflags;		/* FMV_HOF_DEV_T */
> 	u32 count;		/* # entries in recs array */
> 	u32 entries;		/* # entries filled in (output) */
> 	u64 reserved[2]; 	/* must be zero */
> 
> 	struct getfsmap_rec keys[2]; /* low and high keys for the mapping search */
> 	struct getfsmap_rec recs[0];
> }; /* 32 bytes + 2*48 = 128 bytes */
> 
> #define XFS_IOC_GETFSMAP	_IOWR('X', 59, struct getfsmap_head)
> 
> This also means that userspace can set up for the next ioctl
> invocation with memcpy(&head->keys[0], &head->recs[head->entries - 1]).
> 
> Yes, I think I like this better.  Everyone else, please chime in. :)

That's pretty much the structure I was going to suggest - it matches
the fiemap pattern, i.e. control parameters are separated from record
data. I'd dump a bit more reserved space in the structure, though;
we've got heaps of flag space for future expansion, but if we need
to pass new parameters into/out of the kernel we'll quickly use the
reserved space.
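
For what it's worth, the size arithmetic in the quoted comments (48 bytes
per record, 128 bytes of header before the first record) can be checked at
compile time.  A minimal sketch, assuming the layout proposed above with
<stdint.h> types standing in for u32/u64 and a C99 flexible array member in
place of recs[0]; getfsmap_bufsize() is a hypothetical helper for
illustration, not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct getfsmap_rec {
	uint32_t device;	/* device id */
	uint32_t flags;		/* mapping flags */
	uint64_t block;		/* physical addr, bytes */
	uint64_t owner;		/* inode or special owner code */
	uint64_t offset;	/* file offset of mapping, bytes */
	uint64_t length;	/* length of segment, bytes */
	uint64_t reserved;	/* will be set to zero */
};

struct getfsmap_head {
	uint32_t iflags;
	uint32_t oflags;
	uint32_t count;
	uint32_t entries;
	uint64_t reserved[2];
	struct getfsmap_rec keys[2];
	struct getfsmap_rec recs[];
};

/* The numbers from the comments in the proposal. */
_Static_assert(sizeof(struct getfsmap_rec) == 48, "record is 48 bytes");
_Static_assert(offsetof(struct getfsmap_head, keys) == 32, "32 byte header");
_Static_assert(offsetof(struct getfsmap_head, recs) == 128, "recs at 128");

/* Hypothetical helper: bytes to allocate for a request of nr_recs records. */
size_t getfsmap_bufsize(uint32_t nr_recs)
{
	return sizeof(struct getfsmap_head) +
	       (size_t)nr_recs * sizeof(struct getfsmap_rec);
}
```

Adding more reserved space would only change the constants in the asserts;
the allocation helper stays the same.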

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
  2016-09-08 23:38         ` Dave Chinner
  (?)
@ 2016-09-09  6:07           ` Darrick J. Wong
  -1 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-09-09  6:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, linux-man, linux-api, xfs, linux-xfs,
	mtk.manpages, linux-fsdevel, linux-btrfs

On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > I recall for FIEMAP that some filesystems may not have files aligned
> > > to sector offsets, and we just used byte offsets.  Storage like
> > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > tie this to old disk sector sizes.  Alternately, the units could be
> > > > in terms of fs blocks as returned by statvfs.f_bsize, but mixing
> > > > units for fmv_block, fmv_offset, fmv_length is unneeded complexity.
> > 
> > Ugh.  I'd rather just change the units to bytes rather than force all
> > the users to multiply things. :)
> 
> Yup, units need to be either in disk addresses (i.e. 512 byte units)
> or bytes. If people can't handle disk addresses (seems to be the
> > case), then bytes it should be.

<nod>

> > I'd much rather just add more special owner codes for any other
> > filesystem that has distinguishable metadata types that are not
> > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > values, so it's not like we're going to run out.
> 
> > This is diagnostic information as much as anything, just like
> fiemap is diagnostic information. So if we have specific type
> information, it needs to be reported accurately to be useful.
> 
> Hence I really don't care if the users and developers of other fs
> types don't understand what the special owner codes that a specific
> filesystem returns mean. i.e. it's not useful user information -
> only a tool that groks the specific filesystem is going to be able
> > to do anything useful with special owner codes. So, IMO, there's little
> > point trying to make them generic or even trying to define and
> explain them in the man page....

<shrug> I'm ok with describing generally what each special owner code
means.  Maybe the manpage could be more explicit about "None of these
codes are useful unless you're a low level filesystem tool"?
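
To make that concrete, here is roughly how I'd expect a generic
(non-fs-specific) tool to treat the owner field: print inode numbers as-is,
name the couple of codes it recognizes, and report everything else as an
opaque code.  This is only a sketch; the EX_* names and their numeric values
are made up for illustration and are not the real flag or owner encodings,
and describe_owner() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL values, for illustration only; not the real encoding. */
#define EX_OF_SPECIAL_OWNER	0x10u
enum {
	EX_OWN_FREE	= 0,
	EX_OWN_UNKNOWN	= 1,
};

/* Format a human-readable owner description; returns snprintf()'s result. */
int describe_owner(uint32_t flags, uint64_t owner, char *buf, size_t len)
{
	/* Without the special-owner flag, the owner is an inode number. */
	if (!(flags & EX_OF_SPECIAL_OWNER))
		return snprintf(buf, len, "inode %" PRIu64, owner);

	switch (owner) {
	case EX_OWN_FREE:
		return snprintf(buf, len, "free space");
	case EX_OWN_UNKNOWN:
		return snprintf(buf, len, "unknown owner");
	default:
		/* Only a tool that knows this filesystem can say more. */
		return snprintf(buf, len,
				"fs-specific metadata (code %" PRIu64 ")",
				owner);
	}
}
```

The default branch is the point: a generic tool does not need to understand
every code, it just needs to avoid mistaking one for an inode number.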

> > > It seems like there are several fields in the structure that are used for
> > > only input or only output?  Does it make more sense to have one structure
> > > used only for the input request, and then the array of values returned be
> > > in a different structure?  I'm not necessarily requesting that it be changed,
> > > but it definitely is something I noticed a few times while reading this doc.
> > 
> > I've been thinking about rearranging this a bit, since the flags
> > handling is very awkward with the current array structure.  Each
> > rmap has its own flags; we may someday want to pass operation flags
> > into the ioctl; and we currently have one operation flag to pass back
> > to userspace.  Each of those flags can be a separate field.  I think
> > people will get confused about FMV_OF_* and FMV_HOF_* being referenced
> > in oflags, and iflags has no meaning for returned records.
> 
> Yup, that's what I initially noticed when I glanced at this. The XFS
> getbmap interface is just plain nasty, and we shouldn't be copying
> that API pattern if we can help it.

Lol ok. :)

> > So, this instead?
> > 
> > struct getfsmap_rec {
> > 	u32 device;		/* device id */
> > 	u32 flags;		/* mapping flags */
> > 	u64 block;		/* physical addr, bytes */
> > 	u64 owner;		/* inode or special owner code */
> > 	u64 offset;		/* file offset of mapping, bytes */
> > 	u64 length;		/* length of segment, bytes */
> > 	u64 reserved;		/* will be set to zero */
> > }; /* 48 bytes */
> > 
> > struct getfsmap_head {
> > 	u32 iflags;		/* none defined yet */
> > 	u32 oflags;		/* FMV_HOF_DEV_T */
> > 	u32 count;		/* # entries in recs array */
> > 	u32 entries;		/* # entries filled in (output) */
> > 	u64 reserved[2]; 	/* must be zero */
> > 
> > 	struct getfsmap_rec keys[2]; /* low and high keys for the mapping search */
> > 	struct getfsmap_rec recs[0];
> > }; /* 32 bytes + 2*48 = 128 bytes */
> > 
> > #define XFS_IOC_GETFSMAP	_IOWR('X', 59, struct getfsmap_head)
> > 
> > This also means that userspace can set up for the next ioctl
> > invocation with memcpy(&head->keys[0], &head->recs[head->entries - 1],
> > sizeof(head->keys[0])).
> > 
> > Yes, I think I like this better.  Everyone else, please chime in. :)
> 
> That's pretty much the structure I was going to suggest - it matches
> the fiemap pattern. i.e. control parameters are separated from record
> data. I'd dump a bit more reserved space in the structure, though;
> we've got heaps of flag space for future expansion, but if we need
> to pass new parameters into/out of the kernel we'll quickly use the
> reserved space.

I padded struct fsmap with enough reserved space to make it an even 64 bytes,
and padded struct fsmap_head so that the space before keys is 64 bytes in
length.  See v3 patch of the ioctl manpage.
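
For anyone who wants to check the padding arithmetic, here is a minimal
sketch.  It assumes the field list from the v3 manpage and swaps the
kernel's __u32/__u64 for the <stdint.h> fixed-width types so that it
builds in plain userspace; it is not the header that would ship.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Local copy of the v3 record layout; uint*_t stands in for __u32/__u64. */
struct fsmap {
	uint32_t	fmr_device;	/* device id */
	uint32_t	fmr_flags;	/* mapping flags */
	uint64_t	fmr_physical;	/* device offset of segment */
	uint64_t	fmr_owner;	/* owner id */
	uint64_t	fmr_offset;	/* file offset of segment */
	uint64_t	fmr_length;	/* length of segment */
	uint64_t	fmr_reserved[3];	/* must be zero */
};

/* Local copy of the v3 header layout. */
struct fsmap_head {
	uint32_t	fmh_iflags;	/* control flags */
	uint32_t	fmh_oflags;	/* output flags */
	uint32_t	fmh_count;	/* # of entries in array */
	uint32_t	fmh_entries;	/* # of entries filled in (output) */
	uint64_t	fmh_reserved[6];	/* must be zero */

	struct fsmap	fmh_keys[2];	/* low and high keys */
	struct fsmap	fmh_recs[];	/* returned records */
};

/* 2 * 4 + 4 * 8 + 3 * 8 = 64 bytes per record. */
static_assert(sizeof(struct fsmap) == 64, "record is 64 bytes");
/* 4 * 4 + 6 * 8 = 64 bytes of control data before the keys. */
static_assert(offsetof(struct fsmap_head, fmh_keys) == 64,
	      "64 bytes before keys");
/* The two keys add 128 bytes, so the records start at byte 192. */
static_assert(offsetof(struct fsmap_head, fmh_recs) == 192,
	      "records follow the two keys");
```

If any of the three assertions fails to compile, the layout has drifted
from what is described above.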

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 3/3] ioctl_getfsmap.2: document the GETFSMAP ioctl
@ 2016-09-09  6:17     ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-09-09  6:17 UTC (permalink / raw)
  To: Dave Chinner, Theodore Ts'o, Josef Bacik, Mark Fasheh
  Cc: linux-fsdevel, linux-api, linux-man, adilger, linux-xfs, xfs,
	linux-btrfs, mtk.manpages, linux-ext4

Document the new GETFSMAP ioctl that returns the physical layout of a
(disk-based) filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
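The continuation idiom described in the Keys section of this page
(copy the last returned record into the low key before calling again)
could look roughly like the sketch below.  The structures are local
copies of the ones in the manpage with <stdint.h> types substituted;
fsmap_head_alloc() and fsmap_resume_after_last() are hypothetical
helper names, and treating fmh_count as the number of fmh_recs slots
is an assumption.  The ioctl call itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the manpage structures (uint*_t for __u32/__u64). */
struct fsmap {
	uint32_t	fmr_device;
	uint32_t	fmr_flags;
	uint64_t	fmr_physical;
	uint64_t	fmr_owner;
	uint64_t	fmr_offset;
	uint64_t	fmr_length;
	uint64_t	fmr_reserved[3];
};

struct fsmap_head {
	uint32_t	fmh_iflags;
	uint32_t	fmh_oflags;
	uint32_t	fmh_count;
	uint32_t	fmh_entries;
	uint64_t	fmh_reserved[6];
	struct fsmap	fmh_keys[2];
	struct fsmap	fmh_recs[];
};

/*
 * Hypothetical helper: allocate a zeroed header with room for nr
 * records.  Zeroing also satisfies the "must be zero" reserved fields.
 * Returns NULL if the allocation fails.
 */
static struct fsmap_head *fsmap_head_alloc(uint32_t nr)
{
	struct fsmap_head *head;

	head = calloc(1, sizeof(*head) + (size_t)nr * sizeof(struct fsmap));
	if (head)
		head->fmh_count = nr;
	return head;
}

/*
 * Hypothetical helper: copy the last returned record into the low key
 * so the next call resumes from there.  fmr_length is copied along with
 * the rest, which is what lets the kernel skip past that record.
 * Returns 1 if the low key was updated, 0 if there was nothing to copy.
 */
static int fsmap_resume_after_last(struct fsmap_head *head)
{
	if (head->fmh_entries == 0 || head->fmh_entries > head->fmh_count)
		return 0;
	head->fmh_keys[0] = head->fmh_recs[head->fmh_entries - 1];
	return 1;
}
```

A caller would presumably loop: issue the ioctl, process fmh_entries
records, stop on FMR_OF_LAST or when fsmap_resume_after_last() returns
0, and otherwise call again.
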
 man2/ioctl_getfsmap.2 |  313 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 man2/ioctl_getfsmap.2

diff --git a/man2/ioctl_getfsmap.2 b/man2/ioctl_getfsmap.2
new file mode 100644
index 0000000..fac3ff4
--- /dev/null
+++ b/man2/ioctl_getfsmap.2
@@ -0,0 +1,313 @@
+.\" Copyright (c) 2016, Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-GETFSMAP 2 2016-09-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_getfsmap \- retrieve the physical layout of the filesystem
+.SH SYNOPSIS
+.br
+.B #include <sys/ioctl.h>
+.br
+.B #include <linux/fs.h>
+.sp
+.BI "int ioctl(int " fd ", GETFSMAP, struct fsmap_head * " arg );
+.SH DESCRIPTION
+This
+.BR ioctl (2)
+retrieves physical extent mappings for a filesystem.
+This information can be used to discover which files are mapped to a physical
+block, examine free space, or find known bad blocks, among other things.
+
+The sole argument to this ioctl should be a pointer to a single
+.BR "struct fsmap_head" ":"
+.in +4n
+.nf
+
+struct fsmap {
+	__u32		fmr_device;	/* device id */
+	__u32		fmr_flags;	/* mapping flags */
+	__u64		fmr_physical;	/* device offset of segment */
+	__u64		fmr_owner;	/* owner id */
+	__u64		fmr_offset;	/* file offset of segment */
+	__u64		fmr_length;	/* length of segment */
+	__u64		fmr_reserved[3];	/* must be zero */
+};
+
+struct fsmap_head {
+	__u32		fmh_iflags;	/* control flags */
+	__u32		fmh_oflags;	/* output flags */
+	__u32		fmh_count;	/* # of entries in array incl. input */
+	__u32		fmh_entries;	/* # of entries filled in (output). */
+	__u64		fmh_reserved[6];	/* must be zero */
+
+	struct fsmap	fmh_keys[2];	/* low and high keys for the mapping search */
+	struct fsmap	fmh_recs[];	/* returned records */
+};
+
+.fi
+.in
+The two
+.I fmh_keys
+array elements specify the lowest and highest reverse-mapping
+keys, respectively, for which userspace would like physical mapping
+information.
+A reverse mapping key consists of the tuple (device, physical, owner, offset).
+The owner and offset fields are part of the key because some filesystems
+support sharing physical blocks between multiple files and
+therefore may return multiple mappings for a given physical block.
+.PP
+Filesystem mappings are copied into the
+.I fmh_recs
+array, which immediately follows the header data.
+.SS Fields of struct fsmap_head
+.PP
+The
+.I fmh_iflags
+field is a bitmask passed to the kernel to alter the output.
+There are no flags defined, so this value must be zero.
+
+.PP
+The
+.I fmh_oflags
+field is a bitmask of flags that concern all output mappings.
+If
+.B FMH_OF_DEV_T
+is set, then the
+.I fmr_device
+field represents a
+.B dev_t
+structure containing the major and minor numbers of the block device.
+
+.PP
+The
+.I fmh_count
+field contains the number of elements in the array being passed to the
+kernel.
+If this value is 0,
+.I fmh_entries
+will be set to the number of records that would have been returned had
+the array been large enough;
+no mapping information will be returned.
+
+.PP
+The
+.I fmh_entries
+field contains the number of elements in the
+.I fmh_recs
+array that contain useful information.
+
+.PP
+The
+.I fmh_reserved
+fields must be set to zero.
+
+.SS Keys
+.PP
+The two key records in
+.B fsmap_head.fmh_keys
+specify the lowest and highest extent records in the keyspace that the caller
+wants returned.
+A filesystem that can share blocks between files likely requires the tuple
+.RI "(" "device" ", " "physical" ", " "owner" ", " "offset" ", " "flags" ")"
+to uniquely index any filesystem mapping record.
+Classic non-sharing filesystems might be able to identify any record with only
+.RI "(" "device" ", " "physical" ", " "flags" ")."
+For example, if the low key is set to (0, 36864, 0, 0, 0), the filesystem will
+only return records for extents starting at or above 36KiB on disk.
+If the high key is set to (0, 1048576, 0, 0, 0), only records below 1MiB will
+be returned.
+By convention, the field
+.B fsmap_head.fmh_keys[0]
+must contain the low key and
+.B fsmap_head.fmh_keys[1]
+must contain the high key for the request.
+.PP
+For convenience, if
+.B fmr_length
+is set in the low key, it will be added to
+.IR fmr_physical " or " fmr_offset
+as appropriate.
+The caller can take advantage of this subtlety to set up subsequent calls
+by copying
+.B fsmap_head.fmh_recs[fsmap_head.fmh_entries - 1]
+into the low key.
+
+.SS Fields of struct fsmap
+.PP
+The
+.I fmr_device
+field contains a 32-bit cookie to uniquely identify the underlying storage
+device.
+If the
+.B FMH_OF_DEV_T
+flag is set in the header's
+.I fmh_oflags
+field, this field contains a
+.B dev_t
+from which major and minor numbers can be extracted.
+If the flag is not set, this field contains a value that must be unique
+for each unique storage device.
+
+.PP
+The
+.I fmr_physical
+field contains the disk address of the extent in bytes.
+
+.PP
+The
+.I fmr_owner
+field contains the owner of the extent.
+This is an inode number unless
+.B FMR_OF_SPECIAL_OWNER
+is set in the
+.I fmr_flags
+field, in which case the owner value is one of the following special values:
+.RS 0.4i
+.TP
+.B FMR_OWN_FREE
+Free space.
+.TP
+.B FMR_OWN_UNKNOWN
+This extent is in use but its owner is not known.
+.TP
+.B FMR_OWN_FS
+Static filesystem metadata which exists at a fixed address.
+On XFS these are the AG superblock, AGF, AGFL, and AGI headers.
+.TP
+.B FMR_OWN_LOG
+The filesystem journal.
+.TP
+.B FMR_OWN_AG
+Allocation group metadata.
+On XFS these are the free space btrees or the reverse mapping btrees.
+.TP
+.B FMR_OWN_INOBT
+Inode indexes, if any are provided.
+On XFS these are the inode and free inode btrees.
+.TP
+.B FMR_OWN_INODES
+Inode records.
+.TP
+.B FMR_OWN_REFC
+Reference count information.
+On XFS this is the reference count btree.
+.TP
+.B FMR_OWN_COW
+This extent is being used to stage a copy-on-write.
+.TP
+.B FMR_OWN_DEFECTIVE
+This extent has been marked defective either by the filesystem or the
+underlying device.
+.RE
+
+.PP
+The
+.I fmr_offset
+field contains the logical address in the mapping record in bytes.
+This field has no meaning if the
+.BR FMR_OF_SPECIAL_OWNER " or " FMR_OF_EXTENT_MAP
+flags are set in
+.IR fmr_flags "."
+
+.PP
+The
+.I fmr_length
+field contains the length of the extent in bytes.
+
+.PP
+The
+.I fmr_flags
+field is a bitmask of extent state flags.
+The bits are:
+.RS 0.4i
+.TP
+.B FMR_OF_PREALLOC
+The extent is allocated but not yet written.
+.TP
+.B FMR_OF_ATTR_FORK
+This extent contains extended attribute data.
+.TP
+.B FMR_OF_EXTENT_MAP
+This extent contains extent map information for the owner.
+.TP
+.B FMR_OF_SHARED
+Parts of this extent may be shared.
+.TP
+.B FMR_OF_SPECIAL_OWNER
+The
+.I fmr_owner
+field contains a special value instead of an inode number.
+.TP
+.B FMR_OF_LAST
+This is the last record in the filesystem.
+.RE
+
+.PP
+The
+.I fmr_reserved
+field will be set to zero.
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+The array is not long enough, or a non-zero value was passed in one of the
+fields that must be zero.
+.TP
+.B EFAULT
+The pointer passed in was not mapped to a valid memory address.
+.TP
+.B EBADF
+.I fd
+is not open for reading.
+.TP
+.B EPERM
+This query is not allowed.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support this command.
+.TP
+.B EUCLEAN
+The filesystem metadata is corrupt and needs repair.
+.TP
+.B EBADMSG
+The filesystem has detected a checksum error in the metadata.
+.TP
+.B ENOMEM
+Insufficient memory to process the request.
+
+.SH EXAMPLE
+.PP
+Please see io/fsmap.c in the xfsprogs distribution for a sample program.
+
+.SH CONFORMING TO
+This API is Linux-specific.
+Not all filesystems support it.
+.fi
+.in
+.SH SEE ALSO
+.BR ioctl (2)
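
The low/high key example in the Keys section can be illustrated with a
small comparison routine.  This is only a sketch of one plausible
reading: it assumes keys are ordered lexicographically over
(device, physical, owner, offset), leaves the flags component out for
brevity, and assumes both bounds are inclusive.  struct fsmap_key and
both function names are hypothetical and are not part of the proposed
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key tuple mirroring the fmr_* key fields (flags omitted). */
struct fsmap_key {
	uint32_t	device;
	uint64_t	physical;
	uint64_t	owner;
	uint64_t	offset;
};

/* Lexicographic compare; returns <0, 0 or >0 in the style of memcmp(). */
static int fsmap_key_cmp(const struct fsmap_key *a, const struct fsmap_key *b)
{
	if (a->device != b->device)
		return a->device < b->device ? -1 : 1;
	if (a->physical != b->physical)
		return a->physical < b->physical ? -1 : 1;
	if (a->owner != b->owner)
		return a->owner < b->owner ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Assumption: a record is returned when low <= rec <= high. */
static bool fsmap_key_in_range(const struct fsmap_key *rec,
			       const struct fsmap_key *low,
			       const struct fsmap_key *high)
{
	return fsmap_key_cmp(low, rec) <= 0 && fsmap_key_cmp(rec, high) <= 0;
}
```

With low = (0, 36864, 0, 0) and high = (0, 1048576, 0, 0), a record
owned by inode 128 that starts at byte 36864 falls inside the range,
while the same owner at byte 32768 or at byte 1048576 does not.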

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/3] ioctl_getfsmap.2: document the GETFSMAP ioctl
@ 2016-09-09  6:17     ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-09-09  6:17 UTC (permalink / raw)
  To: Dave Chinner, Theodore Ts'o, Josef Bacik, Mark Fasheh
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA, adilger-m1MBpc4rdrD3fQ9qLvQP4Q,
	linux-xfs-u79uwXL29TY76Z2rM5mHXA, xfs-VZNHf3L845pBDgjK7y7TUQ,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-ext4

Document the new GETFSMAP ioctl that returns the physical layout of a
(disk-based) filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 man2/ioctl_getfsmap.2 |  313 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 man2/ioctl_getfsmap.2

diff --git a/man2/ioctl_getfsmap.2 b/man2/ioctl_getfsmap.2
new file mode 100644
index 0000000..fac3ff4
--- /dev/null
+++ b/man2/ioctl_getfsmap.2
@@ -0,0 +1,313 @@
+.\" Copyright (c) 2016, Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.TH IOCTL-GETFSMAP 2 2016-09-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ioctl_getfsmap \- retrieve the physical layout of the filesystem
+.SH SYNOPSIS
+.br
+.B #include <sys/ioctl.h>
+.br
+.B #include <linux/fs.h>
+.sp
+.BI "int ioctl(int " fd ", GETFSMAP, struct fsmap_head * " arg );
+.SH DESCRIPTION
+This
+.BR ioctl (2)
+retrieves physical extent mappings for a filesystem.
+This information can be used to discover which files are mapped to a physical
+block, examine free space, or find known bad blocks, among other things.
+
+The sole argument to this ioctl should be a pointer to a single
+.BR "struct fsmap_head" ":"
+.in +4n
+.nf
+
+struct fsmap {
+	__u32		fmr_device;	/* device id */
+	__u32		fmr_flags;	/* mapping flags */
+	__u64		fmr_physical;	/* device offset of segment */
+	__u64		fmr_owner;	/* owner id */
+	__u64		fmr_offset;	/* file offset of segment */
+	__u64		fmr_length;	/* length of segment */
+	__u64		fmr_reserved[3];	/* must be zero */
+};
+
+struct fsmap_head {
+	__u32		fmh_iflags;	/* control flags */
+	__u32		fmh_oflags;	/* output flags */
+	__u32		fmh_count;	/* # of entries in array incl. input */
+	__u32		fmh_entries;	/* # of entries filled in (output). */
+	__u64		fmh_reserved[6];	/* must be zero */
+
+	struct fsmap	fmh_keys[2];	/* low and high keys for the mapping search */
+	struct fsmap	fmh_recs[];	/* returned records */
+};
+
+.fi
+.in
+The two
+.I fmh_keys
+array elements specify the lowest and highest reverse-mapping
+keys, respectively, for which userspace would like physical mapping
+information.
+A reverse mapping key consists of the tuple (device, block, owner, offset).
+The owner and offset fields are part of the key because some filesystems
+support sharing physical blocks between multiple files and
+therefore may return multiple mappings for a given physical block.
+.PP
+Filesystem mappings are copied into the
+.I fmh_recs
+array, which immediately follows the header data.
+.SS Fields of struct fsmap_head
+.PP
+The
+.I fmh_iflags
+field is a bitmask passed to the kernel to alter the output.
+There are no flags defined, so this value must be zero.
+
+.PP
+The
+.I fmh_oflags
+field is a bitmask of flags that concern all output mappings.
+If
+.B FMH_OF_DEV_T
+is set, then the
+.I fmr_device
+field represents a
+.B dev_t
+structure containing the major and minor numbers of the block device.
+
+.PP
+The
+.I fmh_count
+field contains the number of elements in the array being passed to the
+kernel.
+If this value is 0,
+.I fmh_entries
+will be set to the number of records that would have been returned had
+the array been large enough;
+no mapping information will be returned.
+
+.PP
+The
+.I fmh_entries
+field contains the number of elements in the
+.I fmh_recs
+array that contain useful information.
+
+.PP
+The
+.I fmh_reserved
+fields must be set to zero.
+
+.SS Keys
+.PP
+The two key records in
+.B fsmap_head.fmh_keys
+specify the lowest and highest extent records in the keyspace that the caller
+wants returned.
+A filesystem that can share blocks between files likely requires the tuple
+.RI "(" "device" ", " "physical" ", " "owner" ", " "offset" ", " "flags" ")"
+to uniquely index any filesystem mapping record.
+Classic non-sharing filesystems might be able to identify any record with only
+.RI "(" "device" ", " "physical" ", " "flags" ")."
+For example, if the low key is set to (0, 36864, 0, 0, 0), the filesystem will
+only return records for extents starting at or above 36KiB on disk.
+If the high key is set to (0, 1048576, 0, 0, 0), only records below 1MiB will
+be returned.
+By convention, the field
+.B fsmap_head.fmh_keys[0]
+must contain the low key and
+.B fsmap_head.fmh_keys[1]
+must contain the high key for the request.
+.PP
+For convenience, if
+.B fmr_length
+is set in the low key, it will be added to
+.IR fmr_block " or " fmr_offset
+as appropriate.
+The caller can take advantage of this subtlety to set up subsequent calls
+by copying
+.B fsmap_head.fmh_recs[fsmap_head.fmh_entries - 1]
+into the low key.
+
+.SS Fields of struct fsmap
+.PP
+The
+.I fmr_device
+field contains a 32-bit cookie to uniquely identify the underlying storage
+device.
+If the
+.B FMH_OF_DEV_T
+flag is set in the header's
+.I fmh_oflags
+field, this field contains a
+.B dev_t
+from which major and minor numbers can be extracted.
+If the flag is not set, this field contains a value that must be unique
+for each unique storage device.
+
+.PP
+The
+.I fmr_physical
+field contains the disk address of the extent in bytes.
+
+.PP
+The
+.I fmr_owner
+field contains the owner of the extent.
+This is an inode number unless
+.B FMR_OF_SPECIAL_OWNER
+is set in the
+.I fmr_flags
+field, in which case the owner value is one of the following special values:
+.RS 0.4i
+.TP
+.B FMR_OWN_FREE
+Free space.
+.TP
+.B FMR_OWN_UNKNOWN
+This extent is in use but its owner is not known.
+.TP
+.B FMR_OWN_FS
+Static filesystem metadata which exists at a fixed address.
+On XFS these are the AG superblock, AGF, AGFL, and AGI headers.
+.TP
+.B FMR_OWN_LOG
+The filesystem journal.
+.TP
+.B FMR_OWN_AG
+Allocation group metadata.
+On XFS these are the free space btrees or the reverse mapping btrees.
+.TP
+.B FMR_OWN_INOBT
+Inode indexing, if any are provided.
+On XFS these are the inode and free inode btrees.
+.TP
+.B FMR_OWN_INODES
+Inode records.
+.TP
+.B FMR_OWN_REFC
+Reference count information.
+On XFS this is the reference count btree.
+.TP
+.B FMR_OWN_COW
+This extent is being used to stage a copy-on-write.
+.TP
+.B FMR_OWN_DEFECTIVE:
+This extent has been marked defective either by the filesystem or the
+underlying device.
+.RE
+
+.PP
+The
+.I fmr_offset
+field contains the logical address in the mapping record in bytes.
+This field has no meaning if the
+.BR FMR_OF_SPECIAL_OWNER " or " FMR_OF_EXTENT_MAP
+flags are set in
+.IR fmr_flags "."
+
+.PP
+The
+.I fmr_length
+field contains the length of the extent in bytes.
+
+.PP
+The
+.I fmr_flags
+field is a bitmask of extent state flags.
+The bits are:
+.RS 0.4i
+.TP
+.B FMR_OF_PREALLOC
+The extent is allocated but not yet written.
+.TP
+.B FMR_OF_ATTR_FORK
+This extent contains extended attribute data.
+.TP
+.B FMR_OF_EXTENT_MAP
+This extent contains extent map information for the owner.
+.TP
+.B FMR_OF_SHARED
+Parts of this extent may be shared.
+.TP
+.B FMR_OF_SPECIAL_OWNER
+The
+.I fmr_owner
+field contains a special value instead of an inode number.
+.TP
+.B FMR_OF_LAST
+This is the last record in the filesystem.
+.RE
+
+.PP
+The
+.I fmr_reserved
+field will be set to zero.
+
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EINVAL
+The array is not long enough, or a non-zero value was passed in one of the
+fields that must be zero.
+.TP
+.B EFAULT
+The pointer passed in was not mapped to a valid memory address.
+.TP
+.B EBADF
+.IR fd
+is not open for reading.
+.TP
+.B EPERM
+This query is not allowed.
+.TP
+.B EOPNOTSUPP
+The filesystem does not support this command.
+.TP
+.B EUCLEAN
+The filesystem metadata is corrupt and needs repair.
+.TP
+.B EBADMSG
+The filesystem has detected a checksum error in the metadata.
+.TP
+.B ENOMEM
+Insufficient memory to process the request.
+
+.SH EXAMPLE
+.PP
+See io/fsmap.c in the xfsprogs distribution for a sample program.
+
+.SH CONFORMING TO
+This API is Linux-specific.
+Not all filesystems support it.
+.fi
+.in
+.SH SEE ALSO
+.BR ioctl (2)
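
As a rough illustration of the fmr_flags and fmr_offset text in the patch
above, here is a minimal C sketch; it is not code from xfsprogs.  struct
fsmap_rec, the two helper functions, and the numeric bit values are all
assumptions made up for this example.  A real program would take the record
layout and the FMR_OF_* definitions from the header that accompanies the
ioctl.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit values, assumed for illustration only; real programs
 * must use the FMR_OF_* definitions shipped with the ioctl's header. */
#define FMR_OF_PREALLOC      (1u << 0)
#define FMR_OF_ATTR_FORK     (1u << 1)
#define FMR_OF_EXTENT_MAP    (1u << 2)
#define FMR_OF_SHARED        (1u << 3)
#define FMR_OF_SPECIAL_OWNER (1u << 4)
#define FMR_OF_LAST          (1u << 5)

/* Hypothetical record holding only the fields discussed in the manpage. */
struct fsmap_rec {
	uint64_t fmr_owner;
	uint64_t fmr_offset;	/* bytes */
	uint64_t fmr_length;	/* bytes */
	uint32_t fmr_flags;
};

static const struct {
	uint32_t bit;
	const char *name;
} flag_names[] = {
	{ FMR_OF_PREALLOC,      "PREALLOC" },
	{ FMR_OF_ATTR_FORK,     "ATTR_FORK" },
	{ FMR_OF_EXTENT_MAP,    "EXTENT_MAP" },
	{ FMR_OF_SHARED,        "SHARED" },
	{ FMR_OF_SPECIAL_OWNER, "SPECIAL_OWNER" },
	{ FMR_OF_LAST,          "LAST" },
};

/* fmr_offset has no meaning for special owners or extent-map extents. */
static bool fsmap_offset_is_valid(const struct fsmap_rec *rec)
{
	return !(rec->fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP));
}

/* Write a comma-separated list of the set flag names into buf. */
static void fsmap_format_flags(uint32_t flags, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & flag_names[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "," : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full; stop appending */
		used += (size_t)n;
	}
}
```

A caller would run each returned record through fsmap_offset_is_valid()
before interpreting fmr_offset, and could use fsmap_format_flags() when
printing records for debugging.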

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
  2016-09-09  6:07           ` Darrick J. Wong
@ 2016-09-10  0:00             ` Dave Chinner
  -1 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2016-09-10  0:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Dilger, linux-man, linux-api, xfs, linux-xfs,
	mtk.manpages, linux-fsdevel, linux-btrfs

On Thu, Sep 08, 2016 at 11:07:16PM -0700, Darrick J. Wong wrote:
> On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> > On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > > I recall for FIEMAP that some filesystems may not have files aligned
> > > > to sector offsets, and we just used byte offsets.  Storage like
> > > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > > tie this to old disk sector sizes.  Alternately, the units could be
> > > > in terms of fs blocks as returned by statvfs.f_bsize, but mixing
> > > > units for fmv_block, fmv_offset, fmv_length is unneeded complexity.
> > > 
> > > Ugh.  I'd rather just change the units to bytes rather than force all
> > > the users to multiply things. :)
> > 
> > Yup, units need to be either in disk addresses (i.e. 512 byte units)
> > or bytes. If people can't handle disk addresses (seems to be the
> > case), then bytes it should be.
> 
> <nod>
> 
> > > I'd much rather just add more special owner codes for any other
> > > filesystem that has distinguishable metadata types that are not
> > > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > > values, so it's not like we're going to run out.
> > 
> > This is diagnostic information as much as anything, just like
> > fiemap is diagnostic information. So if we have specific type
> > information, it needs to be reported accurately to be useful.
> > 
> > Hence I really don't care if the users and developers of other fs
> > types don't understand what the special owner codes that a specific
> > filesystem returns mean. i.e. it's not useful user information -
> > only a tool that groks the specific filesystem is going to be able
> > to do anything useful with special owner codes. So, IMO, there's little
> > point trying to make them generic or even trying to define and
> > explain them in the man page....
> 
> <shrug> I'm ok with describing generally what each special owner code
> means.  Maybe the manpage could be more explicit about "None of these
> codes are useful unless you're a low level filesystem tool"?

You can add that, but it doesn't address the underlying problem.
i.e.  that we can add/change the codes, their name, meaning, etc,
and now there's a third party man page that is incorrect and out of
date. It's the same problem with documenting filesystem specific
mount options in mount(8). Better, IMO, is to simply say "refer to
filesystem specific documentation for a description of these special
values". e.g. refer them to the XFS Filesystem Structure
document where this is all spelled out in enough detail to be useful
for someone thinking that they might want to use them....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl
  2016-09-10  0:00             ` Dave Chinner
@ 2016-09-11 18:56               ` Darrick J. Wong
  -1 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2016-09-11 18:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, linux-man, linux-api, xfs, linux-xfs,
	mtk.manpages, linux-fsdevel, linux-btrfs

On Sat, Sep 10, 2016 at 10:00:29AM +1000, Dave Chinner wrote:
> On Thu, Sep 08, 2016 at 11:07:16PM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > > > I recall for FIEMAP that some filesystems may not have files aligned
> > > > > to sector offsets, and we just used byte offsets.  Storage like
> > > > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > > > tie this to old disk sector sizes.  Alternately, the units could be
> > > > > in terms of fs blocks as returned by statvfs.f_bsize, but mixing
> > > > > units for fmv_block, fmv_offset, fmv_length is unneeded complexity.
> > > > 
> > > > Ugh.  I'd rather just change the units to bytes rather than force all
> > > > the users to multiply things. :)
> > > 
> > > Yup, units need to be either in disk addresses (i.e. 512 byte units)
> > > or bytes. If people can't handle disk addresses (seems to be the
> > > case), then bytes it should be.
> > 
> > <nod>
> > 
> > > > I'd much rather just add more special owner codes for any other
> > > > filesystem that has distinguishable metadata types that are not
> > > > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > > > values, so it's not like we're going to run out.
> > > 
> > > This is diagnostic information as much as anything, just like
> > > fiemap is diagnostic information. So if we have specific type
> > > information, it needs to be reported accurately to be useful.
> > > 
> > > Hence I really don't care if the users and developers of other fs
> > > types don't understand what the special owner codes that a specific
> > > filesystem returns mean. i.e. it's not useful user information -
> > > only a tool that groks the specific filesystem is going to be able
> > > to do anything useful with special owner codes. So, IMO, there's little
> > > point trying to make them generic or even trying to define and
> > > explain them in the man page....
> > 
> > <shrug> I'm ok with describing generally what each special owner code
> > means.  Maybe the manpage could be more explicit about "None of these
> > codes are useful unless you're a low level filesystem tool"?
> 
> You can add that, but it doesn't address the underlying problem.
> i.e.  that we can add/change the codes, their name, meaning, etc,
> and now there's a third party man page that is incorrect and out of
> date. It's the same problem with documenting filesystem specific
> mount options in mount(8). Better, IMO, is to simply say "refer to
> filesystem specific documentation for a description of these special
> values". e.g. refer them to the XFS Filesystem Structure
> document where this is all spelled out in enough detail to be useful
> for someone thinking that they might want to use them....

We could simply put a manpage in the xfsprogs source documenting the XFS
owner codes and let other implementers make their own manpage with a
discussion of the owner codes (and whatever other quirks they have).
Sort of fragments things, but that's probably unavoidable. :)
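
One way to picture that split is a lookup table per filesystem, so that a
generic tool only names the codes of filesystems it actually knows about.
The sketch below is hypothetical: the numeric codes, the struct names, and
special_owner_name() are invented for illustration, and only the XFS
descriptions are borrowed from the draft manpage.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One special owner code and its human-readable meaning. */
struct owner_name {
	uint64_t code;
	const char *name;
};

/* Owner-code table that would live with one filesystem's own docs. */
struct fs_owner_table {
	const char *fstype;
	const struct owner_name *names;
	size_t count;
};

/* Made-up numeric codes; the descriptions follow the draft manpage. */
static const struct owner_name xfs_owner_names[] = {
	{ 1, "free space or reverse mapping btrees" },
	{ 2, "inode and free inode btrees" },
	{ 3, "reference count btree" },
};

static const struct fs_owner_table owner_tables[] = {
	{ "xfs", xfs_owner_names,
	  sizeof(xfs_owner_names) / sizeof(xfs_owner_names[0]) },
};

/* Return the description of a special owner code, or NULL when the
 * filesystem type or the code is unknown to this tool. */
static const char *special_owner_name(const char *fstype, uint64_t code)
{
	for (size_t t = 0; t < sizeof(owner_tables) / sizeof(owner_tables[0]); t++) {
		if (strcmp(owner_tables[t].fstype, fstype) != 0)
			continue;
		for (size_t i = 0; i < owner_tables[t].count; i++) {
			if (owner_tables[t].names[i].code == code)
				return owner_tables[t].names[i].name;
		}
	}
	return NULL;
}
```

A NULL return is where a tool would fall back to printing the raw code and
pointing the user at the filesystem-specific documentation.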

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-09-11 18:56 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25 23:26 [PATCH v8 0/3] man-pages: fix reflink/dedupe ioctl manpages Darrick J. Wong
2016-08-25 23:26 ` Darrick J. Wong
2016-08-25 23:26 ` [PATCH 1/3] man2/fallocate.2: document behavior with shared blocks Darrick J. Wong
2016-08-25 23:26   ` Darrick J. Wong
2016-08-25 23:26 ` [PATCH 2/3] man2/ioctl_fideduperange.2: clarify operation some more Darrick J. Wong
2016-08-25 23:26   ` Darrick J. Wong
2016-08-25 23:26 ` [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl Darrick J. Wong
2016-08-25 23:26   ` Darrick J. Wong
2016-08-29 21:34   ` Andreas Dilger
2016-08-30 19:09     ` Darrick J. Wong
2016-08-30 19:09       ` Darrick J. Wong
2016-08-30 19:09       ` Darrick J. Wong
2016-09-08 23:38       ` Dave Chinner
2016-09-08 23:38         ` Dave Chinner
2016-09-08 23:38         ` Dave Chinner
2016-09-09  6:07         ` Darrick J. Wong
2016-09-09  6:07           ` Darrick J. Wong
2016-09-09  6:07           ` Darrick J. Wong
2016-09-10  0:00           ` Dave Chinner
2016-09-10  0:00             ` Dave Chinner
2016-09-11 18:56             ` Darrick J. Wong
2016-09-11 18:56               ` Darrick J. Wong
2016-09-04  5:36   ` [PATCH v2 3/3] ioctl_getfsmap.2: document the GETFSMAP ioctl Darrick J. Wong
2016-09-04  5:36     ` Darrick J. Wong
2016-09-09  6:17   ` [PATCH v3 " Darrick J. Wong
2016-09-09  6:17     ` Darrick J. Wong
