Re: [PATCH man-pages v6] Document encoded I/O - Alejandro Colomar (man-pages)

From: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>
To: Omar Sandoval <osandov@osandov.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jann Horn <jannh@google.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	linux-api@vger.kernel.org, kernel-team@fb.com,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH man-pages v6] Document encoded I/O
Date: Fri, 20 Nov 2020 16:03:44 +0100	[thread overview]
Message-ID: <05e1f13c-5776-961b-edc4-0d09d02b7829@gmail.com> (raw)
In-Reply-To: <fb4a4270-eb7a-06d5-e703-9ee470b61f8b@gmail.com>

Hi Omar,

I found a wording of mine to be a bit confusing.
Please see below.

Thanks,

Alex

On 11/20/20 3:06 PM, Alejandro Colomar (man-pages) wrote:
> Hi Omar and Michael,
> 
> please, see below.
> 
> Thanks,
> 
> Alex
> 
> On 11/20/20 12:29 AM, Alejandro Colomar (mailing lists; readonly) wrote:
>> Hi Omar,
>>
>> Please, see some fixes below:
>>
>> Michael, I've also some questions for you below
>> (you can grep for mtk to find those).
>>
>> Thanks,
>>
>> Alex
>>
>> On 11/18/20 8:18 PM, Omar Sandoval wrote:
>>> From: Omar Sandoval <osandov@fb.com>
>>>
>>> This adds a new page, encoded_io(7), providing an overview of encoded
>>> I/O and updates fcntl(2), open(2), and preadv2(2)/pwritev2(2) to
>>> reference it.
>>>
>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>> Cc: linux-man <linux-man@vger.kernel.org>
>>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>>> ---
>>> This feature is not yet upstream.
>>>
>>>  man2/fcntl.2      |  10 +-
>>>  man2/open.2       |  23 +++
>>>  man2/readv.2      |  70 +++++++++
>>>  man7/encoded_io.7 | 369 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 471 insertions(+), 1 deletion(-)
>>>  create mode 100644 man7/encoded_io.7
>>>
>>> diff --git a/man2/fcntl.2 b/man2/fcntl.2
>>> index 546016617..b0d7fa2c3 100644
>>> --- a/man2/fcntl.2
>>> +++ b/man2/fcntl.2
>>> @@ -221,8 +221,9 @@ On Linux, this command can change only the
>>>  .BR O_ASYNC ,
>>>  .BR O_DIRECT ,
>>>  .BR O_NOATIME ,
>>> +.BR O_NONBLOCK ,
>>>  and
>>> -.B O_NONBLOCK
>>> +.B O_ALLOW_ENCODED
>>>  flags.
>>>  It is not possible to change the
>>>  .BR O_DSYNC
>>> @@ -1820,6 +1821,13 @@ Attempted to clear the
>>>  flag on a file that has the append-only attribute set.
>>>  .TP
>>>  .B EPERM
>>> +Attempted to set the
>>> +.B O_ALLOW_ENCODED
>>> +flag and the calling process did not have the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +.TP
>>> +.B EPERM
>>>  .I cmd
>>>  was
>>>  .BR F_ADD_SEALS ,
>>> diff --git a/man2/open.2 b/man2/open.2
>>> index f587b0d95..84697dfa8 100644
>>> --- a/man2/open.2
>>> +++ b/man2/open.2
>>> @@ -437,6 +437,16 @@ was followed by a call to
>>>  .BR fdatasync (2)).
>>>  .IR "See NOTES below" .
>>>  .TP
>>> +.B O_ALLOW_ENCODED
>>
>> The list is alphabetically sorted;
>> please, follow that
>> (O_ALLOW_ENCODED should be the first one).
>>
>>> +Open the file with encoded I/O permissions;
>>> +see
>>> +.BR encoded_io (7).
>>> +.B O_CLOEXEC
>>> +must be specified in conjuction with this flag.
>>> +The caller must have the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +.TP
>>>  .B O_EXCL
>>>  Ensure that this call creates the file:
>>>  if this flag is specified in conjunction with
>>> @@ -1082,6 +1092,14 @@ is invalid
>>>  (e.g., it contains characters not permitted by the underlying filesystem).
>>>  .TP
>>>  .B EINVAL
>>> +.B O_ALLOW_ENCODED
>>> +was specified in
>>> +.IR flags ,
>>> +but
>>> +.B O_CLOEXEC
>>> +was not specified.
>>> +.TP
>>> +.B EINVAL
>>>  The final component ("basename") of
>>>  .I pathname
>>>  is invalid
>>> @@ -1238,6 +1256,11 @@ did not match the owner of the file and the caller was not privileged.
>>>  The operation was prevented by a file seal; see
>>>  .BR fcntl (2).
>>>  .TP
>>> +.B EPERM
>>> +The
>>> +.B O_ALLOW_ENCODED
>>> +flag was specified, but the caller was not privileged.
>>> +.TP
>>>  .B EROFS
>>>  .I pathname
>>>  refers to a file on a read-only filesystem and write access was
>>> diff --git a/man2/readv.2 b/man2/readv.2
>>> index 5a8b74168..c9933acf0 100644
>>> --- a/man2/readv.2
>>> +++ b/man2/readv.2
>>> @@ -264,6 +264,11 @@ the data is always appended to the end of the file.
>>>  However, if the
>>>  .I offset
>>>  argument is \-1, the current file offset is updated.
>>> +.TP
>>> +.BR RWF_ENCODED " (since Linux 5.12)"
>>> +Read or write encoded (e.g., compressed) data.
>>> +See
>>> +.BR encoded_io (7).
>>>  .SH RETURN VALUE
>>>  On success,
>>>  .BR readv (),
>>> @@ -283,6 +288,13 @@ than requested (see
>>>  and
>>>  .BR write (2)).
>>>  .PP
>>> +If
>>> +.B
>>> +RWF_ENCODED
>>
>> RWF_ENCODED should go in the same line as .B:
>>
>> [
>> .B RWF_ENCODED
>> ]
>>
>>> +was specified in
>>> +.IR flags ,
>>> +then the return value is the number of encoded bytes.
>>> +.PP
>>>  On error, \-1 is returned, and \fIerrno\fP is set appropriately.
>>>  .SH ERRORS
>>>  The errors are as given for
>>> @@ -313,6 +325,64 @@ is less than zero or greater than the permitted maximum.
>>>  .TP
>>>  .B EOPNOTSUPP
>>>  An unknown flag is specified in \fIflags\fP.
>>> +.TP
>>> +.B EOPNOTSUPP
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the filesystem does not implement encoded I/O.
>>> +.TP
>>> +.B EPERM
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the file was not opened with the
>>> +.B O_ALLOW_ENCODED
>>> +flag.
>>> +.PP
>>> +.BR preadv2 ()
>>> +can fail for the following reasons:
>>
>> The wording is a bit unclear:
>>
>> Above your additions (old text, not yours),
>> it says that some errors apply to preadv2
>> (as well as to other functions):
>>
>> [
>> ERRORS
>>        The errors are as given for read(2) and write(2).  Furthermore,
>>        preadv(),  preadv2(),  pwritev(),  and pwritev2() can also fail
>>        for the same reasons as lseek(2).  Additionally, the  following
>>        errors are defined:
>>
>>        EINVAL The  sum  of  the  iov_len  values  overflows an ssize_t
>>               value.
>>
>>        EINVAL The vector count, iovcnt, is less than zero  or  greater
>>               than the permitted maximum.
>>
>>        EOPNOTSUPP
>>               An unknown flag is specified in flags.
>>
>>        EOPNOTSUPP
>>               RWF_ENCODED  is  specified  in  flags and the filesystem
>>               does not implement encoded I/O.
>>
>>        EPERM  RWF_ENCODED is specified in flags and the file  was  not
>>               opened with the O_ALLOW_ENCODED flag.
>> ]
>>
>> And then you added a line that says:
>>
>> [
>>        preadv2() can fail for the following reasons:
>> ]
>>
>> Which if read strictly, it says that [only] the following errors apply.
>>
>> Did you mean that
>> "preadv3() can _additionally_ fail for the following reasons"?
>>
>> Could you please be a bit more specific?
>>
>> The same applies for pwritev2() below.
>>
>>> +.TP
>>> +.B E2BIG
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and
>>> +.I iov[0]
>>> +is not large enough to return the encoding metadata.
>>> +.TP
>>> +.B ENOBUFS
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the buffers in
>>> +.I iov
>>> +are not big enough to return the encoded data.
>>> +.PP
>>> +.BR pwritev2 ()
>>> +can fail for the following reasons:
>>> +.TP
>>> +.B E2BIG
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and
>>> +.I iov[0]
>>> +contains non-zero fields
>>> +after the kernel's
>>> +.IR "sizeof(struct\ encoded_iov)" .
>>
>> Don't escape the space, if the string is already in "".
>>
>>> +.TP
>>> +.B EINVAL
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the encoding is unknown or not supported by the filesystem.
>>> +.TP
>>> +.B EINVAL
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the alignment and/or size requirements are not met.
>>>  .SH VERSIONS
>>>  .BR preadv ()
>>>  and
>>> diff --git a/man7/encoded_io.7 b/man7/encoded_io.7
>>> new file mode 100644
>>> index 000000000..106fa587b
>>> --- /dev/null
>>> +++ b/man7/encoded_io.7
>>> @@ -0,0 +1,369 @@
>>> +.\" Copyright (c) 2020 by Omar Sandoval <osandov@fb.com>
>>> +.\"
>>> +.\" %%%LICENSE_START(VERBATIM)
>>> +.\" Permission is granted to make and distribute verbatim copies of this
>>> +.\" manual provided the copyright notice and this permission notice are
>>> +.\" preserved on all copies.
>>> +.\"
>>> +.\" Permission is granted to copy and distribute modified versions of this
>>> +.\" manual under the conditions for verbatim copying, provided that the
>>> +.\" entire resulting derived work is distributed under the terms of a
>>> +.\" permission notice identical to this one.
>>> +.\"
>>> +.\" Since the Linux kernel and libraries are constantly changing, this
>>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>>> +.\" responsibility for errors or omissions, or for damages resulting from
>>> +.\" the use of the information contained herein.  The author(s) may not
>>> +.\" have taken the same level of care in the production of this manual,
>>> +.\" which is licensed free of charge, as they might when working
>>> +.\" professionally.
>>> +.\"
>>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>>> +.\" the source, must acknowledge the copyright and authors of this work.
>>> +.\" %%%LICENSE_END
>>> +.\"
>>> +.\"
>>> +.TH ENCODED_IO  7 2020-11-11 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +encoded_io \- overview of encoded I/O
>>> +.SH DESCRIPTION
>>> +Several filesystems (e.g., Btrfs) support transparent encoding
>>> +(e.g., compression, encryption) of data on disk:
>>> +written data is encoded by the kernel before it is written to disk,
>>> +and read data is decoded before being returned to the user.
>>> +In some cases, it is useful to skip this encoding step.
>>
>> Here I would use ';' instead of '.'
>> (and next letter would be lowercase, then).
>>
>>> +For example, the user may want to read the compressed contents of a file
>>> +or write pre-compressed data directly to a file.
>>> +This is referred to as "encoded I/O".
>>> +.SS Encoded I/O API
>>> +Encoded I/O is specified with the
>>> +.B RWF_ENCODED
>>> +flag to
>>> +.BR preadv2 (2)
>>> +and
>>> +.BR pwritev2 (2).
>>> +If
>>> +.B RWF_ENCODED
>>> +is specified, then
>>> +.I iov[0].iov_base
>>> +points to an
>>> +.I
>>> +encoded_iov
>>
>> On the same line, please.
>>
>>> +structure, defined in
>>> +.I <linux/fs.h>
>>> +as:
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +struct encoded_iov {
>>> +    __aligned_u64 len;
>>> +    __aligned_u64 unencoded_len;
>>> +    __aligned_u64 unencoded_offset;
>>> +    __u32 compression;
>>> +    __u32 encryption;
>>> +};
>>> +.EE
>>> +.in
>>> +.PP
>>> +This may be extended in the future, so
>>> +.I iov[0].iov_len
>>> +must be set to
>>> +.I "sizeof(struct\ encoded_iov)"
>>> +for forward/backward compatibility.
>>> +The remaining buffers contain the encoded data.
>>> +.PP
>>> +.I compression
>>> +and
>>> +.I encryption
>>> +are the encoding fields.
>>> +.I compression
>>> +is
>>> +.B ENCODED_IOV_COMPRESSION_NONE
>>> +(zero)
>>> +or a filesystem-specific
>>> +.B ENCODED_IOV_COMPRESSION
>>
>> Maybe s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_*/
> 
> Or s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_/
> 
> I'm not sure about existing practice.
> 
> Michael (mtk), what would you do here?
> 
>>
>>> +constant;
>>> +see
>>> +.BR Filesystem\ support .
>>
>> Please, write it as [.BR "Filesystem support" .]
>>
>> and maybe I would change it, to be more specific, to the following:
>>
>> [
>> see
>> .B Filesystem support
>> below.
>> ]
>>
>> So that the reader clearly understands it's on the same page.
>>
>>> +.I encryption
>>> +is currently always
>>> +.B ENCODED_IOV_ENCRYPTION_NONE
>>> +(zero).
>>> +.PP
>>> +.I unencoded_len
>>> +is the length of the unencoded (i.e., decrypted and decompressed) data.
>>> +.I unencoded_offset
>>> +is the offset into the unencoded data where the data in the file begins
>>
>> The above wording is a bit unclear to me.
>>
>> I suggest the following:
>>
>> [
>> .I unencoded_offset
>> is the offset from the begining of the file
>> to the first byte of the unencoded data
>> ]

Now I've read it again, and my wording was even worse than yours.
I think yours can be understood after a few reads.

However, I'll still try to reword mine to see if I add some value:

[
.I unencoded_offset
is the offset from the first byte of the unencoded data
to the first byte of logical data.
]

If you prefer yours, or a mix, that's fine.

>>
>>> +(less than or equal to
>>> +.IR unencoded_len ).
>>> +.I len
>>> +is the length of the data in the file
>>> +(less than or equal to
>>> +.I unencoded_len
>>> +-
>>
>> Here's a question for Michael (mtk):
>>
>> I've seen (many) cases where these math operations
>> are written without spaces,
>> and in the same line (e.g., [.IR a + b]).
>>
>> I'd like to know your preferences on this,
>> or what is actually more extended in the manual pages,
>> to stick with only one of them.
>>
>>> +.IR unencoded_offset ).
>>> +See
>>> +.B Extent layout
>>> +below for some examples.
>>> +.I
>>
>> Were you maybe going to add something there?
>>
>> If not, please remove that [.I].
>>
>>> +.PP
>>> +If the unencoded data is actually longer than
>>> +.IR unencoded_len ,
>>> +then it is truncated;
>>> +if it is shorter, then it is extended with zeroes.
>>> +.PP
>>> +
>>
>> Please, remove that blank line.
>>
>>> +.BR pwritev2 ()
>>
>> Should be [.BR pwritev2 (2)]
>>
>> Michael (mtk),
>>
>> Am I right in that?  Please, confirm.
>>
>>> +uses the metadata specified in
>>> +.IR iov[0] ,
>>> +writes the encoded data from the remaining buffers,
>>> +and returns the number of encoded bytes written
>>> +(that is, the sum of
>>> +.I iov[n].iov_len
>>> +for 1 <=
>>> +.I n
>>> +<
>>> +.IR iovcnt ;
>>> +partial writes will not occur).
>>> +At least one encoding field must be non-zero.
>>> +Note that the encoded data is not validated when it is written;
>>> +if it is not valid (e.g., it cannot be decompressed),
>>> +then a subsequent read may return an error.
>>> +If the
>>> +.I offset
>>> +argument to
>>> +.BR pwritev2 ()
>>
>> Same as above: specify (2).
>>
>>> +is -1, then the file offset is incremented by
>>> +.IR len .
>>> +If
>>> +.I iov[0].iov_len
>>> +is less than
>>> +.I "sizeof(struct\ encoded_iov)"
>>
>> [.I] allows spaces, so it should be:
>>
>> [
>> .I sizeof(struct encoded_iov)
>> ]
>>
>>> +in the kernel,
>>> +then any fields unknown to userspace are treated as if they were zero;
>>
>> s/userspace/user space/
>>
>> See man-pages(7)::STYLE GUIDE::Preferred terms
>>
>>> +if it is greater and any fields unknown to the kernel are non-zero,
>>> +then this returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR E2BIG .
>>> +.PP
>>> +.BR preadv2 ()
>>
>> Same as above: specify (2).
>>
>>> +populates the metadata in
>>> +.IR iov[0] ,
>>> +the encoded data in the remaining buffers,
>>> +and returns the number of encoded bytes read.
>>> +This will only return one extent per call.
>>> +This can also read data which is not encoded;
>>> +all encoding fields will be zero in that case.
>>> +If the
>>> +.I offset
>>> +argument to
>>> +.BR preadv2 ()
>>
>> Smae as above: specify (2).
>>
>>> +is -1, then the file offset is incremented by
>>> +.IR len .
>>> +If
>>> +.I iov[0].iov_len
>>> +is less than
>>> +.I "sizeof(struct\ encoded_iov)"
>>
>> Don't need '"' nor '\', as above.
>>
>>> +in the kernel and any fields unknown to userspace are non-zero,
>>
>> s/userspace/user space/
>>
>>> +then
>>> +.BR preadv2 ()
>>
>> (2)
>>
>>> +returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR E2BIG ;
>>> +if it is greater,
>>> +then any fields unknown to the kernel are returned as zero.
>>> +If the provided buffers are not large enough to return an entire encoded
>>> +extent,
>>
>> Please use semantic newlines.
>> I haven't checked that in the text above,
>> so if you happen to find that there's any other line
>> that should also be fixed in that sense, please do so.
>>
>> To understand 'semantic newlines',
>> please have a look at
>> man-pages(7)::STYLE GUIDE::Use semantic newlines
>>
>> Basically, split lines at the most natural separation point,
>> instead of just when the line gets over the margin.
>>
>>> +then
>>> +.BR preadv2 ()
>>
>> (2)
>>
>>> +returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR ENOBUFS .
>>> +.PP
>>> +As the filesystem page cache typically contains decoded data,
>>> +encoded I/O bypasses the page cache.
>>> +.SS Extent layout
>>> +By using
>>> +.IR len ,
>>> +.IR unencoded_len ,
>>> +and
>>> +.IR unencoded_offset ,
>>> +it is possible to refer to a subset of an unencoded extent.
>>> +.PP
>>> +In the simplest case,
>>> +.I len
>>> +is equal to
>>> +.I unencoded_len
>>> +and
>>> +.I unencoded_offset
>>> +is zero.
>>> +This means that the entire unencoded extent is used.
>>> +.PP
>>> +However, suppose we read 50 bytes into a file
>>> +which contains a single compressed extent.
>>> +The filesystem must still return the entire compressed extent
>>> +for us to be able to decompress it,
>>> +so
>>> +.I unencoded_len
>>> +would be the length of the entire decompressed extent.
>>> +However, because the read was at offset 50,
>>> +the first 50 bytes should be ignored.
>>> +Therefore,
>>> +.I unencoded_offset
>>> +would be 50,
>>> +and
>>> +.I len
>>> +would accordingly be
>>> +.IR unencoded_len\ -\ 50 .
>>
>> This formats everything as I, except for the last dot.
>> Replace by:
>>
>> [
>> .I unencoded
>> - 50.
>> ]
>>
>> Michael (mtk), same as above:
>> to space, or not to space?  That is the question :p
>>
>> Personally, I find spaces more clear.
>>
>>> +.PP
>>> +Additionally, suppose we want to create an encrypted file with length 500,
>>> +but the file is encrypted with a block cipher using a block size of 4096.
>>> +The unencoded data would therefore include the appropriate padding,
>>> +and
>>> +.I unencoded_len
>>> +would be 4096.
>>> +However, to represent the logical size of the file,
>>> +.I len
>>> +would be 500
>>> +(and
>>> +.I unencoded_offset
>>> +would be 0).
>>> +.PP
>>> +Similar situations can arise in other cases:
>>> +.IP * 3
>>> +If the filesystem pads data to the filesystem block size before compressing,
>>> +then compressed files with a size unaligned to the filesystem block size will
>>> +end with an extent with
>>> +.I len
>>> +<
>>> +.IR unencoded_len .
>>> +.IP *
>>> +Extents cloned from the middle of a larger encoded extent with
>>> +.B FICLONERANGE
>>> +may have a non-zero
>>> +.I unencoded_offset
>>> +and/or
>>> +.I len
>>> +<
>>> +.IR unencoded_len .
>>> +.IP *
>>> +If the middle of an encoded extent is overwritten,
>>> +the filesystem may create extents with a non-zero
>>> +.I unencoded_offset
>>> +and/or
>>> +.I len
>>> +<
>>> +.I unencoded_len
>>> +for the parts that were not overwritten.
>>> +.SS Security
>>> +Encoded I/O creates the potential for some security issues:
>>> +.IP * 3
>>> +Encoded writes allow writing arbitrary data which the kernel will decode on
>>> +a subsequent read. Decompression algorithms are complex and may have bugs
>>> +which can be exploited by maliciously crafted data.
>>> +.IP *
>>> +Encoded reads may return data which is not logically present in the file
>>> +(see the discussion of
>>> +.I len
>>> +vs.
>>
>> Please, s/vs./vs/
>> See the reasons below:
>>
>> Michael (mtk),
>>
>> Here the renderer outputs a double space
>> (as for separating two sentences).
>>
>> Are you okay with that?
>>
>> I haven't found any other "\<vs\>\.".
>> However, I've found a few "\<vs\>[^\.]".
>>
>>> +.I unencoded_len
>>> +above).
>>> +It may not be intended for this data to be readable.
>>> +.PP
>>> +Therefore, encoded I/O requires privilege.
>>> +Namely, the
>>> +.B RWF_ENCODED
>>> +flag may only be used when the file was opened with the
>>> +.B O_ALLOW_ENCODED
>>> +flag to
>>> +.BR open (2),
>>> +which requires the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +The
>>> +.B O_CLOEXEC
>>> +flag must be specified in conjunction with
>>> +.BR O_ALLOW_ENCODED .
>>> +This avoids accidentally leaking the encoded I/O privilege
>>> +(it is not cleared on
>>> +.BR fork (2)
>>> +or
>>> +.BR execve (2)
>>> +otherwise).
>>> +If
>>> +.B O_ALLOW_ENCODED
>>> +without
>>> +.B O_CLOEXEC
>>> +is desired,
>>> +.B O_CLOEXEC
>>> +can be cleared afterwards with
>>> +.BR fnctl (2).
>>> +.BR fcntl (2)
>>> +can also clear or set
>>> +.B O_ALLOW_ENCODED
>>> +(including without
>>> +.BR O_CLOEXEC ).
>>> +.SS Filesystem support
>>> +Encoded I/O is supported on the following filesystems:
>>> +.TP
>>> +Btrfs (since Linux 5.12)
>>> +.IP
>>> +Btrfs supports encoded reads and writes of compressed data.
>>> +The data is encoded as follows:
>>> +.RS
>>> +.IP * 3
>>> +If
>>> +.I compression
>>> +is
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_ZLIB ,
>>> +then the encoded data is a single zlib stream.
>>> +.IP *
>>> +If
>>> +.I compression
>>> +is
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_ZSTD ,
>>> +then the encoded data is a single zstd frame compressed with the
>>> +.I windowLog
>>> +compression parameter set to no more than 17.
>>> +.IP *
>>> +If
>>> +.I compression
>>> +is one of
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_4K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_8K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_16K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_32K ,
>>> +or
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_64K ,
>>> +then the encoded data is compressed page by page
>>> +(using the page size indicated by the name of the constant)
>>> +with LZO1X
>>> +and wrapped in the format documented in the Linux kernel source file
>>> +.IR fs/btrfs/lzo.c .
>>> +.RE
>>> +.IP
>>> +Additionally, there are some restrictions on
>>> +.BR pwritev2 ():
>>
>> (2)
>>
>>> +.RS
>>> +.IP * 3
>>> +.I offset
>>> +(or the current file offset if
>>> +.I offset
>>> +is -1) must be aligned to the sector size of the filesystem.
>>> +.IP *
>>> +.I len
>>> +must be aligned to the sector size of the filesystem
>>> +unless the data ends at or beyond the current end of the file.
>>> +.IP *
>>> +.I unencoded_len
>>> +and the length of the encoded data must each be no more than 128 KiB.
>>> +This limit may increase in the future.
>>> +.IP *
>>> +The length of the encoded data must be less than or equal to
>>> +.IR unencoded_len .
>>> +.IP *
>>> +If using LZO, the filesystem's page size must match the compression page size.
>>> +.RE
>>>
>>
>> Please, add a SEE ALSO section, which should at least point to
>> preadv2(2) (or pwritev2(2), if you prefer):
>>
>> [
>> .SH SEE ALSO
>> .BR preadv2 (2)
>> ]
>>