linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>,
	Omar Sandoval <osandov@osandov.com>
Cc: mtk.manpages@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jann Horn <jannh@google.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	linux-api@vger.kernel.org, kernel-team@fb.com,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH man-pages v6] Document encoded I/O
Date: Tue, 1 Dec 2020 21:12:47 +0100	[thread overview]
Message-ID: <dcb0679d-3ac5-dd95-5473-3c66ae4132b6@gmail.com> (raw)
In-Reply-To: <05e1f13c-5776-961b-edc4-0d09d02b7829@gmail.com>

Hello Alex,

On 11/20/20 4:03 PM, Alejandro Colomar (man-pages) wrote:
> Hi Omar,
> 
> I found a wording of mine to be a bit confusing.
> Please see below.
> 
> Thanks,
> 
> Alex
> 
> On 11/20/20 3:06 PM, Alejandro Colomar (man-pages) wrote:
>> Hi Omar and Michael,
>>
>> please, see below.
>>
>> Thanks,
>>
>> Alex
>>
>> On 11/20/20 12:29 AM, Alejandro Colomar (mailing lists; readonly) wrote:
>>> Hi Omar,
>>>
>>> Please, see some fixes below:
>>>
>>> Michael, I've also some questions for you below
>>> (you can grep for mtk to find those).
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> On 11/18/20 8:18 PM, Omar Sandoval wrote:
>>>> From: Omar Sandoval <osandov@fb.com>
>>>>
>>>> This adds a new page, encoded_io(7), providing an overview of encoded
>>>> I/O and updates fcntl(2), open(2), and preadv2(2)/pwritev2(2) to
>>>> reference it.
>>>>
>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>> Cc: linux-man <linux-man@vger.kernel.org>
>>>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>>>> ---
>>>> This feature is not yet upstream.
>>>>
>>>>  man2/fcntl.2      |  10 +-
>>>>  man2/open.2       |  23 +++
>>>>  man2/readv.2      |  70 +++++++++
>>>>  man7/encoded_io.7 | 369 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 471 insertions(+), 1 deletion(-)
>>>>  create mode 100644 man7/encoded_io.7
>>>>
>>>> diff --git a/man2/fcntl.2 b/man2/fcntl.2
>>>> index 546016617..b0d7fa2c3 100644
>>>> --- a/man2/fcntl.2
>>>> +++ b/man2/fcntl.2
>>>> @@ -221,8 +221,9 @@ On Linux, this command can change only the

[...]

>>>> +.PP
>>>> +This may be extended in the future, so
>>>> +.I iov[0].iov_len
>>>> +must be set to
>>>> +.I "sizeof(struct\ encoded_iov)"
>>>> +for forward/backward compatibility.
>>>> +The remaining buffers contain the encoded data.
>>>> +.PP
>>>> +.I compression
>>>> +and
>>>> +.I encryption
>>>> +are the encoding fields.
>>>> +.I compression
>>>> +is
>>>> +.B ENCODED_IOV_COMPRESSION_NONE
>>>> +(zero)
>>>> +or a filesystem-specific
>>>> +.B ENCODED_IOV_COMPRESSION
>>>
>>> Maybe s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_*/
>>
>> Or s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_/
>>
>> I'm not sure about existing practice.
>>
>> Michael (mtk), what would you do here?

I think I've tended towards the former
(ENCODED_IOV_COMPRESSION_*) in the past.

>>
>>>
>>>> +constant;
>>>> +see
>>>> +.BR Filesystem\ support .
>>>
>>> Please, write it as [.BR "Filesystem support" .]
>>>
>>> and maybe I would change it, to be more specific, to the following:
>>>
>>> [
>>> see
>>> .B Filesystem support
>>> below.
>>> ]
>>>
>>> So that the reader clearly understands it's on the same page.
>>>
>>>> +.I encryption
>>>> +is currently always
>>>> +.B ENCODED_IOV_ENCRYPTION_NONE
>>>> +(zero).
>>>> +.PP
>>>> +.I unencoded_len
>>>> +is the length of the unencoded (i.e., decrypted and decompressed) data.
>>>> +.I unencoded_offset
>>>> +is the offset into the unencoded data where the data in the file begins
>>>
>>> The above wording is a bit unclear to me.
>>>
>>> I suggest the following:
>>>
>>> [
>>> .I unencoded_offset
>>> is the offset from the begining of the file
>>> to the first byte of the unencoded data
>>> ]
> 
> Now I've read it again, and my wording was even worse than yours.
> I think yours can be understood after a few reads.
> 
> However, I'll still try to reword mine to see if I add some value:
> 
> [
> .I unencoded_offset
> is the offset from the first byte of the unencoded data
> to the first byte of logical data.
> ]
> 
> If you prefer yours, or a mix, that's fine.
> 
>>>
>>>> +(less than or equal to
>>>> +.IR unencoded_len ).
>>>> +.I len
>>>> +is the length of the data in the file
>>>> +(less than or equal to
>>>> +.I unencoded_len
>>>> +-
>>>
>>> Here's a question for Michael (mtk):
>>>
>>> I've seen (many) cases where these math operations
>>> are written without spaces,
>>> and in the same line (e.g., [.IR a + b]).
>>>
>>> I'd like to know your preferences on this,
>>> or what is actually more extended in the manual pages,
>>> to stick with only one of them.

I suspect there's a lot of inconsistency across pages. For simple
cases like this, I think writing it without spaces is fine, and
perhaps even preferable.

>>>
>>>> +.IR unencoded_offset ).
>>>> +See
>>>> +.B Extent layout
>>>> +below for some examples.
>>>> +.I
>>>
>>> Were you maybe going to add something there?
>>>
>>> If not, please remove that [.I].
>>>
>>>> +.PP
>>>> +If the unencoded data is actually longer than
>>>> +.IR unencoded_len ,
>>>> +then it is truncated;
>>>> +if it is shorter, then it is extended with zeroes.
>>>> +.PP
>>>> +
>>>
>>> Please, remove that blank line.
>>>
>>>> +.BR pwritev2 ()
>>>
>>> Should be [.BR pwritev2 (2)]
>>>
>>> Michael (mtk),
>>>
>>> Am I right in that?  Please, confirm.

Yes. References to functions documented in other pages should
include the section number in parentheses.

[...]

>>>> +.PP
>>>> +However, suppose we read 50 bytes into a file
>>>> +which contains a single compressed extent.
>>>> +The filesystem must still return the entire compressed extent
>>>> +for us to be able to decompress it,
>>>> +so
>>>> +.I unencoded_len
>>>> +would be the length of the entire decompressed extent.
>>>> +However, because the read was at offset 50,
>>>> +the first 50 bytes should be ignored.
>>>> +Therefore,
>>>> +.I unencoded_offset
>>>> +would be 50,
>>>> +and
>>>> +.I len
>>>> +would accordingly be
>>>> +.IR unencoded_len\ -\ 50 .
>>>
>>> This formats everything as I, except for the last dot.
>>> Replace by:
>>>
>>> [
>>> .I unencoded
>>> - 50.
>>> ]
>>>
>>> Michael (mtk), same as above:
>>> to space, or not to space?  That is the question :p

In this case, perhaps

.IR unencoded \-1

[...]


>>>> +.SS Security
>>>> +Encoded I/O creates the potential for some security issues:
>>>> +.IP * 3
>>>> +Encoded writes allow writing arbitrary data which the kernel will decode on
>>>> +a subsequent read. Decompression algorithms are complex and may have bugs
>>>> +which can be exploited by maliciously crafted data.
>>>> +.IP *
>>>> +Encoded reads may return data which is not logically present in the file
>>>> +(see the discussion of
>>>> +.I len
>>>> +vs.
>>>
>>> Please, s/vs./vs/
>>> See the reasons below:
>>>
>>> Michael (mtk),
>>>
>>> Here the renderer outputs a double space
>>> (as for separating two sentences).
>>>
>>> Are you okay with that?

Yes, that should probably be avoided. I'm not sure what the
correct way is to prevent that in groff though. I mean, one
could write

.RI "vs.\ " unencoded_len

but I think that simply creates a nonbreaking space,
which is not exactly what is desired.

[....]

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

  parent reply	other threads:[~2020-12-01 20:13 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-18 19:18 [PATCH v6 00/11] fs: interface for directly reading/writing compressed data Omar Sandoval
2020-11-18 19:18 ` [PATCH man-pages v6] Document encoded I/O Omar Sandoval
2020-11-19 23:29   ` Alejandro Colomar (mailing lists; readonly)
2020-11-20 14:06     ` Alejandro Colomar (man-pages)
2020-11-20 15:03       ` Alejandro Colomar (man-pages)
2020-11-30 19:35         ` Omar Sandoval
2020-12-01 14:36         ` Ping: " Alejandro Colomar (man-pages)
2020-12-01 20:12         ` Michael Kerrisk (man-pages) [this message]
2020-12-01 20:20           ` Michael Kerrisk (man-pages)
2020-12-01 21:35             ` Alejandro Colomar (man-pages)
2020-12-01 21:56               ` Michael Kerrisk (man-pages)
2020-12-18 10:32                 ` Ping: " Alejandro Colomar (man-pages)
2021-01-12  1:12                   ` Omar Sandoval
2020-12-01 20:21           ` G. Branden Robinson
2020-12-01 21:34             ` Alejandro Colomar (man-pages)
2020-12-01 21:58             ` Michael Kerrisk (man-pages)
2020-11-18 19:18 ` [PATCH v6 01/11] iov_iter: add copy_struct_from_iter() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 02/11] fs: add O_ALLOW_ENCODED open flag Omar Sandoval
2020-11-19  7:02   ` Amir Goldstein
2020-11-20 23:41     ` Jann Horn
2020-11-30 19:26       ` Omar Sandoval
2020-12-01  8:15         ` Amir Goldstein
2020-12-01 20:31           ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 03/11] fs: add RWF_ENCODED for reading/writing compressed data Omar Sandoval
2020-11-19  7:38   ` Amir Goldstein
2021-01-11 23:06     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 04/11] btrfs: fix btrfs_write_check() Omar Sandoval
2020-11-23 17:08   ` David Sterba
2020-11-30 19:18     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 05/11] btrfs: fix check_data_csum() error message for direct I/O Omar Sandoval
2020-11-23 17:09   ` David Sterba
2020-11-30 19:20     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 06/11] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 07/11] btrfs: add ram_bytes and offset to btrfs_ordered_extent Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 08/11] btrfs: support different disk extent size for delalloc Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 09/11] btrfs: optionally extend i_size in cow_file_range_inline() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 10/11] btrfs: implement RWF_ENCODED reads Omar Sandoval
2020-12-03 14:32   ` Josef Bacik
2021-01-11 20:21     ` Omar Sandoval
2021-01-11 20:35       ` Josef Bacik
2021-01-11 20:58         ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 11/11] btrfs: implement RWF_ENCODED writes Omar Sandoval
2020-12-02 22:03   ` Josef Bacik
2020-12-03 14:37   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dcb0679d-3ac5-dd95-5473-3c66ae4132b6@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=alx.manpages@gmail.com \
    --cc=amir73il@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).