All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>,
	Omar Sandoval <osandov@osandov.com>
Cc: mtk.manpages@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jann Horn <jannh@google.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	linux-api@vger.kernel.org, kernel-team@fb.com,
	linux-man <linux-man@vger.kernel.org>
Subject: Re: [PATCH man-pages v6] Document encoded I/O
Date: Tue, 1 Dec 2020 21:12:47 +0100	[thread overview]
Message-ID: <dcb0679d-3ac5-dd95-5473-3c66ae4132b6@gmail.com> (raw)
In-Reply-To: <05e1f13c-5776-961b-edc4-0d09d02b7829@gmail.com>

Hello Alex,

On 11/20/20 4:03 PM, Alejandro Colomar (man-pages) wrote:
> Hi Omar,
> 
> I found a wording of mine to be a bit confusing.
> Please see below.
> 
> Thanks,
> 
> Alex
> 
> On 11/20/20 3:06 PM, Alejandro Colomar (man-pages) wrote:
>> Hi Omar and Michael,
>>
>> please, see below.
>>
>> Thanks,
>>
>> Alex
>>
>> On 11/20/20 12:29 AM, Alejandro Colomar (mailing lists; readonly) wrote:
>>> Hi Omar,
>>>
>>> Please, see some fixes below:
>>>
>>> Michael, I've also some questions for you below
>>> (you can grep for mtk to find those).
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> On 11/18/20 8:18 PM, Omar Sandoval wrote:
>>>> From: Omar Sandoval <osandov@fb.com>
>>>>
>>>> This adds a new page, encoded_io(7), providing an overview of encoded
>>>> I/O and updates fcntl(2), open(2), and preadv2(2)/pwritev2(2) to
>>>> reference it.
>>>>
>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>> Cc: linux-man <linux-man@vger.kernel.org>
>>>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>>>> ---
>>>> This feature is not yet upstream.
>>>>
>>>>  man2/fcntl.2      |  10 +-
>>>>  man2/open.2       |  23 +++
>>>>  man2/readv.2      |  70 +++++++++
>>>>  man7/encoded_io.7 | 369 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 471 insertions(+), 1 deletion(-)
>>>>  create mode 100644 man7/encoded_io.7
>>>>
>>>> diff --git a/man2/fcntl.2 b/man2/fcntl.2
>>>> index 546016617..b0d7fa2c3 100644
>>>> --- a/man2/fcntl.2
>>>> +++ b/man2/fcntl.2
>>>> @@ -221,8 +221,9 @@ On Linux, this command can change only the

[...]

>>>> +.PP
>>>> +This may be extended in the future, so
>>>> +.I iov[0].iov_len
>>>> +must be set to
>>>> +.I "sizeof(struct\ encoded_iov)"
>>>> +for forward/backward compatibility.
>>>> +The remaining buffers contain the encoded data.
>>>> +.PP
>>>> +.I compression
>>>> +and
>>>> +.I encryption
>>>> +are the encoding fields.
>>>> +.I compression
>>>> +is
>>>> +.B ENCODED_IOV_COMPRESSION_NONE
>>>> +(zero)
>>>> +or a filesystem-specific
>>>> +.B ENCODED_IOV_COMPRESSION
>>>
>>> Maybe s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_*/
>>
>> Or s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_/
>>
>> I'm not sure about existing practice.
>>
>> Michael (mtk), what would you do here?

I think I've tended towards the former
(ENCODED_IOV_COMPRESSION_*) in the past.

>>
>>>
>>>> +constant;
>>>> +see
>>>> +.BR Filesystem\ support .
>>>
>>> Please, write it as [.BR "Filesystem support" .]
>>>
>>> and maybe I would change it, to be more specific, to the following:
>>>
>>> [
>>> see
>>> .B Filesystem support
>>> below.
>>> ]
>>>
>>> So that the reader clearly understands it's on the same page.
>>>
>>>> +.I encryption
>>>> +is currently always
>>>> +.B ENCODED_IOV_ENCRYPTION_NONE
>>>> +(zero).
>>>> +.PP
>>>> +.I unencoded_len
>>>> +is the length of the unencoded (i.e., decrypted and decompressed) data.
>>>> +.I unencoded_offset
>>>> +is the offset into the unencoded data where the data in the file begins
>>>
>>> The above wording is a bit unclear to me.
>>>
>>> I suggest the following:
>>>
>>> [
>>> .I unencoded_offset
>>> is the offset from the begining of the file
>>> to the first byte of the unencoded data
>>> ]
> 
> Now I've read it again, and my wording was even worse than yours.
> I think yours can be understood after a few reads.
> 
> However, I'll still try to reword mine to see if I add some value:
> 
> [
> .I unencoded_offset
> is the offset from the first byte of the unencoded data
> to the first byte of logical data.
> ]
> 
> If you prefer yours, or a mix, that's fine.
> 
>>>
>>>> +(less than or equal to
>>>> +.IR unencoded_len ).
>>>> +.I len
>>>> +is the length of the data in the file
>>>> +(less than or equal to
>>>> +.I unencoded_len
>>>> +-
>>>
>>> Here's a question for Michael (mtk):
>>>
>>> I've seen (many) cases where these math operations
>>> are written without spaces,
>>> and in the same line (e.g., [.IR a + b]).
>>>
>>> I'd like to know your preferences on this,
>>> or what is actually more extended in the manual pages,
>>> to stick with only one of them.

I suspect there's a lot of inconsistency across pages. For simple
cases like this, I think writing it without spaces is fine, and
perhaps even preferable.

>>>
>>>> +.IR unencoded_offset ).
>>>> +See
>>>> +.B Extent layout
>>>> +below for some examples.
>>>> +.I
>>>
>>> Were you maybe going to add something there?
>>>
>>> If not, please remove that [.I].
>>>
>>>> +.PP
>>>> +If the unencoded data is actually longer than
>>>> +.IR unencoded_len ,
>>>> +then it is truncated;
>>>> +if it is shorter, then it is extended with zeroes.
>>>> +.PP
>>>> +
>>>
>>> Please, remove that blank line.
>>>
>>>> +.BR pwritev2 ()
>>>
>>> Should be [.BR pwritev2 (2)]
>>>
>>> Michael (mtk),
>>>
>>> Am I right in that?  Please, confirm.

Yes. References to functions documented in other pages should
include the section number in parentheses.

[...]

>>>> +.PP
>>>> +However, suppose we read 50 bytes into a file
>>>> +which contains a single compressed extent.
>>>> +The filesystem must still return the entire compressed extent
>>>> +for us to be able to decompress it,
>>>> +so
>>>> +.I unencoded_len
>>>> +would be the length of the entire decompressed extent.
>>>> +However, because the read was at offset 50,
>>>> +the first 50 bytes should be ignored.
>>>> +Therefore,
>>>> +.I unencoded_offset
>>>> +would be 50,
>>>> +and
>>>> +.I len
>>>> +would accordingly be
>>>> +.IR unencoded_len\ -\ 50 .
>>>
>>> This formats everything as I, except for the last dot.
>>> Replace by:
>>>
>>> [
>>> .I unencoded
>>> - 50.
>>> ]
>>>
>>> Michael (mtk), same as above:
>>> to space, or not to space?  That is the question :p

In this case, perhaps

.IR unencoded \-1

[...]


>>>> +.SS Security
>>>> +Encoded I/O creates the potential for some security issues:
>>>> +.IP * 3
>>>> +Encoded writes allow writing arbitrary data which the kernel will decode on
>>>> +a subsequent read. Decompression algorithms are complex and may have bugs
>>>> +which can be exploited by maliciously crafted data.
>>>> +.IP *
>>>> +Encoded reads may return data which is not logically present in the file
>>>> +(see the discussion of
>>>> +.I len
>>>> +vs.
>>>
>>> Please, s/vs./vs/
>>> See the reasons below:
>>>
>>> Michael (mtk),
>>>
>>> Here the renderer outputs a double space
>>> (as for separating two sentences).
>>>
>>> Are you okay with that?

Yes, that should probably be avoided. I'm not sure what the
correct way is to prevent that in groff though. I mean, one
could write

.RI "vs.\ " unencoded_len

but I think that simply creates a nonbreaking space,
which is not exactly what is desired.

[....]

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

  parent reply	other threads:[~2020-12-01 20:13 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-18 19:18 [PATCH v6 00/11] fs: interface for directly reading/writing compressed data Omar Sandoval
2020-11-18 19:18 ` [PATCH man-pages v6] Document encoded I/O Omar Sandoval
2020-11-19 23:29   ` Alejandro Colomar (mailing lists; readonly)
2020-11-20 14:06     ` Alejandro Colomar (man-pages)
2020-11-20 15:03       ` Alejandro Colomar (man-pages)
2020-11-30 19:35         ` Omar Sandoval
2020-12-01 14:36         ` Ping: " Alejandro Colomar (man-pages)
2020-12-01 20:12         ` Michael Kerrisk (man-pages) [this message]
2020-12-01 20:20           ` Michael Kerrisk (man-pages)
2020-12-01 21:35             ` Alejandro Colomar (man-pages)
2020-12-01 21:56               ` Michael Kerrisk (man-pages)
2020-12-18 10:32                 ` Ping: " Alejandro Colomar (man-pages)
2021-01-12  1:12                   ` Omar Sandoval
2020-12-01 20:21           ` G. Branden Robinson
2020-12-01 21:34             ` Alejandro Colomar (man-pages)
2020-12-01 21:58             ` Michael Kerrisk (man-pages)
2020-11-18 19:18 ` [PATCH v6 01/11] iov_iter: add copy_struct_from_iter() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 02/11] fs: add O_ALLOW_ENCODED open flag Omar Sandoval
2020-11-19  7:02   ` Amir Goldstein
2020-11-20 23:41     ` Jann Horn
2020-11-30 19:26       ` Omar Sandoval
2020-12-01  8:15         ` Amir Goldstein
2020-12-01 20:31           ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 03/11] fs: add RWF_ENCODED for reading/writing compressed data Omar Sandoval
2020-11-19  7:38   ` Amir Goldstein
2021-01-11 23:06     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 04/11] btrfs: fix btrfs_write_check() Omar Sandoval
2020-11-23 17:08   ` David Sterba
2020-11-30 19:18     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 05/11] btrfs: fix check_data_csum() error message for direct I/O Omar Sandoval
2020-11-23 17:09   ` David Sterba
2020-11-30 19:20     ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 06/11] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 07/11] btrfs: add ram_bytes and offset to btrfs_ordered_extent Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 08/11] btrfs: support different disk extent size for delalloc Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 09/11] btrfs: optionally extend i_size in cow_file_range_inline() Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 10/11] btrfs: implement RWF_ENCODED reads Omar Sandoval
2020-12-03 14:32   ` Josef Bacik
2021-01-11 20:21     ` Omar Sandoval
2021-01-11 20:35       ` Josef Bacik
2021-01-11 20:58         ` Omar Sandoval
2020-11-18 19:18 ` [PATCH v6 11/11] btrfs: implement RWF_ENCODED writes Omar Sandoval
2020-12-02 22:03   ` Josef Bacik
2020-12-03 14:37   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dcb0679d-3ac5-dd95-5473-3c66ae4132b6@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=alx.manpages@gmail.com \
    --cc=amir73il@gmail.com \
    --cc=cyphar@cyphar.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.