linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>
To: Peter Collingbourne <pcc@google.com>
Cc: Jann Horn <jannh@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-mm@kvack.org, kernel test robot <lkp@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Linux API <linux-api@vger.kernel.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	linux-doc@vger.kernel.org, linux-man@vger.kernel.org,
	John Hubbard <jhubbard@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Evgenii Stepanov <eugenis@google.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [PATCH] refpage_create.2: Document refpage_create(2)
Date: Thu, 29 Jul 2021 14:09:54 +0200	[thread overview]
Message-ID: <d4a138c2-88cc-704b-5bb5-cafe580376aa@gmail.com> (raw)
In-Reply-To: <20210717025951.3946505-1-pcc@google.com>

Hi Peter,

On 7/17/21 4:59 AM, Peter Collingbourne wrote:
> ---
> The syscall has not landed in the kernel yet.
> Therefore, as usual, the patch should not be taken yet
> and I've used 5.x as the introducing kernel version for now.

Thanks!  Please see a few comments below.
Apart from formatting and code issues I noted,
the text looks good to me.

Please, ping us when this is merged in the kernel :)

Regards,

Alex

> 
>   man2/refpage_create.2 | 167 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 167 insertions(+)
>   create mode 100644 man2/refpage_create.2
> 
> diff --git a/man2/refpage_create.2 b/man2/refpage_create.2
> new file mode 100644
> index 000000000..c0b928b92
> --- /dev/null
> +++ b/man2/refpage_create.2
> @@ -0,0 +1,167 @@
> +.\" Copyright (C) 2021 Google LLC
> +.\" Author: Peter Collingbourne <pcc@google.com>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH REFPAGE_CREATE 2 2021-07-16 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +refpage_create \- create a reference page file descriptor
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <unistd.h>"
> +.PP
> +.BI "int syscall(SYS_refpage_create, void *" content ", unsigned int " size ,
> +.BI "            unsigned long " flags ");"
> +.fi
> +.PP
> +.IR Note :
> +glibc provides no wrapper for
> +.BR refpage_create (),
> +necessitating the use of
> +.BR syscall (2).
> +.SH DESCRIPTION
> +The
> +.BR refpage_create ()
> +system call is used to create a file descriptor
> +that conceptually refers to a read-only file
> +whose contents are an infinite repetition of
> +.I size
> +bytes of data read from the
> +.I content
> +argument to the system call,
> +and which may be mapped into memory with
> +.BR mmap (2).
> +The file descriptor is created as if by passing
> +.BR O_RDONLY | O_CLOEXEC
> +to
> +.BR open (2).
> +.PP
> +In reality, any read-only pages in the mapping are backed
> +by a so-called reference page,
> +whose contents are specified using the arguments to
> +.BR refpage_create ().
> +.PP
> +The reference page will consist of repetitions of
> +.I size
> +bytes read
> +from
> +.IR content ,
> +as many as are required to fill the page. The
> +.I size
> +argument must be a power of two less than or equal to the page size, and the
> +.I content
> +argument must have at least
> +.I size
> +alignment. The behavior is as if a copy of this data

s/\. /.\n/

Rationale: semantic newlines.

> +is made while servicing the system call;
> +any updates to the data after the system call has returned
> +will not be reflected in the reference page.
> +.PP
> +If the architecture specifies that // metadata may be associated /J/

Please, use semantic newlines (see man-pages(7))

> +with memory addresses, // that metadata if present is copied
> +into the reference page along with the data itself,
> +but only if the size argument is at least as large
> +as the granularity of the metadata.
> +For example, with the ARMv8.5 Memory Tagging Extension,
> +the memory tags are copied, // but only if the size is greater than /J/
> +or equal to // the architecturally specified tag granule size of 16 bytes.
> +.PP
> +Writable private mappings trigger specific copy-on-write behavior
> +when a page in the mapping is written to.
> +The behavior is as if the reference page is copied,
> +but the kernel may use a more efficient technique such as
> +.BR memset (3)
> +to produce the copy if the
> +.I size
> +argument originally used to create the reference page file descriptor
> +is sufficiently small.
> +For this reason it is recommended to specify as small of a
> +.I size
> +argument as possible
> +in order to activate any such optimizations implemented in the kernel.
> +.PP
> +The advantage of using this system call
> +over creating normal anonymous mappings
> +and manually initializing the pages from userspace
> +is that it is more efficient.
> +If it is not known that all of the pages in the mapping
> +will be faulted (for example, if the system call is used
> +by a general purpose memory allocator
> +where the behavior of the client program is unknown),
> +letting the pages be prepared on fault only if needed
> +is more efficient from both a performance
> +and memory consumption perspective.
> +Even if all of the pages would end up being faulted,
> +it would still be more efficient
> +to have the kernel initialize the pages with the required contents once
> +than to have the kernel zero initialize them on fault
> +and then have userspace initialize them again with different contents.
> +.SH EXAMPLES
> +The following program creates a 128KB memory mapping

The SI mandates that a space shall be inserted between a number and the 
associated unit.

Also, if it really means 128 KiB, which I guess, please use KiB.  See 
units(7).

Use a non-breaking space to make sure that the unit goes with the number.

With all that, it would be:

... creates a 128\ KiB memory ...

> +preinitialized with the pattern byte 0xAA
> +and verifies that the contents of the mapping are correct.
> +.PP
> +.EX
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +int main() {
> +    unsigned char pattern = 0xaa;

Please use capital AA to help visually differentiate x and a.

> +    unsigned long mmap_size = 131072;

Why that magic number?
Maybe a shift to indicate that it's a power of 2...  or 128 * 1024...
I don't know from the top of my head powers of 2 that high :)

Also, why 'unsigned long'?  The SYNOPSIS says it's an 'unsigned int'.

> +
> +    int fd = syscall(SYS_refpage_create, &pattern, 1, 0);

Please use sizeof(pattern) instead of 1 to communicate the relationship 
between them.

> +    if (fd < 0) {
> +        perror("refpage_create");
> +        return 1;

Please use EXIT_FAILURE (<stdlib.h>).  Also use exit(3) instead of 
return, as is common practice in manual pages.

> +    }
> +    unsigned char *p = mmap(0, mmap_size, PROT_READ | PROT_WRITE,

Use NULL instead of 0 for pointers.  The first argument of mmap(2) is 
'void *addr'.

> +                            MAP_PRIVATE, fd, 0);
> +    if (p == MAP_FAILED) {
> +        perror("mmap");
> +        return 1;
> +    }
> +    for (unsigned i = 0; i != mmap_size; ++i) {

s/unsigned/unsigned int/

> +        if (p[i] != pattern) {
> +            fprintf(stderr, "refpage failed contents check @ %u: "
> +                    "0x%x != 0x%x\n",

I prefer 0x%X, which is already in use in some manual pages (seccomp(2)).

Also, 'i' may be more readable in hex, given it's an offset of an 
address (actually the concept of a size_t, even if the kernel doesn't 
use that type) don't you think?

> +                    i, p[i], pattern);
> +            return 1;

exit(3)

> +        }
> +    }
> +}
> +.EE
> +.SH NOTE
> +Reading from a reference page file descriptor, e.g. with
> +.BR read (2),
> +is not supported, nor would this be particularly useful.
> +.SH VERSIONS
> +This system call first appeared in Linux 5.x. > +.SH CONFORMING TO
> +The
> +.BR refpage_create ()
> +system call is Linux-specific.
> +.SH SEE ALSO
> +.BR mmap (2),
> +.BR open (2).
> 


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


      reply	other threads:[~2021-07-29 12:10 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-17  2:59 Peter Collingbourne
2021-07-29 12:09 ` Alejandro Colomar (man-pages) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4a138c2-88cc-704b-5bb5-cafe580376aa@gmail.com \
    --to=alx.manpages@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=eugenis@google.com \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=mtk.manpages@gmail.com \
    --cc=pcc@google.com \
    --cc=willy@infradead.org \
    --subject='Re: [PATCH] refpage_create.2: Document refpage_create(2)' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).