linux-kernel.vger.kernel.org archive mirror
From: Yury Norov <ynorov@caviumnetworks.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Dan Williams <dan.j.williams@intel.com>,
	Huang Ying <ying.huang@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Souptick Joarder <jrdr.linux@gmail.com>, Willy Tarreau <w@1wt.eu>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: fix COW faults after mlock()
Date: Tue, 25 Sep 2018 02:48:43 +0300
Message-ID: <20180924234843.GA23726@yury-thinkpad>
In-Reply-To: <20180924212246.vmmsmgd5qw6xkfwh@kshutemo-mobl1>

On Tue, Sep 25, 2018 at 12:22:47AM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 24, 2018 at 04:08:52PM +0300, Yury Norov wrote:
> > After mlock() on newly mmap()ed shared memory I observe page faults.
> >
> > The problem is that populate_vma_page_range() doesn't set the FOLL_WRITE
> > flag for writable shared memory in the mlock() path, reasoning as follows:
> > /*
> >  * We want to touch writable mappings with a write fault in order
> >  * to break COW, except for shared mappings because these don't COW
> >  * and we would not want to dirty them for nothing.
> >  */
> >
> > But they are actually COWed. The most straightforward way to avoid this
> > is to set the FOLL_WRITE flag for shared mappings as well as for private ones.
> 
> Huh? How do shared mappings get CoWed?
> 
> In this context CoW means to create a private copy of the page for the
> process. It only makes sense for private mappings as all pages in shared
> mappings do not belong to the process.
> 
> Shared mappings will still get faults, but a bit later -- after the page
> is written back to disk, the page gets cleaned and write-protected to catch
> the next write access.
> 
> A notable exception is tmpfs/shmem. These pages do not take part in the
> normal write-back process. But the code path is used for other filesystems
> as well.
> 
> Therefore, NAK. You only create unneeded write back traffic.

Hi Kirill,

My first reaction was exactly the same as yours, but on my real system
(Cavium OcteonTX2) and in my QEMU simulation I can reproduce the same
behavior: freshly mlock()ed memory still causes faults. The faults happen
because the page is mapped into the process read-only while the underlying
VMA is read-write, so each fault is resolved simply by granting write
access to the page.

Maybe I am using the term COW wrongly here, but this is how faultin_page()
works: it sets the FOLL_COW bit before returning (which is then ignored at
the upper level).

I realize that a proper fix may be more complex. If so, I will gladly
take it and drop this patch from my tree, but this is all I have so far
to address the problem.

The user-space code below is a reproducer.

Thanks,
Yury

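#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        int i, ret, rc = 1, len = getpagesize() * 1000;
        char tmpfile[] = "/tmp/my_tmp-XXXXXX";
        char *ptr = MAP_FAILED;
        int fd = mkstemp(tmpfile);

        if (fd < 0) {
                printf("Failed to mkstemp: %d\n", errno);
                return 1;
        }

        ret = ftruncate(fd, len);
        if (ret) {
                printf("Failed to ftruncate: %d\n", errno);
                goto out;
        }

        ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ptr == MAP_FAILED) {
                printf("Failed to mmap memory: %d\n", errno);
                goto out;
        }

        /* May fail with ENOMEM/EPERM if RLIMIT_MEMLOCK is too small. */
        ret = mlock(ptr, len);
        if (ret) {
                printf("Failed to mlock: %d\n", errno);
                goto out;
        }

        printf("Touch...\n");

        for (i = 0; i < len; i++)
                ptr[i] = (char) i; /* Faults here. */

        printf("\t... done\n");
        rc = 0;
out:
        if (ptr != MAP_FAILED)
                munmap(ptr, len);
        close(fd);
        unlink(tmpfile);
        return rc;
}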


Thread overview: 5+ messages
2018-09-24 13:08 [PATCH] mm: fix COW faults after mlock() Yury Norov
2018-09-24 21:22 ` Kirill A. Shutemov
2018-09-24 23:48   ` Yury Norov [this message]
2018-09-25 10:48     ` Kirill A. Shutemov
2018-10-11  5:37 ` [LKP] [mm] dd12385915: vm-scalability.median 18.6% improvement kernel test robot
