All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Peter Xu <peterx@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Maya B . Gokhale" <gokhale2@llnl.gov>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Marty Mcfadden <mcfadden8@llnl.gov>,
	Kirill Shutemov <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Jan Kara <jack@suse.cz>, Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Date: Fri, 18 Sep 2020 14:32:40 -0300	[thread overview]
Message-ID: <20200918173240.GY8409@ziepe.ca> (raw)
In-Reply-To: <20200918164032.GA5962@xz-x1>

On Fri, Sep 18, 2020 at 12:40:32PM -0400, Peter Xu wrote:

> Firstly in the draft patch mm->has_pinned is introduced and it's written to 1
> as long as FOLL_GUP is called once.  It's never reset after set.

Worth thinking about also adding FOLL_LONGTERM here, at last as long
as it is not a counter. That further limits the impact.

> One more thing (I think) we need to do is to pass the new vma from
> copy_page_range() down into the end because if we want to start cow during
> fork() then we need to operate on that new vma too when new page linked to it
> rather than the parent's.

Makes sense to me

> One issue is when we charge for cgroup we probably can't do that onto the new
> mm/task, since copy_namespaces() is called after copy_mm().  I don't know
> enough about cgroup, I thought the child will inherit the parent's, but I'm not
> sure.  Or, can we change that order of copy_namespaces() && copy_mm()?  I don't
> see a problem so far but I'd like to ask first..

Know nothing about cgroups, but I would have guessed that the page
table allocations would want to be in the cgroup too, is the struct
page a different bucket?

> The other thing is on how to fail.  E.g., when COW failed due to either
> charging of cgroup or ENOMEM, ideally we should fail fork() too.  Though that
> might need more changes - current patch silently kept the shared page for
> simplicity.

I didn't notice anything tricky here.. Something a bit gross but
simple seemed workable:

@@ -852,7 +852,7 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			continue;
 		}
 		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
-							vma, addr, rss);
+							vma, addr, rss, &err);
 		if (entry.val)
 			break;
 		progress += 8;
@@ -865,6 +865,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
 	cond_resched();
 
+	if (err)
+		return err;
+
 	if (entry.val) {
 		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
 			return -ENOMEM;

It is not really any different from add_swap_count_continuation()
failure, which already works..

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 496c3ff97cce..b3812fa6383f 100644
> +++ b/include/linux/mm_types.h
> @@ -458,6 +458,7 @@ struct mm_struct {
>  
>  		unsigned long total_vm;	   /* Total pages mapped */
>  		unsigned long locked_vm;   /* Pages that have PG_mlocked set */
> +		unsigned long has_pinned;  /* Whether this mm has pinned any page */

Can be unsigned int or smaller, if there is a better hole in mm_struct

> diff --git a/mm/gup.c b/mm/gup.c
> index e5739a1974d5..cab10cefefe4 100644
> +++ b/mm/gup.c
> @@ -1255,6 +1255,17 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
>  		BUG_ON(*locked != 1);
>  	}
>  
> +	/*
> +	 * Mark the mm struct if there's any page pinning attempt.  We're
> +	 * aggresive on this bit since even if the pinned pages were unpinned
> +	 * later on, we'll still keep this bit set for this address space just
> +	 * to make everything easy.
> +	 *
> +	 * TODO: Ideally we can use mm->pinned_vm but only until it's stable.
> +	 */
> +	if (flags & FOLL_PIN)
> +		WRITE_ONCE(mm->has_pinned, 1);

This should probably be its own commit, here is a stab at a commit
message:

Reduce the chance of false positive from page_maybe_dma_pinned() by
keeping track if the mm_struct has ever been used with
pin_user_pages(). mm_structs that have never been passed to
pin_user_pages() cannot have a positive page_maybe_dma_pinned() by
definition. This allows cases that might drive up the page ref_count
to avoid any penalty from handling dma_pinned pages.

Due to complexities with unpining this trivial version is a permanent
sticky bit, future work will be needed to make this a counter.

> +/*
> + * Do early cow for the page and the pte. Return true if page duplicate
> + * succeeded, false otherwise.
> + */
> +static bool duplicate_page(struct mm_struct *mm, struct vm_area_struct *vma,

Suggest calling 'vma' 'new' here for consistency

> +			   unsigned long address, struct page *page,
> +			   pte_t *newpte)
> +{
> +       struct page *new_page;
> +       pte_t entry;
> +
> +       new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
> +       if (!new_page)
> +               return false;
> +
> +       copy_user_highpage(new_page, page, address, vma);
> +       if (mem_cgroup_charge(new_page, mm, GFP_KERNEL)) {
> +	       put_page(new_page);
> +	       return false;
> +       }
> +       cgroup_throttle_swaprate(new_page, GFP_KERNEL);
> +       __SetPageUptodate(new_page);

It looks like these GFP flags can't be GFP_KERNEL, this is called
inside the pte_alloc_map_lock() which is a spinlock

One thought is to lift this logic out to around
add_swap_count_continuation()? Would need some serious rework to be
able to store the dst_pte though.

Can't help about the cgroup question

Jason

  parent reply	other threads:[~2020-09-18 17:32 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 23:49 [PATCH 0/4] mm: Simplfy cow handling Peter Xu
2020-08-21 23:49 ` [PATCH 1/4] mm: Trial do_wp_page() simplification Peter Xu
2020-08-24  8:36   ` Kirill Tkhai
2020-08-24 14:30     ` Jan Kara
2020-08-24 15:37       ` Kirill Tkhai
2020-08-24 18:22         ` Linus Torvalds
2020-08-24 18:22           ` Linus Torvalds
2020-09-01  7:01           ` Hugh Dickins
2020-09-01  7:01             ` Hugh Dickins
2020-09-14 14:38   ` Jason Gunthorpe
2020-09-14 17:32     ` Linus Torvalds
2020-09-14 17:32       ` Linus Torvalds
2020-09-14 18:34       ` Peter Xu
2020-09-14 21:15         ` Peter Xu
2020-09-14 22:55           ` Jason Gunthorpe
2020-09-14 22:59             ` Linus Torvalds
2020-09-14 22:59               ` Linus Torvalds
2020-09-14 23:28               ` Jason Gunthorpe
2020-09-15  0:19                 ` Linus Torvalds
2020-09-15  0:19                   ` Linus Torvalds
2020-09-15 14:50                 ` Peter Xu
2020-09-15 15:17                   ` Peter Xu
2020-09-15 16:05                   ` Jason Gunthorpe
2020-09-15 18:29                     ` Jason Gunthorpe
2020-09-15 19:13                       ` Peter Xu
2020-09-15 19:38                         ` Jason Gunthorpe
2020-09-15 21:33                           ` Peter Xu
2020-09-15 23:22                             ` Jason Gunthorpe
2020-09-16  1:50                               ` John Hubbard
2020-09-16 17:48                                 ` Jason Gunthorpe
2020-09-16 18:46                                   ` Peter Xu
2020-09-17 11:25                                     ` Jason Gunthorpe
2020-09-17 18:11                                       ` Linus Torvalds
2020-09-17 18:11                                         ` Linus Torvalds
2020-09-17 19:38                                         ` Jason Gunthorpe
2020-09-17 19:51                                           ` Linus Torvalds
2020-09-17 19:51                                             ` Linus Torvalds
2020-09-18 16:40                                             ` Peter Xu
2020-09-18 17:16                                               ` Linus Torvalds
2020-09-18 17:16                                                 ` Linus Torvalds
2020-09-18 19:57                                                 ` Peter Xu
2020-09-18 17:32                                               ` Jason Gunthorpe [this message]
2020-09-18 20:40                                                 ` Peter Xu
2020-09-18 20:59                                                   ` Linus Torvalds
2020-09-18 20:59                                                     ` Linus Torvalds
2020-09-19  0:28                                                     ` Jason Gunthorpe
2020-09-18 21:06                                                   ` John Hubbard
2020-09-19  0:01                                                     ` Jason Gunthorpe
2020-09-21  8:35                                                       ` Jan Kara
2020-09-21 12:03                                                         ` Jason Gunthorpe
2022-02-16 16:59                                                           ` Oded Gabbay
2022-02-16 17:24                                                             ` Oded Gabbay
2022-02-16 19:04                                                             ` Linus Torvalds
2022-02-16 19:20                                                               ` Oded Gabbay
2022-02-16 19:24                                                               ` David Hildenbrand
2020-09-21 13:42                                               ` Michal Hocko
2020-09-21 14:18                                                 ` Peter Xu
2020-09-21 14:28                                                   ` Michal Hocko
2020-09-21 14:38                                                     ` Tejun Heo
2020-09-21 14:43                                                       ` Christian Brauner
2020-09-21 14:55                                                         ` Michal Hocko
2020-09-21 15:04                                                           ` Christian Brauner
2020-09-21 16:06                                                             ` Michal Hocko
2020-09-23  7:53                                                               ` Michal Hocko
2020-09-21 14:41                                                 ` Christian Brauner
2020-09-21 14:57                                                   ` Michal Hocko
2020-09-21 16:31                                                     ` Peter Xu
2020-09-17 18:14                                       ` Peter Xu
2020-09-17 18:26                                         ` Linus Torvalds
2020-09-17 18:26                                           ` Linus Torvalds
2020-09-17 19:03                                           ` Peter Xu
2020-09-17 19:42                                             ` Linus Torvalds
2020-09-17 19:42                                               ` Linus Torvalds
2020-09-17 19:55                                               ` John Hubbard
2020-09-17 20:06                                               ` Jason Gunthorpe
2020-09-17 20:19                                                 ` John Hubbard
2020-09-17 20:25                                                   ` Jason Gunthorpe
2020-09-17 20:35                                                 ` Linus Torvalds
2020-09-17 20:35                                                   ` Linus Torvalds
2020-09-17 21:40                                                   ` Peter Xu
2020-09-17 22:09                                                     ` Jason Gunthorpe
2020-09-17 22:25                                                       ` Linus Torvalds
2020-09-17 22:25                                                         ` Linus Torvalds
2020-09-17 22:48                                                       ` Ira Weiny
2020-09-18  9:36                                                         ` Jan Kara
2020-09-18  9:44                                                       ` Jan Kara
2020-09-18 16:19                                             ` Jason Gunthorpe
2020-09-15 10:23           ` Leon Romanovsky
2020-09-15 15:56           ` Jason Gunthorpe
2020-09-15 15:03   ` Oleg Nesterov
2020-09-15 16:18     ` Peter Xu
2020-08-21 23:49 ` [PATCH 2/4] mm/ksm: Remove reuse_ksm_page() Peter Xu
2020-08-21 23:49 ` [PATCH 3/4] mm/gup: Remove enfornced COW mechanism Peter Xu
2020-09-14 14:27   ` Oleg Nesterov
2020-09-14 17:59     ` Peter Xu
2020-09-14 19:03       ` Linus Torvalds
2020-09-14 19:03         ` Linus Torvalds
2020-08-21 23:49 ` [PATCH 4/4] mm: Add PGREUSE counter Peter Xu
2020-08-22 16:14   ` Linus Torvalds
2020-08-22 16:14     ` Linus Torvalds
2020-08-24  0:24     ` Peter Xu
2020-08-22 16:05 ` [PATCH 0/4] mm: Simplfy cow handling Linus Torvalds
2020-08-22 16:05   ` Linus Torvalds
2020-08-23 23:58   ` Peter Xu
2020-08-24  8:38 ` Kirill Tkhai
2020-08-27 14:15 ` Peter Xu
2021-02-02 14:40 [PATCH 1/4] mm: Trial do_wp_page() simplification Gal Pressman
2021-02-02 16:31 ` Peter Xu
2021-02-02 16:44   ` Jason Gunthorpe
2021-02-02 17:05     ` Peter Xu
2021-02-02 17:13       ` Jason Gunthorpe
2021-02-03 12:43         ` Gal Pressman
2021-02-03 14:00           ` Jason Gunthorpe
2021-02-03 14:47             ` Gal Pressman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200918173240.GY8409@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=gokhale2@llnl.gov \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcfadden8@llnl.gov \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.