From: Jason Gunthorpe <jgg@ziepe.ca> To: Peter Xu <peterx@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com>, Linus Torvalds <torvalds@linux-foundation.org>, Leon Romanovsky <leonro@nvidia.com>, Linux-MM <linux-mm@kvack.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, "Maya B . Gokhale" <gokhale2@llnl.gov>, Yang Shi <yang.shi@linux.alibaba.com>, Marty Mcfadden <mcfadden8@llnl.gov>, Kirill Shutemov <kirill@shutemov.name>, Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>, Jan Kara <jack@suse.cz>, Kirill Tkhai <ktkhai@virtuozzo.com>, Andrea Arcangeli <aarcange@redhat.com>, Christoph Hellwig <hch@lst.de>, Andrew Morton <akpm@linux-foundation.org> Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification Date: Thu, 17 Sep 2020 08:25:38 -0300 Message-ID: <20200917112538.GD8409@ziepe.ca> (raw) In-Reply-To: <20200916184619.GB40154@xz-x1> On Wed, Sep 16, 2020 at 02:46:19PM -0400, Peter Xu wrote: > My understanding is this may only work for the case when the fork()ed child > quitted before we reach here (so we still have mapcount==1 for the > page). Yes > What if not? Then mapcount will be greater than 1, and cow will > still trigger. Is that what we want? That doesn't work today anyhow, so it is fine continuing to be broken. > Another problem is that, aiui, one of the major change previous patch proposed > is to avoid using lock_page() so that we never block in this path. I saw you mention this before, but it looks like the change was to lift some of the atomc_reads out of the lock and avoid the lock if they indicate failure, checking also for page_maybe_dma_pinned() outside the lock just means the rare case of FOLL_PIN we will take the lock again. > Maybe even more complicated, because "correctness" should be even harder > than "best effort reuse" since it can cause data corruption if we didn't do it > right... The only correct way is for the application to avoid write protect on FOLL_PIN pages. The purpose here is to allow applications that hadn't hit "bad luck" and failed to keep working. Another thought is to insert a warning print here as well that the program is working improperly? At least it would give a transition period to evaluate the extent of the problem. We are thinking it is going to be a notable regression. I botched the last version of the patch, here is something a bit better. Does it seem like it could be OK? I know very little about this part of the kernel Thanks, Jason diff --git a/mm/memory.c b/mm/memory.c index 469af373ae76e1..332de777854f8b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2889,6 +2889,24 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf) return ret; } +static bool cow_needed(struct vm_fault *vmf) +{ + int total_map_swapcount; + + if (!reuse_swap_page(vmf->page, &total_map_swapcount)) + return true; + + if (total_map_swapcount == 1) { + /* + * The page is all ours. Move it to our anon_vma so the rmap + * code will not search our parent or siblings. Protected + * against the rmap code by the page lock. + */ + page_move_anon_rmap(vmf->page, vmf->vma); + } + return false; +} + /* * This routine handles present pages, when users try to write * to a shared page. It is done by copying the page to a new address @@ -2942,13 +2960,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) struct page *page = vmf->page; /* PageKsm() doesn't necessarily raise the page refcount */ - if (PageKsm(page) || page_count(page) != 1) + if (PageKsm(page)) goto copy; + if (page_count(page) != 1) { + /* + * If the page is DMA pinned we can't rely on the + * above to know if there are other CPU references as + * page_count() will be elevated by the + * pin. Needlessly copying the page will cause the DMA + * pin to break, try harder to avoid that. + */ + if (!page_maybe_dma_pinned(page)) + goto copy; + } + if (!trylock_page(page)) goto copy; if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) { - unlock_page(page); - goto copy; + if (cow_needed(vmf)) { + unlock_page(page); + goto copy; + } } /* * Ok, we've got the only map reference, and the only
next prev parent reply index Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-08-21 23:49 [PATCH 0/4] mm: Simplfy cow handling Peter Xu 2020-08-21 23:49 ` [PATCH 1/4] mm: Trial do_wp_page() simplification Peter Xu 2020-08-24 8:36 ` Kirill Tkhai 2020-08-24 14:30 ` Jan Kara 2020-08-24 15:37 ` Kirill Tkhai 2020-08-24 18:22 ` Linus Torvalds 2020-09-01 7:01 ` Hugh Dickins 2020-09-14 14:38 ` Jason Gunthorpe 2020-09-14 17:32 ` Linus Torvalds 2020-09-14 18:34 ` Peter Xu 2020-09-14 21:15 ` Peter Xu 2020-09-14 22:55 ` Jason Gunthorpe 2020-09-14 22:59 ` Linus Torvalds 2020-09-14 23:28 ` Jason Gunthorpe 2020-09-15 0:19 ` Linus Torvalds 2020-09-15 14:50 ` Peter Xu 2020-09-15 15:17 ` Peter Xu 2020-09-15 16:05 ` Jason Gunthorpe 2020-09-15 18:29 ` Jason Gunthorpe 2020-09-15 19:13 ` Peter Xu 2020-09-15 19:38 ` Jason Gunthorpe 2020-09-15 21:33 ` Peter Xu 2020-09-15 23:22 ` Jason Gunthorpe 2020-09-16 1:50 ` John Hubbard 2020-09-16 17:48 ` Jason Gunthorpe 2020-09-16 18:46 ` Peter Xu 2020-09-17 11:25 ` Jason Gunthorpe [this message] 2020-09-17 18:11 ` Linus Torvalds 2020-09-17 19:38 ` Jason Gunthorpe 2020-09-17 19:51 ` Linus Torvalds 2020-09-18 16:40 ` Peter Xu 2020-09-18 17:16 ` Linus Torvalds 2020-09-18 19:57 ` Peter Xu 2020-09-18 17:32 ` Jason Gunthorpe 2020-09-18 20:40 ` Peter Xu 2020-09-18 20:59 ` Linus Torvalds 2020-09-19 0:28 ` Jason Gunthorpe 2020-09-18 21:06 ` John Hubbard 2020-09-19 0:01 ` Jason Gunthorpe 2020-09-21 8:35 ` Jan Kara 2020-09-21 12:03 ` Jason Gunthorpe 2020-09-21 13:42 ` Michal Hocko 2020-09-21 14:18 ` Peter Xu 2020-09-21 14:28 ` Michal Hocko 2020-09-21 14:38 ` Tejun Heo 2020-09-21 14:43 ` Christian Brauner 2020-09-21 14:55 ` Michal Hocko 2020-09-21 15:04 ` Christian Brauner 2020-09-21 16:06 ` Michal Hocko 2020-09-23 7:53 ` Michal Hocko 2020-09-21 14:41 ` Christian Brauner 2020-09-21 14:57 ` Michal Hocko 2020-09-21 16:31 ` Peter Xu 2020-09-17 18:14 ` Peter Xu 2020-09-17 18:26 ` Linus Torvalds 2020-09-17 19:03 ` Peter Xu 2020-09-17 19:42 ` Linus Torvalds 2020-09-17 19:55 ` John Hubbard 2020-09-17 20:06 ` Jason Gunthorpe 2020-09-17 20:19 ` John Hubbard 2020-09-17 20:25 ` Jason Gunthorpe 2020-09-17 20:35 ` Linus Torvalds 2020-09-17 21:40 ` Peter Xu 2020-09-17 22:09 ` Jason Gunthorpe 2020-09-17 22:25 ` Linus Torvalds 2020-09-17 22:48 ` Ira Weiny 2020-09-18 9:36 ` Jan Kara 2020-09-18 9:44 ` Jan Kara 2020-09-18 16:19 ` Jason Gunthorpe 2020-09-15 10:23 ` Leon Romanovsky 2020-09-15 15:56 ` Jason Gunthorpe 2020-09-15 15:03 ` Oleg Nesterov 2020-09-15 16:18 ` Peter Xu 2020-08-21 23:49 ` [PATCH 2/4] mm/ksm: Remove reuse_ksm_page() Peter Xu 2020-08-21 23:49 ` [PATCH 3/4] mm/gup: Remove enfornced COW mechanism Peter Xu 2020-09-14 14:27 ` Oleg Nesterov 2020-09-14 17:59 ` Peter Xu 2020-09-14 19:03 ` Linus Torvalds 2020-08-21 23:49 ` [PATCH 4/4] mm: Add PGREUSE counter Peter Xu 2020-08-22 16:14 ` Linus Torvalds 2020-08-24 0:24 ` Peter Xu 2020-08-22 16:05 ` [PATCH 0/4] mm: Simplfy cow handling Linus Torvalds 2020-08-23 23:58 ` Peter Xu 2020-08-24 8:38 ` Kirill Tkhai 2020-08-27 14:15 ` Peter Xu 2021-02-02 14:40 [PATCH 1/4] mm: Trial do_wp_page() simplification Gal Pressman 2021-02-02 16:31 ` Peter Xu 2021-02-02 16:44 ` Jason Gunthorpe 2021-02-02 17:05 ` Peter Xu 2021-02-02 17:13 ` Jason Gunthorpe 2021-02-03 12:43 ` Gal Pressman 2021-02-03 14:00 ` Jason Gunthorpe 2021-02-03 14:47 ` Gal Pressman
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200917112538.GD8409@ziepe.ca \ --to=jgg@ziepe.ca \ --cc=aarcange@redhat.com \ --cc=akpm@linux-foundation.org \ --cc=gokhale2@llnl.gov \ --cc=hch@lst.de \ --cc=jack@suse.cz \ --cc=jannh@google.com \ --cc=jhubbard@nvidia.com \ --cc=kirill@shutemov.name \ --cc=ktkhai@virtuozzo.com \ --cc=leonro@nvidia.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mcfadden8@llnl.gov \ --cc=oleg@redhat.com \ --cc=peterx@redhat.com \ --cc=torvalds@linux-foundation.org \ --cc=yang.shi@linux.alibaba.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
LKML Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \ linux-kernel@vger.kernel.org public-inbox-index lkml Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel AGPL code for this site: git clone https://public-inbox.org/public-inbox.git