From: Jason Gunthorpe <jgg@ziepe.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>, John Hubbard <jhubbard@nvidia.com>,
    Leon Romanovsky <leonro@nvidia.com>, Linux-MM <linux-mm@kvack.org>,
    Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
    "Maya B . Gokhale" <gokhale2@llnl.gov>,
    Yang Shi <yang.shi@linux.alibaba.com>,
    Marty Mcfadden <mcfadden8@llnl.gov>,
    Kirill Shutemov <kirill@shutemov.name>,
    Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
    Jan Kara <jack@suse.cz>, Kirill Tkhai <ktkhai@virtuozzo.com>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Christoph Hellwig <hch@lst.de>,
    Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Date: Thu, 17 Sep 2020 17:06:38 -0300
Message-ID: <20200917200638.GM8409@ziepe.ca>
In-Reply-To: <CAHk-=wgw3GNyF_6euymOFxM62Y3B=C=f2iUJn8Py-u5YELJ5JA@mail.gmail.com>

On Thu, Sep 17, 2020 at 12:42:11PM -0700, Linus Torvalds wrote:

> Because the whole "do page pinning without MADV_DONTFORK and then fork
> the area" is I feel a very very invalid load. It sure as hell isn't
> something we should care about performance for, and in fact it is
> something we should very well warn for exactly to let people know
> "this process is doing bad things".

It is easy for things like io_uring that can just allocate the queue
memory they care about and MADV_DONTFORK it.

Other things work more like O_DIRECT - the data they work on is
arbitrary app memory, not controlled in any way.

In RDMA we have this ugly scheme where we automatically call
MADV_DONTFORK on the virtual address and hope it doesn't explode.

It is very hard to call MADV_DONTFORK if you don't control the
allocation. You don't want to break huge pages, and you have to hope
really really hard that a fork doesn't need that memory. You also have
to hope you don't run out of vmas, because it causes a vma split. So
ugly. So much overhead.

Considering almost anything can do a fork() - we've seen app authors
become confused. They say stuff is busted, support folks ask if they
use fork, author says no.. Investigation later shows some hidden
library did system() or whatever.

In this case the tests that found this failed because they were
written in Python and buried in there was some subprocess.call().

I would prefer the kernel consider it a valid workload with the
semantics the sketch patch shows..

> Is there possibly something else we can filter on than just
> GUP_PIN_COUNTING_BIAS? Because it could be as simple as just marking
> the vma itself and saying "this vma has had a page pinning event done
> on it".

We'd have to give up pin_user_pages_fast() to do that, as we can't
fast walk and get vmas?

Hmm, there are many users. I remember that the hfi1 folks really
wanted the fast version for some reason..

> Because if we only start copying the page *iff* the vma is marked by
> that "this vma had page pinning" _and_ the page count is bigger than
> GUP_PIN_COUNTING_BIAS, then I think we can rest pretty easily knowing
> that we aren't going to hit some regular old-fashioned UNIX server
> cases with a lot of forks..

Agree.

Given that this is a user visible regression, and it is nearly rc6,
what do you prefer for next steps?

Sorting this out for fork, especially if it has the vma change, is
probably more than a week's time.

Revert this patch and try again next cycle?

Thanks,
Jason
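
The "allocate the queue memory and MADV_DONTFORK it" case mentioned above can be sketched from userspace roughly as follows. This is only an illustration under stated assumptions, not code from io_uring, RDMA, or the patch series: the helper names alloc_region() and child_has_mapping() are hypothetical, and the sketch assumes Linux with glibc. It shows the easy case where the caller owns the allocation, so madvise() covers exactly one mapping and no existing vma has to be split.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a private anonymous region that the
 * caller fully controls. If exclude_from_fork is non-zero, mark it
 * MADV_DONTFORK so that a child created by fork() does not inherit it.
 * Returns NULL on failure.
 */
static void *alloc_region(size_t len, int exclude_from_fork)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (exclude_from_fork && madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/*
 * Hypothetical helper: fork and ask the child whether the first page of
 * the region is still mapped. mincore() fails with ENOMEM when the
 * range contains unmapped memory, so the child can probe the mapping
 * without dereferencing it.
 * Returns 1 if mapped in the child, 0 if not, -1 on any other error.
 */
static int child_has_mapping(void *p, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t probe = len < page ? len : page;
    unsigned char vec;
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (mincore(p, probe, &vec) == 0)
            _exit(1);                       /* still mapped */
        _exit(errno == ENOMEM ? 0 : 2);     /* unmapped, or unexpected */
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) <= 1 ? WEXITSTATUS(status) : -1;
}
```

With a page-sized region, child_has_mapping() is expected to report the region as present in the child when it was allocated normally and absent when it was allocated with exclude_from_fork set. The hard case described in the mail is the opposite one: the buffer is arbitrary application memory, so the same madvise() call may split a vma or break up a huge page that the caller never allocated.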