Date: Thu, 17 Sep 2020 17:06:38 -0300
From: Jason Gunthorpe
To: Linus Torvalds
Cc: Peter Xu, John Hubbard, Leon Romanovsky, Linux-MM,
    Linux Kernel Mailing List, "Maya B . Gokhale", Yang Shi,
    Marty Mcfadden, Kirill Shutemov, Oleg Nesterov, Jann Horn, Jan Kara,
    Kirill Tkhai, Andrea Arcangeli, Christoph Hellwig, Andrew Morton
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Message-ID: <20200917200638.GM8409@ziepe.ca>
References: <20200915213330.GE2949@xz-x1> <20200915232238.GO1221970@ziepe.ca>
    <20200916174804.GC8409@ziepe.ca> <20200916184619.GB40154@xz-x1>
    <20200917112538.GD8409@ziepe.ca> <20200917181411.GA133226@xz-x1>
    <20200917190332.GB133226@xz-x1>

On Thu, Sep 17, 2020 at 12:42:11PM -0700, Linus Torvalds wrote:

> Because the whole "do page pinning without MADV_DONTFORK and then fork
> the area" is I feel a very very invalid load. It sure as hell isn't
> something we should care about performance for, and in fact it is
> something we should very well warn for exactly to let people know
> "this process is doing bad things".

It is easy for things like io_uring that can just allocate the queue
memory they care about and MADV_DONTFORK it.

Other things work more like O_DIRECT - the data they are working on is
arbitrary app memory, not controlled in any way.
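To illustrate the easy case, here is a minimal userspace sketch in Python (the language of the failing tests mentioned further down). It assumes Linux and Python 3.8 or later; the name alloc_dontfork_queue and the buffer size are made up for illustration and are not taken from io_uring or any real library.

```python
import mmap

QUEUE_BYTES = 4 * mmap.PAGESIZE  # hypothetical queue size, whole pages


def alloc_dontfork_queue(size=QUEUE_BYTES):
    """Return (buffer, applied): a private anonymous mapping and whether
    MADV_DONTFORK was actually applied to it."""
    # fileno -1 asks for an anonymous mapping; MAP_PRIVATE makes it COW memory.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    advice = getattr(mmap, "MADV_DONTFORK", None)  # only defined on Linux
    if advice is None:
        return buf, False
    try:
        buf.madvise(advice)  # a child of a later fork() will not get this range
    except OSError:
        return buf, False
    return buf, True


queue, dontfork_applied = alloc_dontfork_queue()
queue[:5] = b"hello"  # the parent keeps using the memory as usual
```

This only works because the code owns the allocation and can advise the whole mapping up front, which is exactly what the O_DIRECT-like users cannot do.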
In RDMA we have this ugly scheme where we automatically call
MADV_DONTFORK on the virtual address and hope it doesn't explode.

It is very hard to call MADV_DONTFORK if you don't control the
allocation. You don't want to break huge pages, and you have to hope
really really hard that a fork doesn't need that memory. You also have
to hope you don't run out of vmas, because it causes a vma split. So
ugly. So much overhead.

Considering almost anything can do a fork() - we've seen app authors
become confused. They say stuff is busted, support folks ask if they
use fork, author says no.. Investigation later shows some hidden
library did system() or whatever. In this case the tests that found
this failed because they were written in Python and buried in there
was some subprocess.call().

I would prefer the kernel consider it a valid workload with the
semantics the sketch patch shows..

> Is there possibly somethign else we can filter on than just
> GUP_PIN_COUNTING_BIAS? Because it could be as simple as just marking
> the vma itself and saying "this vma has had a page pinning event done
> on it".

We'd have to give up pin_user_pages_fast() to do that, as we can't do
the fast walk and get vmas?

Hmm, there are many users. I remember that the hfi1 folks really
wanted the fast version for some reason..

> Because if we only start copying the page *iff* the vma is marked by
> that "this vma had page pinning" _and_ the page count is bigger than
> GUP_PIN_COUNTING_BIAS, than I think we can rest pretty easily knowing
> that we aren't going to hit some regular old-fashioned UNIX server
> cases with a lot of forks..

Agreed.

Given that this is a user-visible regression and it is nearly rc6,
what do you prefer for next steps?

Sorting this out for fork, especially if it has the vma change, is
probably more than a week's time.

Revert this patch and try again next cycle?

Thanks,
Jason
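
For reference, the filter discussed above can be written down as a small decision function. This is only a Python model of the idea, not kernel code: the Vma class, its had_pin flag, and must_copy_at_fork are hypothetical names, the constant mirrors the kernel's GUP_PIN_COUNTING_BIAS (1U << 10) purely for illustration, and the use of >= rather than a strict "bigger than" is an assumption.

```python
from dataclasses import dataclass

# Illustrative stand-in for the kernel constant of the same name.
GUP_PIN_COUNTING_BIAS = 1 << 10


@dataclass
class Vma:
    """Toy model of a vma carrying the proposed marker."""
    had_pin: bool = False  # hypothetical "a page pinning event happened here"


def must_copy_at_fork(vma: Vma, page_refcount: int) -> bool:
    """Copy the page eagerly at fork() only when both conditions hold:
    the vma was marked by a pin, and the page count suggests a live pin."""
    return vma.had_pin and page_refcount >= GUP_PIN_COUNTING_BIAS


# A fork-heavy server that never pins never takes the copy path,
# no matter how high its page counts get.
plain = Vma(had_pin=False)
pinned = Vma(had_pin=True)
decisions = [
    must_copy_at_fork(plain, GUP_PIN_COUNTING_BIAS + 5),
    must_copy_at_fork(pinned, 3),
    must_copy_at_fork(pinned, GUP_PIN_COUNTING_BIAS + 5),
]
```

The vma flag is what keeps ordinary fork-heavy workloads off the slow path; the open question raised above is how a lockless fast path that never looks up vmas would set it.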