Date: Mon, 28 Sep 2020 20:57:39 -0300
From: Jason Gunthorpe
To: Linus Torvalds
Cc: Peter Xu, Leon Romanovsky, John Hubbard, Linux-MM,
    Linux Kernel Mailing List, Andrew Morton, Jan Kara, Michal Hocko,
    Kirill Tkhai, Kirill Shutemov, Hugh Dickins, Christoph Hellwig,
    Andrea Arcangeli, Oleg Nesterov, Jann Horn
Subject: Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
Message-ID: <20200928235739.GU9916@ziepe.ca>
References: <20200927062337.GE2280698@unreal>
    <20200928124937.GN9916@ziepe.ca>
    <20200928172256.GB59869@xz-x1>
    <20200928183928.GR9916@ziepe.ca>

On Mon, Sep 28, 2020 at 12:29:55PM -0700, Linus Torvalds wrote:

> So a read pin action would basically never work for the fast-path for
> a few cases, notably a shared read-only mapping - because we could
> never mark it in the page tables as "fast pin accessible"

Agree, I was assuming we'd lose more of the fast path to create this
thing. It would only still be fast if the pages are already writable.

I strongly suspect the case of DMA'ing actual read-only data is the
minority here, the usual case is probably filling a writable buffer
with something interesting and then triggering the DMA. The DMA just
happens to be a read from the driver's view, so the driver doesn't set
FOLL_WRITE.

Looking at the FOLL_LONGTERM users, which should be the banner use case
for this, there are very few that do a read pin and use the fast path.

> And it would basically have no advantages over a writable FOLL_PIN. It
> would break the association with any backing store for private pages,
> because otherwise it can't follow future writes.

Yes, I wasn't clear enough, I'm looking at this from a driver API
perspective. We have this API

    pin_user_pages(FOLL_LONGTERM | FOLL_WRITE)

Which now has no decoherence issues with the MM. If the driver
naturally wants to do read-only access it might be tempted to do:

    pin_user_pages(FOLL_LONGTERM)

Which is now NOT the same thing and brings all these really surprising
mm coherence issues back.

The driver author might discover this in testing, then be tempted to
hardwire 'FOLL_LONGTERM | FOLL_WRITE'. Now their uAPI is broken for
things that are actually read-only like .rodata.

If they discover this then they add a FOLL_FORCE to the mix.

When someone comes along to read this later it is a big leap to see

    pin_user_pages(FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE)

and realize this is code for "read only mapping". At least it took me
a while to decipher it the first time I saw it.

I think this is really hard to use and ugly. My thinking has been to
just stick:

    if (flags & FOLL_LONGTERM)
        flags |= FOLL_FORCE | FOLL_WRITE

In pin_user_pages(). It would make the driver API cleaner. If we can
do a bit better somehow by not COW'ing for certain VMAs as you
explained then all the better, but that is not my primary goal.

Basically, I think if a driver is using FOLL_LONGTERM | FOLL_PIN we
should guarantee that driver a consistent MM and take the gup_fast
performance hit to do it.

AFAICT the giant whack of other cases not using FOLL_LONGTERM really
shouldn't care about read-decoherence. For those cases the user should
really not be racing writes with data under a read-only pin, and the
new COW logic looks like it solves the other issues with this.

I know Jann/John have been careful to not have special behaviors for
the DMA case, but I think it makes sense here. It is actually
different.

Jason
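
The flag promotion proposed above can be sketched in userspace C. This is only an illustration of the idea, not kernel code: the helper names effective_pin_flags() and effective_pin_flags_selfcheck() are hypothetical, and the flag values are placeholders chosen for the example (the real definitions live in the kernel headers and vary between versions).

```c
#include <assert.h>

/*
 * Placeholder flag bits for illustration only. The real FOLL_* values
 * are defined in the kernel headers and may differ by version.
 */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x10u
#define FOLL_LONGTERM 0x10000u

/*
 * Hypothetical helper (not a kernel function): models the suggestion
 * that any FOLL_LONGTERM pin is implicitly upgraded to
 * FOLL_FORCE | FOLL_WRITE, so a driver asking for a "read-only"
 * long-term pin gets the same MM coherence as a writable one.
 */
unsigned int effective_pin_flags(unsigned int gup_flags)
{
	if (gup_flags & FOLL_LONGTERM)
		gup_flags |= FOLL_FORCE | FOLL_WRITE;
	return gup_flags;
}

/* Small self-check showing which requests are changed and which are not. */
void effective_pin_flags_selfcheck(void)
{
	/* Long-term read pin: promoted to the forced-write form. */
	assert(effective_pin_flags(FOLL_LONGTERM) ==
	       (FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE));

	/* Long-term write pin: only FOLL_FORCE is added. */
	assert(effective_pin_flags(FOLL_LONGTERM | FOLL_WRITE) ==
	       (FOLL_LONGTERM | FOLL_FORCE | FOLL_WRITE));

	/* Short-term pins are passed through untouched. */
	assert(effective_pin_flags(0) == 0);
	assert(effective_pin_flags(FOLL_WRITE) == FOLL_WRITE);
}
```

With this shape, a driver would only ever write pin_user_pages(FOLL_LONGTERM) for read access, and the three-flag spelling would no longer need to appear in driver code. Pins without FOLL_LONGTERM keep their current behavior, which matches the point above that those cases should not care about read-decoherence.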