From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53E9AC43387 for ; Wed, 16 Jan 2019 22:51:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 29DF520840 for ; Wed, 16 Jan 2019 22:51:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387939AbfAPWvd (ORCPT ); Wed, 16 Jan 2019 17:51:33 -0500 Received: from ipmail03.adl6.internode.on.net ([150.101.137.143]:37693 "EHLO ipmail03.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728877AbfAPWvd (ORCPT ); Wed, 16 Jan 2019 17:51:33 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail03.adl6.internode.on.net with ESMTP; 17 Jan 2019 09:21:28 +1030 Received: from dave by dastard with local (Exim 4.80) (envelope-from ) id 1gju1v-0003AG-1d; Thu, 17 Jan 2019 09:51:27 +1100 Date: Thu, 17 Jan 2019 09:51:27 +1100 From: Dave Chinner To: Jerome Glisse Cc: Dan Williams , John Hubbard , Jan Kara , Matthew Wilcox , John Hubbard , Andrew Morton , Linux MM , tom@talpey.com, Al Viro , benve@cisco.com, Christoph Hellwig , Christopher Lameter , "Dalessandro, Dennis" , Doug Ledford , Jason Gunthorpe , Michal Hocko , Mike Marciniszyn , rcampbell@nvidia.com, Linux Kernel Mailing List , linux-fsdevel Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions Message-ID: <20190116225126.GS4205@dastard> References: <20190115080759.GC29524@quack2.suse.cz> <20190115171557.GB3696@redhat.com> <752839e6-6cb3-a6aa-94cb-63d3d4265934@nvidia.com> <20190115221205.GD3696@redhat.com> <99110c19-3168-f6a9-fbde-0a0e57f67279@nvidia.com> <20190116015610.GH3696@redhat.com> <20190116022312.GJ3696@redhat.com> <20190116043455.GP4205@dastard> <20190116145016.GB3617@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190116145016.GB3617@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 16, 2019 at 09:50:16AM -0500, Jerome Glisse wrote: > On Wed, Jan 16, 2019 at 03:34:55PM +1100, Dave Chinner wrote: > > On Tue, Jan 15, 2019 at 09:23:12PM -0500, Jerome Glisse wrote: > > > On Tue, Jan 15, 2019 at 06:01:09PM -0800, Dan Williams wrote: > > > > On Tue, Jan 15, 2019 at 5:56 PM Jerome Glisse wrote: > > > > > On Tue, Jan 15, 2019 at 04:44:41PM -0800, John Hubbard wrote: > > > > [..] > > > > > To make it clear. > > > > > > > > > > Lock code: > > > > > GUP() > > > > > ... > > > > > lock_page(page); > > > > > if (PageWriteback(page)) { > > > > > unlock_page(page); > > > > > wait_stable_page(page); > > > > > goto retry; > > > > > } > > > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > > > unlock_page(page); > > > > > > > > > > test_set_page_writeback() > > > > > bool pinned = false; > > > > > ... > > > > > pinned = page_is_pin(page); // could be after TestSetPageWriteback > > > > > TestSetPageWriteback(page); > > > > > ... > > > > > return pinned; > > > > > > > > > > Memory barrier: > > > > > GUP() > > > > > ... > > > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > > > smp_mb(); > > > > > if (PageWriteback(page)) { > > > > > atomic_add(page->refcount, -PAGE_PIN_BIAS); > > > > > wait_stable_page(page); > > > > > goto retry; > > > > > } > > > > > > > > > > test_set_page_writeback() > > > > > bool pinned = false; > > > > > ... > > > > > TestSetPageWriteback(page); > > > > > smp_wmb(); > > > > > pinned = page_is_pin(page); > > > > > ... > > > > > return pinned; > > > > > > > > > > > > > > > One is not more complex than the other. One can contend, the other > > > > > will _never_ contend. > > > > > > > > The complexity is in the validation of lockless algorithms. It's > > > > easier to reason about locks than barriers for the long term > > > > maintainability of this code. I'm with Jan and John on wanting to > > > > explore lock_page() before a barrier-based scheme. > > > > > > How is the above hard to validate ? > > > > Well, if you think it's so easy, then please write the test cases so > > we can add them to fstests and make sure that we don't break it in > > future. > > > > If you can't write filesystem test cases that exercise these race > > conditions reliably, then the answer to your question is "it is > > extremely hard to validate" and the correct thing to do is to start > > with the simple lock_page() based algorithm. > > > > Premature optimisation in code this complex is something we really, > > really need to avoid. > > Litmus test shows that this never happens, i am attaching 2 litmus > test one with barrier and one without. Without barrier we can see > the double negative !PageWriteback in GUP and !page_pinned() in > test_set_page_writeback() (0:EAX = 0; 1:EAX = 0; below) That's not a regression test, nor does it actually test the code that the kernel runs. It's just an extremely simplified model of a small part of the algorithm. Sure, that specific interaction is fine, but that in no way reflects the complexity of the code or the interactions with other code that interacts with that state. And it's not something we can use to detect that some future change has broken gup vs writeback synchronisation. Memory barriers might be fast, but they are hell for anyone but the person who wrote the algorithm to understand. If a simple page lock is good enough and doesn't cause performance problems, then use the simple page lock mechanism and stop trying to be clever. Other people who will have to understand and debug issues in this code are much less accepting of such clever algorithms. We've been badly burnt on repeated occasions by broken memory barriers in code heavily optimised for performance (*cough* rwsems *cough*), so I'm extremely wary of using memory ordering dependent algorithms in places where they are not necessary. Cheers, Dave. -- Dave Chinner david@fromorbit.com