Date: Tue, 8 Dec 2020 15:57:54 -0400
From: Jason Gunthorpe
To: Joao Martins
Cc: linux-mm@kvack.org, Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org,
    Matthew Wilcox, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton
Subject: Re: [PATCH RFC 9/9] mm: Add follow_devmap_page() for devdax vmas
Message-ID: <20201208195754.GR5487@ziepe.ca>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
    <20201208172901.17384-11-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-11-joao.m.martins@oracle.com>

On Tue, Dec 08, 2020 at 05:29:01PM +0000, Joao Martins wrote:
> Similar to follow_hugetlb_page() add a follow_devmap_page which rather
> than calling follow_page() per 4K page in a PMD/PUD it does so for the
> entire PMD, where we lock the pmd/pud, get all pages, unlock.
>
> While doing so, we only change the refcount once when PGMAP_COMPOUND is
> passed in.
>
> This let us improve {pin,get}_user_pages{,_longterm}() considerably:
>
> $ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-U,-b,-L] -n 512 -w
>
> ()                                  [before] -> [after]
> (get_user_pages 2M pages)           ~150k us -> ~8.9k us
> (pin_user_pages 2M pages)           ~192k us -> ~9k us
> (pin_user_pages_longterm 2M pages)  ~200k us -> ~19k us
>
> Signed-off-by: Joao Martins
> ---
> I've special-cased this to device-dax vmas given its similar page size
> guarantees as hugetlbfs, but I feel this is a bit wrong.
> I am replicating follow_hugetlb_page() as RFC ought to seek feedback whether
> this should be generalized if no fundamental issues exist. In such case,
> should I be changing follow_page_mask() to take either an array of pages
> or a function pointer and opaque arguments which would let caller pick
> its structure?

I would be extremely sad if this was the only way to do this :(

We should be trying to make things more general. The
hmm_range_fault_path() doesn't have major special cases for device; I
am struggling to understand why gup fast and slow do.

What we've talked about is changing the calling convention across all
of this to something like:

  struct gup_output {
     struct page **cur;
     struct page **end;
     unsigned long vaddr;
     [..]
  }

And making the manipulator like you saw for GUP common:

  gup_output_single_page()
  gup_output_pages()

Then putting this everywhere. This is the pattern that we ended up with
in hmm_range_fault, and it seems to be working quite well.

fast/slow should be much more symmetric in code than they are today,
IMHO.. I think those differences mainly exist because it used to be
siloed in arch code. Some of the differences might be bugs; we've seen
that a few times at least..

Jason
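
To make the calling convention above a little more concrete, here is a
rough userspace sketch of what the two manipulators might look like. Only
the three struct fields and the two function names come from the mail.
Everything else is an assumption for illustration: the stand-in
struct page, the SKETCH_PAGE_SIZE constant, the return conventions, and
the idea that gup_output_pages() takes a run of contiguous struct pages.
It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4K base pages */

/* Stand-in for the kernel's struct page, for illustration only. */
struct page {
	int refcount;
};

/* Fields as proposed in the mail; the elided "[..]" members are left out. */
struct gup_output {
	struct page **cur;   /* next free slot in the caller's array */
	struct page **end;   /* one past the last slot */
	unsigned long vaddr; /* next user address to resolve */
};

/*
 * Hypothetical: store one page and advance vaddr by one base page.
 * Returns false when the output array is already full.
 */
static bool gup_output_single_page(struct gup_output *out, struct page *page)
{
	assert(out->cur <= out->end);
	if (out->cur == out->end)
		return false;
	*out->cur++ = page;
	out->vaddr += SKETCH_PAGE_SIZE;
	return true;
}

/*
 * Hypothetical: store up to npages contiguous pages starting at first,
 * e.g. the base pages backing one PMD/PUD mapping. Returns how many
 * were actually stored, which may be fewer if the array fills up.
 */
static size_t gup_output_pages(struct gup_output *out, struct page *first,
			       size_t npages)
{
	size_t room, n, i;

	assert(out->cur <= out->end);
	room = (size_t)(out->end - out->cur);
	n = npages < room ? npages : room;
	for (i = 0; i < n; i++)
		*out->cur++ = first + i;
	out->vaddr += n * SKETCH_PAGE_SIZE;
	return n;
}
```

With helpers of this shape, a walker that finds a huge mapping could hand
over the whole run in one gup_output_pages() call while the per-PTE path
calls gup_output_single_page(), so both paths would share the same output
bookkeeping.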