Date: Wed, 2 Dec 2020 21:08:09 -0400
From: Jason Gunthorpe
To: Pavel Tatashin
Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, Michal Hocko,
	David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin,
	Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Mel Gorman, Matthew Wilcox,
	David Rientjes, John Hubbard
Subject: Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone
Message-ID: <20201203010809.GQ5487@ziepe.ca>
References: <20201202052330.474592-1-pasha.tatashin@soleen.com>
	<20201202052330.474592-7-pasha.tatashin@soleen.com>
	<20201202163507.GL5487@ziepe.ca>

On Wed, Dec 02, 2020 at 07:19:45PM -0500, Pavel Tatashin wrote:
> > It is a good moment to say, I really dislike how this was implemented
> > in the first place.
> >
> > Scanning the output of gup just to do the is_migrate_movable() test is
> > kind of nonsense and slow. It would be better/faster to handle this
> > directly while gup is scanning the page tables and adding pages to the
> > list.
>
> Hi Jason,
>
> I assume you mean to migrate pages as soon as they are followed and
> skip those that are faulted, as we already know that faulted pages are
> allocated from a non-movable zone.
>
> The place would be:
>
> __get_user_pages()
>     while (more pages)
>         get_gate_page()
>         follow_hugetlb_page()
>         follow_page_mask()
>
>         if (!page)
>             faultin_page()
>
>         if (page && !faulted && (gup_flags & FOLL_LONGTERM))
>             check_and_migrate this page

Either here, or perhaps even lower down the call chain when the page is
captured, similar to how GUP fast would detect it. (How is that done,
anyhow?)

> I looked at that function, and I do not think the code will be cleaner
> there, as that function already has a complicated loop.

That function is complicated for its own reasons... but complicated is
not the point here.

> The only drawback with the current approach that I see is that
> check_and_migrate_movable_pages() has to check the faulted pages
> once.

Yes.

> This, while not optimal, is not horrible.

It is.

> The FOLL_LONGTERM should not happen too frequently, so having one
> extra nr_pages loop should not hurt the performance.

FOLL_LONGTERM is typically used with very large regions; for instance,
we are benchmarking around the 300G level. It takes tens of seconds for
get_user_pages() to complete. There are many inefficiencies in this
path, and this extra work of re-scanning the list is part of the cost.
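To make that concrete, the per-page test could sit inside the
__get_user_pages() loop, roughly like below. This is an entirely
untested sketch, not against the current tree: migrate_movable_page()
is an invented name standing in for whatever per-page work
check_and_migrate_movable_pages() does today, and "faulted" is the flag
from your pseudocode above, not an existing variable.

	/*
	 * Sketch: test each page as it is collected instead of
	 * re-walking the whole pages[] array afterwards.  Pages that
	 * came from faultin_page() are skipped since they were just
	 * allocated from a non-movable zone anyway.
	 */
	if (page && !faulted && (gup_flags & FOLL_LONGTERM) &&
	    is_migrate_movable(get_pageblock_migratetype(page))) {
		/* hypothetical per-page helper, error handling elided */
		ret = migrate_movable_page(page, gup_flags);
		if (ret)
			goto out;
	}

That way the movable-zone decision is made while gup already holds the
page, instead of by a second pass over nr_pages entries.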
Further, having these special wrappers just for FOLL_LONGTERM spills
complexity over the entire rest of the call chain up to here: we now
have endless wrappers and varieties of function calls that exist,
generally, because the longterm path needs to end up in a different
place than the other paths.

IMHO this is due to the lack of integration with the primary loop
above.

> Also, I checked and most of the users of FOLL_LONGTERM pin only one
> page at a time. Which means the extra loop is only to check a single
> page.

Er, I don't know what you checked, but those are not the cases I see.

Two big users are vfio and rdma. Both are pinning huge ranges of
memory in very typical use cases.

> However, those changes can come after this series. The current series
> fixes a bug where hot-remove is not working, while making a minimal
> amount of changes, so it is easy to backport to stable kernels.

This is a good point, good enough that you should probably continue
as-is.

Jason