From: David Rientjes
Date: Tue, 22 Oct 2019 17:52:07 -0700 (PDT)
To: Waiman Long
Cc: Michal Hocko, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Johannes Weiner, Roman Gushchin,
    Vlastimil Babka, Konstantin Khlebnikov, Jann Horn, Song Liu,
    Greg Kroah-Hartman, Rafael Aquini, Mel Gorman
Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo
References: <20191022162156.17316-1-longman@redhat.com>
    <20191022165745.GT9379@dhcp22.suse.cz>
    <0b206255-5c62-18f5-d751-a5576a6c0e8f@redhat.com>

On Tue, 22 Oct 2019, Waiman Long wrote:

> >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> >>> is usually the largest one on large memory systems, this is the one
> >>> to be skipped. Since the printing order is migration-type => order, we
> >>> will have to store the counts in an internal 2D array before printing
> >>> them out.
> >>>
> >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> >>> zone lock for too long blocking out other zone lock waiters from being
> >>> run. This can be problematic for systems with large amount of memory.
> >>> So a check is added to temporarily release the lock and reschedule if
> >>> more than 64k of list entries have been iterated for each order. With
> >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> >>> entries before releasing the lock.
> >> But you are still iterating through the whole free_list at once so if it
> >> gets really large then this is still possible.
> >> I think it would be preferable to use per migratetype nr_free if it
> >> doesn't cause any regressions.
> >>
> > Yes, it is still theoretically possible. I will take a further look at
> > having per-migrate type nr_free. BTW, there is one more place where the
> > free lists are being iterated with zone lock held - mark_free_pages().
>
> Looking deeper into the code, the exact migration type is not stored in
> the page itself. An initial movable page can be stolen to be put into
> another migration type. So in a delete or move from free_area, we don't
> know exactly what migration type the page is coming from. IOW, it is
> hard to get accurate counts of the number of entries in each lists.
>

I think the suggestion is to maintain an nr_free count for the free_list
of each order and each migratetype, so that any time a page is added to
or deleted from a list, that list's nr_free is adjusted. The free_area's
nr_free then becomes the sum of its migratetypes' nr_free at that order.
That's possible to do if you track the migratetype per page, as you
said, or the way pcp pages track it as part of page->index. It's a
trade-off: maintaining these new nr_frees costs something every time you
manipulate the freelists.

I think Vlastimil and I discussed per order per migratetype nr_frees in
the past, and it could be a worthwhile improvement for other reasons.
Specifically, it leads to heuristics that can be used to determine how
fragmented a certain migratetype is for a zone, i.e. a very quick way to
determine what ratio of pages over all MIGRATE_UNMOVABLE pageblocks are
free.

Or maybe there are other reasons why these nr_frees can't be maintained
anymore? (I had a patch to do it on 4.3.)

You may also find systems where MIGRATE_MOVABLE is not actually the
longest free_list compared to other migratetypes on a severely
fragmented system, so special casing MIGRATE_MOVABLE might not be the
best way forward.
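
To make the bookkeeping concrete, here is a rough userspace sketch of
per order per migratetype counters. It is only a model under stated
assumptions, not the 4.3 patch mentioned above and not kernel code:
struct zone_model, nr_free_mt[], free_list_add(), free_list_del(),
area_nr_free() and mt_free_pages() are hypothetical names, and the
migratetype enum is cut down to three entries.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel's definitions. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_TYPES
};

#define MAX_ORDER 11

/* One free_area per order; nr_free_mt[] is the proposed per-migratetype count. */
struct free_area {
	unsigned long nr_free_mt[MIGRATE_TYPES];
};

/* Model of the part of a zone that matters here (assumed name). */
struct zone_model {
	struct free_area free_area[MAX_ORDER];
};

/* Call wherever a free block is linked onto the (order, mt) free_list. */
static inline void free_list_add(struct zone_model *z, unsigned int order,
				 enum migratetype mt)
{
	z->free_area[order].nr_free_mt[mt]++;
}

/* Call wherever a free block is unlinked; the caller must know which list. */
static inline void free_list_del(struct zone_model *z, unsigned int order,
				 enum migratetype mt)
{
	assert(z->free_area[order].nr_free_mt[mt] > 0);
	z->free_area[order].nr_free_mt[mt]--;
}

/* The per-order nr_free becomes the sum over all migratetypes. */
static inline unsigned long area_nr_free(const struct zone_model *z,
					 unsigned int order)
{
	unsigned long sum = 0;
	int mt;

	for (mt = 0; mt < MIGRATE_TYPES; mt++)
		sum += z->free_area[order].nr_free_mt[mt];
	return sum;
}

/* Free base pages on one migratetype's lists, without walking any list. */
static inline unsigned long mt_free_pages(const struct zone_model *z,
					  enum migratetype mt)
{
	unsigned long pages = 0;
	unsigned int order;

	for (order = 0; order < MAX_ORDER; order++)
		pages += z->free_area[order].nr_free_mt[mt] << order;
	return pages;
}
```

Note that free_list_del() takes the migratetype as an argument. That is
exactly the information Waiman points out is not reliably available at
delete time today, so the sketch only shows the counters and says
nothing about how the per-page tracking would be done.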