Date: Thu, 3 Oct 2019 12:52:33 -0700 (PDT)
From: David Rientjes
To: Vlastimil Babka
cc: Mike Kravetz, Michal Hocko, Linus Torvalds, Andrea Arcangeli, Andrew Morton, Mel Gorman, "Kirill A. Shutemov", Linux Kernel Mailing List, Linux-MM
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 3 Oct 2019, Vlastimil Babka wrote:

> I think the key differences between Mike's tests and Michal's is this part
> from Mike's mail linked above:
>
> "I 'tested' by simply creating some background activity and then seeing
> how many hugetlb pages could be allocated. Of course, many tries over
> time in a loop."
>
> - "some background activity" might be different than Michal's pre-filling
> of the memory with (clean) page cache
> - "many tries over time in a loop" could mean that kswapd has time to
> reclaim and eventually the new condition for pageblock order will pass
> every few retries, because there's enough memory for compaction and it
> won't return COMPACT_SKIPPED
>

I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between the potential for encountering very expensive reclaim, as Andrea did, and the possibility of being able to allocate additional hugetlb pages at runtime if we did that expensive reclaim.

For parity with previous kernels, it seems reasonable to ask that this remain unchanged, since allocating large numbers of hugetlb pages has different latency expectations than allocating during a page fault. This patch is available if he'd prefer to go that route.

On the other hand, userspace could achieve similar results if it were to use vm.drop_caches and explicitly trigger compaction through either procfs or sysfs before writing to vm.nr_hugepages, and that would be much faster because it would be done in one go. Users who allocate through the kernel command line would obviously be unaffected.

Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") was written with the latter in mind. Mike subsequently requested that hugetlb not be impacted, at least provisionally, until it could be further assessed.

I'd suggest the latter: let the user initiate expensive reclaim and/or compaction when tuning vm.nr_hugepages, and leave no surprises for users of hugetlb overcommit. But I wouldn't argue against either approach; he knows the users and expectations of hugetlb far better than I do.
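The userspace alternative mentioned above (drop caches, compact, then grow the pool in one write) could look roughly like the following. This is a minimal sketch, assuming a Linux system with CONFIG_COMPACTION, root privileges, and the standard sysctl files /proc/sys/vm/drop_caches, /proc/sys/vm/compact_memory and /proc/sys/vm/nr_hugepages. The helper names (write_knob, read_knob, grow_hugetlb_pool) are hypothetical, and the vm_dir parameter exists only so the sequence can be exercised against a scratch directory. Per-node compaction through sysfs (/sys/devices/system/node/nodeN/compact) would be the alternative to the procfs compact_memory knob used here.

```python
import os
import sys

# Standard sysctl location on Linux; overridable (assumption: only for testing).
VM_SYSCTL_DIR = "/proc/sys/vm"


def write_knob(vm_dir: str, name: str, value: str) -> None:
    """Write a value to a sysctl-style file such as /proc/sys/vm/<name>."""
    with open(os.path.join(vm_dir, name), "w") as f:
        f.write(value + "\n")


def read_knob(vm_dir: str, name: str) -> int:
    """Read a single integer sysctl value, e.g. the current nr_hugepages."""
    with open(os.path.join(vm_dir, name)) as f:
        return int(f.read().split()[0])


def grow_hugetlb_pool(count: int, vm_dir: str = VM_SYSCTL_DIR) -> int:
    """Hypothetical helper: reclaim and compact once up front, then request
    `count` hugetlb pages with a single nr_hugepages write.

    Returns the pool size read back afterwards, which may be lower than
    `count` if not enough contiguous memory could be freed.
    """
    os.sync()                                  # write back dirty pages so they become droppable
    write_knob(vm_dir, "drop_caches", "3")     # 1 = page cache, 2 = slab, 3 = both
    write_knob(vm_dir, "compact_memory", "1")  # compact all zones (needs CONFIG_COMPACTION)
    write_knob(vm_dir, "nr_hugepages", str(count))
    return read_knob(vm_dir, "nr_hugepages")


if __name__ == "__main__":
    # Only touch the real sysctls when a target count is given explicitly.
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} NR_HUGEPAGES  (run as root)")
    else:
        want = int(sys.argv[1])
        got = grow_hugetlb_pool(want)
        print(f"requested {want} hugepages, pool now has {got}")
```

Run as root with the desired pool size as the only argument. Reading nr_hugepages back matters because the kernel may satisfy only part of the request. The design choice mirrors the argument in the mail: the expensive reclaim and compaction happen once, under the administrator's control, rather than being repeated inside the allocator across retries.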