Date: Tue, 16 Feb 2021 20:24:16 -0800 (PST)
From: David Rientjes
To: Alex Shi, Hugh Dickins, Andrea Arcangeli, "Kirill A. Shutemov", Song Liu, Michal Hocko, Matthew Wilcox, Minchan Kim, Vlastimil Babka
cc: Chris Kennelly, linux-mm@kvack.org
Subject: [RFC] Hugepage collapse in process context

Hi everybody,

Khugepaged is slow by default: it scans at most 4096 pages every 10s. That's normally fine as a system-wide setting, but some applications would benefit from a more aggressive approach (as long as they are willing to pay for it).

Instead of adding priorities for eligible ranges of memory to khugepaged, temporarily speeding khugepaged up for the whole system, or sharding its work for memory belonging to a certain process, one approach would be to allow userspace to induce hugepage collapse.

The benefit of this approach is that the collapse is done in process context, so its CPU time is charged to the process that is inducing the collapse. Khugepaged is not involved.

The idea is to allow userspace to induce hugepage collapse through the new process_madvise() call. This allows us to collapse hugepages on behalf of the current process or another process for a vectored set of ranges.

This could be done through a new process_madvise() mode *or* it could be a flag to MADV_HUGEPAGE, since process_madvise() allows for a flag parameter to be passed. For example, MADV_F_SYNC.

When done, this madvise call would allocate a hugepage on the right node and attempt to do the collapse in process context just as khugepaged would otherwise do.

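
As a rough sketch of how the flag variant might look from userspace: MADV_F_SYNC is only a name floated in this RFC, so the constant below is a made-up placeholder, and the 2 MiB hugepage size and the helper names are assumptions for illustration. Kernels that only accept flags == 0 for process_madvise() would be expected to reject this call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
#define SYS_process_madvise 440 /* number on architectures using the generic table */
#endif

/* Hypothetical: no kernel header defines this flag; the value is a placeholder. */
#define MADV_F_SYNC_HYPOTHETICAL 0x1u
/* Assumption: 2 MiB PMD-sized hugepages, as on x86-64. */
#define HPAGE_SIZE_ASSUMED ((uintptr_t)2 << 20)

/* Shrink [addr, addr + len) to the largest hugepage-aligned sub-range.
 * Returns 0 and fills *out if at least one whole hugepage fits, else -1. */
int hugepage_aligned_range(uintptr_t addr, size_t len, struct iovec *out)
{
    assert(out != NULL);
    uintptr_t mask = HPAGE_SIZE_ASSUMED - 1;
    uintptr_t start = (addr + mask) & ~mask; /* round start up */
    uintptr_t end = (addr + len) & ~mask;    /* round end down */
    if (end <= start)
        return -1;
    out->iov_base = (void *)start;
    out->iov_len = end - start;
    return 0;
}

/* Ask for a synchronous collapse of n ranges in the process behind pidfd.
 * Returns the raw syscall result: -1 with errno set on failure. */
long collapse_ranges(int pidfd, const struct iovec *iov, size_t n)
{
    return syscall(SYS_process_madvise, pidfd, iov, n,
                   MADV_HUGEPAGE, MADV_F_SYNC_HYPOTHETICAL);
}
```

A caller would get a pidfd for the target (for example via pidfd_open()), build one iovec per range with hugepage_aligned_range(), and pass the array to collapse_ranges(). The alignment step is there because only whole hugepage-sized, hugepage-aligned regions can be collapsed.
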
This would immediately be useful for a malloc implementation, for example, that has released its memory back to the system using MADV_DONTNEED and will subsequently refault the memory. Rather than wait for khugepaged to come along 30 minutes later, for example, and collapse this memory into a hugepage (which could take much longer on a very large system), an alternative would be to use this process_madvise() mode to induce the action up front. In other words, say "I'm returning this memory to the application and it's going to be hot, so back it by a hugepage now rather than waiting until later."

It would also be useful for read-only file-backed mappings for text segments.

Khugepaged should be happy: it's just less work done by generic kthreads that gets charged as an overall tax to everybody.

Thoughts?

Thanks!
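
For a sense of scale on the khugepaged delay mentioned above, here is a back-of-the-envelope sketch of the default scan rate (4096 pages every 10s). It assumes 4 KiB base pages and counts only the 10s interval per pass, so it is a rough estimate rather than a measurement; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_PASS 4096u         /* default quoted above */
#define SECONDS_PER_PASS 10u         /* default quoted above */
#define BASE_PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB base pages */

/* Rough time in seconds for khugepaged to cover `bytes` of memory once,
 * charging one 10s interval per pass of up to 4096 pages. */
uint64_t khugepaged_scan_seconds(uint64_t bytes)
{
    /* Guard the round-up below against overflow. */
    assert(bytes <= UINT64_MAX - (BASE_PAGE_SIZE_ASSUMED - 1));
    uint64_t pages = (bytes + BASE_PAGE_SIZE_ASSUMED - 1) / BASE_PAGE_SIZE_ASSUMED;
    uint64_t passes = (pages + PAGES_PER_PASS - 1) / PAGES_PER_PASS;
    return passes * SECONDS_PER_PASS;
}
```

With these assumptions, each pass covers 16 MiB, so khugepaged_scan_seconds(1ULL << 40) for 1 TiB of eligible memory comes to 655360 seconds, or roughly 7.6 days.
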