From mboxrd@z Thu Jan 1 00:00:00 1970
From: Liang Li
Date: Tue, 5 Jan 2021 18:22:03 +0800
Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
References: <96BB0656-F234-4634-853E-E2A747B6ECDB@redhat.com>
To: David Hildenbrand
Cc: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
    Dan Williams, "Michael S. Tsirkin", Jason Wang, Dave Hansen,
    Michal Hocko, Liang Li, linux-mm, LKML,
    virtualization@lists.linux-foundation.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Tsirkin" , Jason Wang , Dave Hansen , Michal Hocko , Liang Li , linux-mm , LKML , virtualization@lists.linux-foundation.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > >> That=E2=80=98s mostly already existing scheduling logic, no? (How many= vms can I put onto a specific machine eventually) > > > > It depends on how the scheduling component is designed. Yes, you can pu= t > > 10 VMs with 4C8G(4CPU, 8G RAM) on a host and 20 VMs with 2C4G on > > another one. But if one type of them, e.g. 4C8G are sold out, customers > > can't by more 4C8G VM while there are some free 2C4G VMs, the resource > > reserved for them can be provided as 4C8G VMs > > > > 1. You can, just the startup time will be a little slower? E.g., grow > pre-allocated 4G file to 8G. > > 2. Or let's be creative: teach QEMU to construct a single > RAMBlock/MemoryRegion out of multiple tmpfs files. Works as long as you > don't go crazy on different VM sizes / size differences. > > 3. In your example above, you can dynamically rebalance as VMs are > getting sold, to make sure you always have "big ones" lying around you > can shrink on demand. > Yes, we can always come up with some ways to make things work. it will make the developer of the upper layer component crazy :) > > > > You must know there are a lot of functions in the kernel which can > > be done in userspace. e.g. Some of the device emulations like APIC, > > vhost-net backend which has userspace implementation. :) > > Bad or not depends on the benefits the solution brings. > > From the viewpoint of a user space application, the kernel should > > provide high performance memory management service. That's why > > I think it should be done in the kernel. > > As I expressed a couple of times already, I don't see why using > hugetlbfs and implementing some sort of pre-zeroing there isn't sufficien= t. Did I miss something before? I thought you doubt the need for hugetlbfs free page pre zero out. Hugetlbfs is a good choice and is sufficient. > We really don't *want* complicated things deep down in the mm core if > there are reasonable alternatives. > I understand your concern, we should have sufficient reason to add a new feature to the kernel. And for this one, it's most value is to make the application's life is easier. And implementing it in hugetlbfs can avoid adding more complexity to core MM. I will send out a new revision and drop the part for 'buddy free pages pre zero out'. Thanks for your suggestion! 
Liang