From: 黄杰
Date: Mon, 17 Oct 2022 19:46:55 +0800
Subject: Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
To: David Hildenbrand
Cc: songmuchun@bytedance.com, Mike Kravetz, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, 17 Oct 2022 at 19:33, David Hildenbrand wrote:
>
> On 17.10.22 11:48, 黄杰 wrote:
> > On Mon, 17 Oct 2022 at 16:44, David Hildenbrand wrote:
> >>
> >> On 12.10.22 10:15, Albert Huang wrote:
> >>> From: "huangjie.albert"
> >>>
> >>> Implement these two functions so that we can set the mempolicy on
> >>> the inode of the hugetlb file. This ensures that the mempolicy of
> >>> all processes sharing this huge page file is consistent.
> >>>
> >>> One scenario where huge pages are shared: if we need to limit the
> >>> memory usage of a VM to node0, we bind QEMU's mempolicy to node0.
> >>> But if another process (such as virtiofsd) shares memory with the
> >>> VM and the page fault is triggered by virtiofsd, the allocated
> >>> memory may end up on node1, because placement then depends on
> >>> virtiofsd's policy rather than QEMU's.
> >>>
> >>
> >> Any VM that uses hugetlb should be preallocating memory. For example,
> >> this is the expected default under QEMU when using huge pages.
> >>
> >> Once preallocation does the right thing regarding NUMA policy, there
> >> is no need to worry about it in other sub-processes.
> >>
> >
> > Hi David,
> >
> > Thanks for the reminder.
> >
> > Yes, you are absolutely right: the pre-allocation mechanism does solve
> > this problem. However, some scenarios prefer not to use pre-allocation,
> > for example when virtual machine startup time is critical or when high
> > memory utilization is required. For those, on-demand allocation may be
> > a better fit, so the key point is to find a way to support a shared
> > policy.
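
To make "shared policy" concrete, here is a rough sketch of what such
hooks could look like, modeled on shmem's shmem_set_policy() and
shmem_get_policy(). The function names and the "policy" member on
struct hugetlbfs_inode_info are illustrative only, not the exact patch:

    /*
     * Sketch: keep the mempolicy in a shared_policy tree hanging off
     * the hugetlbfs inode so that every mapping of the file sees it.
     * Assumes a "struct shared_policy policy" member added to
     * struct hugetlbfs_inode_info.
     */
    static int hugetlb_vm_op_set_policy(struct vm_area_struct *vma,
                                        struct mempolicy *mpol)
    {
            struct inode *inode = file_inode(vma->vm_file);

            return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy,
                                          vma, mpol);
    }

    static struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma,
                                                      unsigned long addr)
    {
            struct inode *inode = file_inode(vma->vm_file);
            pgoff_t index;

            /* The shared policy tree is indexed in base-page units. */
            index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
            return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy,
                                             index);
    }

Both would be wired into hugetlb_vm_ops through the .set_policy and
.get_policy members of struct vm_operations_struct (CONFIG_NUMA only).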
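
With that in place, a policy installed by one process would be recorded
on the inode and honored by later faults from any process mapping the
same file. A hypothetical caller (node numbers made up):

    #include <numaif.h>     /* mbind(), MPOL_BIND; link with -lnuma */
    #include <stddef.h>

    /*
     * QEMU-side sketch: bind a hugetlbfs mapping at [addr, addr + len)
     * to node0. mbind() reaches vma->vm_ops->set_policy(), so with the
     * shared hooks above, a later first-touch fault from virtiofsd on
     * the same file should also allocate from node0.
     */
    static long bind_to_node0(void *addr, size_t len)
    {
            unsigned long nodemask = 1UL << 0;      /* node0 only */

            return mbind(addr, len, MPOL_BIND, &nodemask,
                         8 * sizeof(nodemask), 0);
    }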
>
> Using hugetlb -- with a fixed pool size -- without preallocation is like
> playing with fire. Hugetlb reservation makes one believe that on-demand
> allocation is going to work, but there are various scenarios where that
> can go seriously wrong and you can run out of huge pages.
>
> If you're using hugetlb as the memory backend for a VM without
> preallocation, you really have to be very careful. I can only advise
> against doing that.
>
> Also: why does another process read/write *first* to a guest physical
> memory location before the OS running inside the VM even initialized
> that memory? That sounds very wrong. What am I missing?

For example: the virtio ring buffer. For an avail descriptor, the guest
kernel only hands a buffer address to the backend; it does not actually
access that memory itself.
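
Roughly, in the split-virtqueue layout from the virtio spec (types
shown as plain stdint here; the spec's fields are little-endian):

    #include <stdint.h>

    /* The guest writes only this metadata, never the buffer itself. */
    struct virtq_desc {
            uint64_t addr;   /* guest-physical address of the buffer */
            uint32_t len;
            uint16_t flags;  /* e.g. VIRTQ_DESC_F_WRITE if device-writable */
            uint16_t next;   /* index of the next descriptor in a chain */
    };

    struct virtq_avail {
            uint16_t flags;
            uint16_t idx;    /* next free slot in ring[] */
            uint16_t ring[]; /* head indexes of available descriptor chains */
    };

The guest fills in virtq_desc.addr and bumps virtq_avail.idx, but for a
device-writable buffer it never touches the memory at addr. The backend
(virtiofsd) dereferences addr first, so it is the backend's page fault,
and therefore the backend's mempolicy, that decides which node the huge
page comes from.

Thanks.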