Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO

From: Liang Li <liliang324@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Liang Li <liliangleo@didiglobal.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	virtualization@lists.linux-foundation.org
Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
Date: Wed, 23 Dec 2020 20:11:57 +0800
Message-ID: <CA+2MQi_C-PTqyrqBprhtGBAiDBnPQBzwu6hvyuk+QiKy0L3sHw@mail.gmail.com>
In-Reply-To: <55052a91-64f9-b343-a1c4-f059ca50ecf3@redhat.com>

On Wed, Dec 23, 2020 at 4:41 PM David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>
> >> I was rather saying that for security it's of little use IMHO.
> >> Application/VM start up time might be improved by using huge pages (and
> >> pre-zeroing these). Free page reporting might be improved by using
> >> MADV_FREE instead of MADV_DONTNEED in the hypervisor.
> >>
> >>> this feature, above all of them, which one is likely to become the
> >>> most strong one?  From the implementation, you will find it is
> >>> configurable, users don't want to use it can turn it off.  This is not
> >>> an option?
> >>
> >> Well, we have to maintain the feature and sacrifice a page flag. For
> >> example, do we expect someone explicitly enabling the feature just to
> >> speed up startup time of an app that consumes a lot of memory? I highly
> >> doubt it.
> >
> > In our production environment, three main applications have such a
> > requirement: one is QEMU [creating a VM with an SR-IOV passthrough device],
> > and the other two are DPDK-related applications, DPDK OVS and SPDK vhost.
> > For best performance, they populate memory when starting up. For SPDK vhost,
> > we make use of the VHOST_USER_GET/SET_INFLIGHT_FD feature for
> > vhost 'live' upgrade, which is done by killing the old process and
> > starting a new one with the new binary. In this case, we want the new
> > process started as quickly as possible to shorten the service downtime.
> > We really do enable this feature to speed up startup time for them  :)
>
> Thanks for info on the use case!
>
> All of these use cases either already use, or could use, huge pages
> IMHO. It's not your ordinary proprietary gaming app :) This is where
> pre-zeroing of huge pages could already help.

You are welcome.  For historical reasons, some of our services do not
use hugetlbfs, which is why I didn't start with hugetlbfs.

> Just wondering, wouldn't it be possible to use tmpfs/hugetlbfs ...
> creating a file and pre-zeroing it from another process, or am I missing
> something important? At least for QEMU this should work AFAIK, where you
> can just pass the file to be used via memory-backend-file.
>
If another process creates the file, we can offload the overhead to that
process, and there is no need to pre-zero its content; just populating
the memory is enough.
If we do it that way, how do we determine the size of the file? It depends
on the RAM size of the VM the customer buys. Maybe we can create a
sufficiently large file in advance and truncate it to the right size just
before the VM is created. Then, how many large files should be created on
a host? There are a lot of things that have to be handled properly.
I think it's possible to make it work well, but we would push the
management complexity to the upper-layer components. It's bad practice to
make upper-layer components handle such low-level details, which should
be handled in the OS layer.
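
To make the helper-process idea above concrete, here is a rough sketch of
what such a helper might do. It is not code from this patch series: the
function names are hypothetical, the caller is assumed to pass a path on
tmpfs or hugetlbfs, and deciding sizes and file counts (the hard part
mentioned above) is left entirely to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or reuse) a backing file of `size` bytes
 * and prefault its pages through a shared mapping, so the cost is paid
 * here and not by the process that later maps the file.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int prepopulate_backing_file(const char *path, size_t size)
{
	void *addr;
	int fd;

	assert(size > 0);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;

	/* Set the file length up front. */
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}

	/* MAP_POPULATE asks the kernel to prefault the whole mapping. */
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_POPULATE, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The mapping is no longer needed; the file itself stays in place. */
	munmap(addr, size);
	close(fd);
	return 0;
}

/*
 * Hypothetical helper: cut a pre-created file down (or extend it) to the
 * size the VM actually needs, just before the VM is started.
 */
int resize_backing_file(const char *path, size_t size)
{
	return truncate(path, (off_t)size);
}
```

The upper layer would call prepopulate_backing_file() ahead of time, call
resize_backing_file() once the VM's RAM size is known, and then hand the
path to QEMU through memory-backend-file (its mem-path property). Every
one of those steps is bookkeeping that lives outside the kernel, which is
the complexity I would like to avoid.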

> >
> >> I'd love to hear opinions of other people. (a lot of people are offline
> >> until beginning of January, including, well, actually me :) )
> >
> > OK. I will wait some time for others' feedback. Happy holidays!
>
> To you too, cheers!
>

I have to work at least two more months before my vacation. :(

Liang