Re: [PATCH 1/1] hw/arm/virt: Support for virtio-mem-pci

From: Gavin Shan <gshan@redhat.com>
To: David Hildenbrand <david@redhat.com>, qemu-arm@nongnu.org
Cc: peter.maydell@linaro.org, drjones@redhat.com,
	Michal Privoznik <mprivozn@redhat.com>,
	richard.henderson@linaro.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, shan.gavin@gmail.com,
	Jonathan.Cameron@huawei.com, imammedo@redhat.com
Subject: Re: [PATCH 1/1] hw/arm/virt: Support for virtio-mem-pci
Date: Fri, 3 Dec 2021 14:42:38 +1100	[thread overview]
Message-ID: <18aa700f-e540-50f5-423a-717ef39d52d3@redhat.com> (raw)
In-Reply-To: <1ae3f686-aaa9-4402-433e-325f422275df@redhat.com>

On 12/1/21 8:03 PM, David Hildenbrand wrote:
>>>>
>>>>      * It has been passing the tests with various combinations like 64KB
>>>>        and 4KB page sizes on host and guest, different memory device
>>>>        backends like normal, transparent huge page and HugeTLB, plus
>>>>        migration.
>>>
>>> Perfect. A note that hugetlbfs isn't fully supported/safe to use until
>>> we have preallocation support in QEMU (WIP).
>>>
>>
>> Yes, there is some warnings raised to enlarge 'request-size' on
>> host with 64KB page size. Note that the memory backends I used
>> in the testings always have "prealloc=on" property though.
> 
> 1. prealloc=on
> 
> "prealloc=on" on the memory backend won't get the job done, because the first
> thing virtio-mem does is discard all memory in the memory backend again when
> it initializes. So it's an expensive NOP :) See
> 
> https://lkml.kernel.org/r/20211130104136.40927-9-david@redhat.com
> 
> for the virtio-mem "prealloc=on" option that preallocates memory when
> exposing that memory to the VM.
> 
> To use huge pages in a safe way with virtio-mem, we need "reserve=off" on the
> memory backend and "prealloc=on" on the virtio-mem device. (I'm in the process
> of documenting that on virtio-mem.gitlab.io/ to make it clearer)
> 

David, I will reply in a different thread for discussion purpose only. I
need some time for the investigation :)

> 
> 2. Warning on arm64 with 64k
> 
> I assume the warning you're seeing is regarding the block-size:
> 
> "Read unsupported THP size: ..." followed by
> "Could not detect THP size, falling back to ..."
> 
> The right thing to do for now should be to remove that sanity check:
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index d5a578142b..33c32afeb1 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -78,11 +78,8 @@ static uint32_t virtio_mem_thp_size(void)
>       if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
>           !qemu_strtou64(content, &endptr, 0, &tmp) &&
>           (!endptr || *endptr == '\n')) {
> -        /*
> -         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
> -         * pages) or weird, fallback to something smaller.
> -         */
> -        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
> +        /* Sanity-check the value and fallback to something reasonable.  */
> +        if (!tmp || !is_power_of_2(tmp)) {
>               warn_report("Read unsupported THP size: %" PRIx64, tmp);
>           } else {
>               thp_size = tmp;
> 
> This will not affect setups we care about ( x86-64 KVM ).
> 
> It will imply that with a arm64 64k host, we can only hot(un)plug in
> 512 MiB granularity when not using hugetlb witht he default block-size.
> However, that is already the case with arm64 64k guests as well.
> The suggestion will be to use arm64 4k with virtio-mem in the host and
> the guest for increased flexibility -- fortunately most distros
> already have performed the switch to 4k on arm64, so we don't really
> care IMHO.
> 
> To support block_size < THP size when not using hugetlb,
> we'll have to disable THP (via MADV_NOHUGEPAGE) permanently for the memory
> region, also making sure that e.g., postcopy won't re-enable it by adding
> a proper flag (RAM_NOHUGEPAGE) to the RAMBlock. Because the issue is that
> once the guest touches some memory, we might populate a THP that would cover
> plugged and unplugged memory, which is bad.
> 
> Instead of warning in virtio_mem_device_realize() when
> 	vmem->block_size < virtio_mem_default_block_size(rb)
> we'd have to disable THP.
> 
> 
> Further, we should fixup the default THP size on arm64 in case we're
> running on an older kernel where we cannot sense the THP size:
> 
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index d5a578142b..371cee380a 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -38,13 +38,21 @@
>    */
>   #define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
>   
> +static uint32_t virtio_mem_default_thp_size(void)
> +{
> +#if defined(__aarch64__)
> +    if (qemu_real_host_page_size == 64 * KiB) {
> +        return 512 * MiB;
> +    }
> +#endif
>   #if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
>       defined(__powerpc64__)
> -#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
> +    return 2 * MiB;
>   #else
> -        /* fallback to 1 MiB (e.g., the THP size on s390x) */
> -#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
> +    /* fallback to 1 MiB (e.g., the THP size on s390x) */
> +    return VIRTIO_MEM_MIN_BLOCK_SIZE;
>   #endif
> +}
>   
>   /*
>    * We want to have a reasonable default block size such that
> @@ -90,7 +98,7 @@ static uint32_t virtio_mem_thp_size(void)
>       }
>   
>       if (!thp_size) {
> -        thp_size = VIRTIO_MEM_DEFAULT_THP_SIZE;
> +        thp_size = virtio_mem_default_thp_size();
>           warn_report("Could not detect THP size, falling back to %" PRIx64
>                       "  MiB.", thp_size / MiB);
>       }
> 
> 
> 
> In the context of proper arm64 support, adding the above two changes
> should be good enough. If you agree, can you include them in your v2
> series as a separate patch?
> 
> Supporting block_size < thp_size when not using hugetlb is a different
> work IMHO, if someone ever cares about that.
> 
> 

Yeah, It's exactly the warnings I observed and I agree on the changes
except the 16KB-base-page-size case is missed. I've included all the
changes into separate patch in v2, which was just posted.

Thanks,
Gavin