From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Juan Quintela <quintela@redhat.com>,
	peterx@redhat.com,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: [PATCH RFC 00/21] migration: Support hugetlb doublemaps
Date: Tue, 17 Jan 2023 17:08:53 -0500
Message-ID: <20230117220914.2062125-1-peterx@redhat.com>

Based-on: <20221213213850.1481858-1-peterx@redhat.com>
  [PATCH 0/5] migration: Fix disorder of channel creations

Trees for reference:
  https://github.com/xzpeter/linux/releases/tag/doublemap-v0.1
  https://github.com/xzpeter/qemu/releases/tag/doublemap-v0.1

This is an RFC series intended only for early discussion, not for
merging.

This patchset allows postcopy to work better with huge pages by migrating
huge pages in small page sizes.  It relies on a kernel feature called
"hugetlb HGM" (high-granularity mapping), currently proposed on the Linux
kernel mailing list by James Houghton; the latest version is v1:

https://lore.kernel.org/r/20230105101844.1893104-1-jthoughton@google.com

[PS: The kernel v1 patchset may need a few fixups to make QEMU work; all of
 them are contained in the kernel tree tagged doublemap-v0.1 linked above]

The kernel series is still under review upstream, so the API is not yet
stable.

I kept the old name "doublemap" in this QEMU patchset to represent HGM.
With HGM, a huge page can be mapped at granularities smaller than the huge
page size itself.  This can drastically reduce page fault latencies during
postcopy when guest memory is hugepage-backed, which makes postcopy
practical with huge pages.  In the initial test results with 1G-backed
guest memory, the average page request latency dropped from ~1sec to
~250us.

UFFDIO_COPY doesn't support mapping huge pages in small sizes, so a major
part of this series introduces UFFDIO_CONTINUE to resolve page faults for
hugetlb mappings.

Sampled page request latency histograms (in microseconds) for an 18G guest
without and with doublemap (preempt=on, single-threaded busy-spin workload
over an 18G mapping):

Before:

@delay_us:
[64, 128)              3 |@                                                   |
[128, 256)            84 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)            10 |@@@@@@                                              |
[512, 1K)              1 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)              17 |@@@@@@@@@@                                          |
[2M, 4M)              21 |@@@@@@@@@@@@@                                       |
[4M, 8M)               8 |@@@@                                                |
[8M, 16M)              4 |@@                                                  |

After:

@delay_us:
[16, 32)               6 |                                                    |
[32, 64)               6 |                                                    |
[64, 128)           3117 |@@                                                  |
[128, 256)         70815 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)         30460 |@@@@@@@@@@@@@@@@@@@@@@                              |
[512, 1K)           1135 |                                                    |
[1K, 2K)              34 |                                                    |
[2K, 4K)              42 |                                                    |
[4K, 8K)             126 |                                                    |
[8K, 16K)             91 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             1 |                                                    |

Any early comments are welcome.  Thanks.

Peter Xu (21):
  update linux headers
  util: Include osdep.h first in util/mmap-alloc.c
  physmem: Add qemu_ram_is_hugetlb()
  madvise: Include linux/mman.h under linux-headers/
  madvise: Add QEMU_MADV_SPLIT
  madvise: Add QEMU_MADV_COLLAPSE
  ramblock: Cache file offset for file-backed ramblocks
  ramblock: Cache the length to do file mmap() on ramblocks
  ramblock: Add RAM_READONLY
  ramblock: Add ramblock_file_map()
  migration: Add hugetlb-doublemap cap
  migration: Introduce page size for-migration-only
  migration: Add migration_ram_pagesize_largest()
  migration: Map hugetlbfs ramblocks twice, and pre-allocate
  migration: Teach qemu about minor faults and doublemap
  migration: Enable doublemap with MADV_SPLIT
  migration: Rework ram discard logic for hugetlb double-map
  migration: Allow postcopy_register_shared_ufd() to fail
  migration: Add postcopy_mark_received()
  migration: Handle page faults using UFFDIO_CONTINUE
  migration: Collapse huge pages again after postcopy finished

 backends/hostmem-file.c                       |   3 +-
 hw/virtio/vhost-user.c                        |   9 +-
 include/exec/cpu-common.h                     |   3 +-
 include/exec/memory.h                         |   4 +-
 include/exec/ram_addr.h                       |   6 +-
 include/exec/ramblock.h                       |  14 +
 include/qemu/madvise.h                        |  18 ++
 include/standard-headers/drm/drm_fourcc.h     |  63 +++-
 include/standard-headers/linux/ethtool.h      |  81 ++++-
 include/standard-headers/linux/fuse.h         |  20 +-
 .../linux/input-event-codes.h                 |   4 +
 include/standard-headers/linux/pci_regs.h     |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 ++
 include/standard-headers/linux/virtio_bt.h    |   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h                 |   1 +
 linux-headers/asm-generic/hugetlb_encode.h    |  26 +-
 linux-headers/asm-generic/mman-common.h       |   4 +
 linux-headers/asm-mips/mman.h                 |   4 +
 linux-headers/asm-riscv/kvm.h                 |   7 +
 linux-headers/asm-x86/kvm.h                   |  11 +-
 linux-headers/linux/kvm.h                     |  32 +-
 linux-headers/linux/psci.h                    |  14 +
 linux-headers/linux/userfaultfd.h             |   4 +
 linux-headers/linux/vfio.h                    | 278 +++++++++++++++++-
 migration/migration.c                         |  56 +++-
 migration/migration.h                         |   1 +
 migration/postcopy-ram.c                      | 228 +++++++++++---
 migration/postcopy-ram.h                      |   5 +-
 migration/ram.c                               | 165 ++++++++++-
 migration/ram.h                               |   2 +
 migration/trace-events                        |   6 +-
 qapi/migration.json                           |   7 +-
 softmmu/memory.c                              |   8 +-
 softmmu/physmem.c                             |  92 ++++--
 util/mmap-alloc.c                             |   2 +-
 36 files changed, 1051 insertions(+), 160 deletions(-)

-- 
2.37.3


