All of lore.kernel.org
* [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation
@ 2022-11-15 11:15 Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
                   ` (16 more replies)
  0 siblings, 17 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This patch series moves kvmtool from allocating guest VM memory
with anonymous mmap to using memfd/ftruncate. The main
motivation is to ease the transition to the fd-based KVM guest
memory proposal [*]. It also facilitates using IPC memory
sharing should that be needed in the future. Moreover, it
removes the need for temporary files when the memory is backed
by hugetlbfs.

In the process of this rework, this patch series fixes a bug
(an uninitialized return value). It also adds a memory
allocation function that returns aligned memory without the
need to over-map or over-allocate. This facilitates
refactoring, which simplifies the code and removes a lot of the
arch-specific code.

Cheers,
/fuad

[*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
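For readers unfamiliar with the pattern, the core of the switch can be sketched as below. This is an illustrative sketch only (the helper name is hypothetical, not kvmtool's); the series itself threads the fd through struct kvm and the RAM registration paths.

```c
/*
 * Sketch: replace mmap(MAP_ANONYMOUS) with a memfd-backed mapping.
 * Hypothetical helper; kvmtool's real code differs.
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Returns a size-byte private mapping backed by a memfd, or MAP_FAILED.
 * On success, *out_fd holds the fd usable with fd-based guest memory APIs. */
static void *alloc_guest_mem_sketch(size_t size, int *out_fd)
{
	void *addr;
	int fd = memfd_create("guest-mem", 0);

	if (fd < 0)
		return MAP_FAILED;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return MAP_FAILED;
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED) {
		close(fd);
		return MAP_FAILED;
	}
	*out_fd = fd;	/* Keep open: the fd now names the memory. */
	return addr;
}
```

The key difference from anonymous mmap is that the memory gains a file descriptor that outlives any particular mapping, which is what an fd-based guest memory interface needs. In the series itself the fd ends up in kvm->ram_fd and is closed in kvm__exit().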

Fuad Tabba (17):
  Initialize the return value in kvm__for_each_mem_bank()
  Make mmap_hugetlbfs() static
  Rename parameter in mmap_anon_or_hugetlbfs()
  Add hostmem va to debug print
  Factor out getting the hugetlb block size
  Use memfd for hugetlbfs when allocating guest ram
  Make blk_size a parameter and pass it to mmap_hugetlbfs()
  Use memfd for all guest ram allocations
  Allocate pvtime memory with memfd
  Allocate vesa memory with memfd
  Add a function that allocates aligned memory if specified
  Use new function to align memory
  Remove struct fields and code used for alignment
  Replace kvm_arch_delete_ram with kvm_delete_ram
  Remove no-longer used macro
  Factor out set_user_memory_region code
  Pass the memory file descriptor and offset when registering ram

 arm/aarch64/pvtime.c              |  20 ++++-
 arm/include/arm-common/kvm-arch.h |   7 --
 arm/kvm.c                         |  35 +++------
 framebuffer.c                     |   2 +
 hw/cfi_flash.c                    |   4 +-
 hw/vesa.c                         |  17 ++++-
 include/kvm/framebuffer.h         |   1 +
 include/kvm/kvm.h                 |  19 ++---
 include/kvm/util.h                |   7 +-
 kvm.c                             |  69 ++++++++++-------
 mips/kvm.c                        |  11 +--
 powerpc/kvm.c                     |   7 +-
 riscv/include/kvm/kvm-arch.h      |   7 --
 riscv/kvm.c                       |  26 ++-----
 util/util.c                       | 119 +++++++++++++++++++++++-------
 vfio/core.c                       |   3 +-
 x86/kvm.c                         |  11 +--
 17 files changed, 209 insertions(+), 156 deletions(-)


base-commit: e17d182ad3f797f01947fc234d95c96c050c534b
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank()
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:59   ` Andre Przywara
  2022-11-23 16:08   ` Alexandru Elisei
  2022-11-15 11:15 ` [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static Fuad Tabba
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

If none of the bank types match, the function would return an
uninitialized value.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kvm.c b/kvm.c
index 42b8812..78bc0d8 100644
--- a/kvm.c
+++ b/kvm.c
@@ -387,7 +387,7 @@ int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
 			   int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
 			   void *data)
 {
-	int ret;
+	int ret = 0;
 	struct kvm_mem_bank *bank;
 
 	list_for_each_entry(bank, &kvm->mem_banks, list) {
-- 
2.38.1.431.g37b22c650d-goog
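The bug class is worth spelling out: if the list is empty or no entry matches the requested type, the loop body never runs and the function returns whatever happened to be on the stack. A minimal standalone illustration (hypothetical code, not kvmtool's):

```c
#include <stddef.h>

struct node {
	int type;
	struct node *next;
};

static int count_calls;

static int visit(struct node *n)
{
	(void)n;
	count_calls++;
	return 0;
}

/*
 * Visits every node of a matching type. With `int ret;` left
 * uninitialized -- the bug fixed above -- a list with no matching
 * nodes would return indeterminate stack contents.
 */
static int for_each_of_type(struct node *head, int type,
			    int (*fn)(struct node *n))
{
	int ret = 0;	/* The fix: a defined value when nothing matches. */

	for (; head; head = head->next) {
		if (head->type != type)
			continue;
		ret = fn(head);
		if (ret)
			break;
	}
	return ret;
}
```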



* [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 17:58   ` Andre Przywara
  2022-11-15 11:15 ` [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs() Fuad Tabba
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This function isn't used outside of util.c.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/kvm/util.h | 1 -
 util/util.c        | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/kvm/util.h b/include/kvm/util.h
index b494548..b0c3684 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -140,7 +140,6 @@ static inline int pow2_size(unsigned long x)
 }
 
 struct kvm;
-void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
 
 #endif /* KVM__UTIL_H */
diff --git a/util/util.c b/util/util.c
index 1877105..093bd3b 100644
--- a/util/util.c
+++ b/util/util.c
@@ -81,7 +81,7 @@ void die_perror(const char *s)
 	exit(1);
 }
 
-void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 {
 	char mpath[PATH_MAX];
 	int fd;
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs()
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-23 16:40   ` Alexandru Elisei
  2022-11-15 11:15 ` [PATCH kvmtool v1 04/17] Add hostmem va to debug print Fuad Tabba
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

For consistency with other similar functions in the same file and
for brevity.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/kvm/util.h | 2 +-
 util/util.c        | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/kvm/util.h b/include/kvm/util.h
index b0c3684..61a205b 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -140,6 +140,6 @@ static inline int pow2_size(unsigned long x)
 }
 
 struct kvm;
-void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
+void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 
 #endif /* KVM__UTIL_H */
diff --git a/util/util.c b/util/util.c
index 093bd3b..22b64b6 100644
--- a/util/util.c
+++ b/util/util.c
@@ -118,14 +118,14 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 }
 
 /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
-void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size)
+void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 {
-	if (hugetlbfs_path)
+	if (htlbfs_path)
 		/*
 		 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
 		 * if the user specifies a hugetlbfs path.
 		 */
-		return mmap_hugetlbfs(kvm, hugetlbfs_path, size);
+		return mmap_hugetlbfs(kvm, htlbfs_path, size);
 	else {
 		kvm->ram_pagesize = getpagesize();
 		return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 04/17] Add hostmem va to debug print
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (2 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs() Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 05/17] Factor out getting the hugetlb block size Fuad Tabba
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Useful when debugging.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arm/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index d51cc15..c84983e 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -66,8 +66,8 @@ void kvm__init_ram(struct kvm *kvm)
 
 	kvm->arch.memory_guest_start = phys_start;
 
-	pr_debug("RAM created at 0x%llx - 0x%llx",
-		 phys_start, phys_start + phys_size - 1);
+	pr_debug("RAM created at 0x%llx - 0x%llx (host_mem 0x%llx)",
+		 phys_start, phys_start + phys_size - 1, (u64)host_mem);
 }
 
 void kvm__arch_delete_ram(struct kvm *kvm)
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 05/17] Factor out getting the hugetlb block size
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (3 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 04/17] Add hostmem va to debug print Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram Fuad Tabba
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This functionality will be needed separately in a future patch.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 util/util.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/util/util.c b/util/util.c
index 22b64b6..e6c0951 100644
--- a/util/util.c
+++ b/util/util.c
@@ -81,13 +81,9 @@ void die_perror(const char *s)
 	exit(1);
 }
 
-static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+static u64 get_hugepage_blk_size(const char *htlbfs_path)
 {
-	char mpath[PATH_MAX];
-	int fd;
 	struct statfs sfs;
-	void *addr;
-	unsigned long blk_size;
 
 	if (statfs(htlbfs_path, &sfs) < 0)
 		die("Can't stat %s\n", htlbfs_path);
@@ -95,10 +91,20 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 	if ((unsigned int)sfs.f_type != HUGETLBFS_MAGIC)
 		die("%s is not hugetlbfs!\n", htlbfs_path);
 
-	blk_size = (unsigned long)sfs.f_bsize;
-	if (sfs.f_bsize == 0 || blk_size > size) {
-		die("Can't use hugetlbfs pagesize %ld for mem size %lld\n",
-			blk_size, (unsigned long long)size);
+	return sfs.f_bsize;
+}
+
+static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+{
+	char mpath[PATH_MAX];
+	int fd;
+	void *addr;
+	u64 blk_size;
+
+	blk_size = get_hugepage_blk_size(htlbfs_path);
+	if (blk_size == 0 || blk_size > size) {
+		die("Can't use hugetlbfs pagesize %lld for mem size %lld\n",
+			(unsigned long long)blk_size, (unsigned long long)size);
 	}
 
 	kvm->ram_pagesize = blk_size;
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (4 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 05/17] Factor out getting the hugetlb block size Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-24 10:19   ` Alexandru Elisei
  2022-11-15 11:15 ` [PATCH kvmtool v1 07/17] Make blk_size a parameter and pass it to mmap_hugetlbfs() Fuad Tabba
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This removes the need to use a temporary file for the fd.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 util/util.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/util/util.c b/util/util.c
index e6c0951..d6ceb5d 100644
--- a/util/util.c
+++ b/util/util.c
@@ -10,6 +10,14 @@
 #include <sys/stat.h>
 #include <sys/statfs.h>
 
+#ifndef MFD_HUGETLB
+#define MFD_HUGETLB	0x0004U
+#endif
+
+#ifndef MFD_HUGE_SHIFT
+#define MFD_HUGE_SHIFT	26
+#endif
+
 static void report(const char *prefix, const char *err, va_list params)
 {
 	char msg[1024];
@@ -96,10 +104,12 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
 
 static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 {
-	char mpath[PATH_MAX];
+	const char *name = "kvmtool";
+	unsigned int flags = 0;
 	int fd;
 	void *addr;
 	u64 blk_size;
+	int htsize;
 
 	blk_size = get_hugepage_blk_size(htlbfs_path);
 	if (blk_size == 0 || blk_size > size) {
@@ -107,13 +117,18 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 			(unsigned long long)blk_size, (unsigned long long)size);
 	}
 
+	htsize = __builtin_ctzl(blk_size);
+	if ((1ULL << htsize) != blk_size)
+		die("Hugepage size must be a power of 2.\n");
+
+	flags |= MFD_HUGETLB;
+	flags |= htsize << MFD_HUGE_SHIFT;
+
 	kvm->ram_pagesize = blk_size;
 
-	snprintf(mpath, PATH_MAX, "%s/kvmtoolXXXXXX", htlbfs_path);
-	fd = mkstemp(mpath);
+	fd = memfd_create(name, flags);
 	if (fd < 0)
-		die("Can't open %s for hugetlbfs map\n", mpath);
-	unlink(mpath);
+		die("Can't memfd_create for hugetlbfs map\n");
 	if (ftruncate(fd, size) < 0)
 		die("Can't ftruncate for mem mapping size %lld\n",
 			(unsigned long long)size);
-- 
2.38.1.431.g37b22c650d-goog
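For reference, the memfd_create() hugetlb flag encoding used above places log2 of the hugepage size at MFD_HUGE_SHIFT. A self-contained sketch (the helper name is hypothetical):

```c
/*
 * Sketch: build memfd_create() flags for a given hugetlb page size.
 * log2(blk_size) is encoded starting at bit MFD_HUGE_SHIFT.
 */
#include <stdint.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB	0x0004U
#endif

#ifndef MFD_HUGE_SHIFT
#define MFD_HUGE_SHIFT	26
#endif

/* Returns the flags, or 0 if blk_size is not a power of two. */
static unsigned int hugetlb_memfd_flags(uint64_t blk_size)
{
	int htsize;

	if (blk_size == 0)
		return 0;

	htsize = __builtin_ctzll(blk_size);	/* log2 for powers of two */
	if ((1ULL << htsize) != blk_size)
		return 0;

	return MFD_HUGETLB | ((unsigned int)htsize << MFD_HUGE_SHIFT);
}
```

For a 2M hugepage this yields MFD_HUGETLB | (21 << MFD_HUGE_SHIFT), matching the kernel's MFD_HUGE_2MB encoding.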



* [PATCH kvmtool v1 07/17] Make blk_size a parameter and pass it to mmap_hugetlbfs()
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (5 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations Fuad Tabba
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This is the first step of making this function more generic.

The main purpose of this patch is to make it easier to review the
next one.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 util/util.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/util/util.c b/util/util.c
index d6ceb5d..d3483d8 100644
--- a/util/util.c
+++ b/util/util.c
@@ -102,30 +102,20 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
 	return sfs.f_bsize;
 }
 
-static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
 {
 	const char *name = "kvmtool";
 	unsigned int flags = 0;
 	int fd;
 	void *addr;
-	u64 blk_size;
-	int htsize;
+	int htsize = __builtin_ctzl(blk_size);
 
-	blk_size = get_hugepage_blk_size(htlbfs_path);
-	if (blk_size == 0 || blk_size > size) {
-		die("Can't use hugetlbfs pagesize %lld for mem size %lld\n",
-			(unsigned long long)blk_size, (unsigned long long)size);
-	}
-
-	htsize = __builtin_ctzl(blk_size);
 	if ((1ULL << htsize) != blk_size)
 		die("Hugepage size must be a power of 2.\n");
 
 	flags |= MFD_HUGETLB;
 	flags |= htsize << MFD_HUGE_SHIFT;
 
-	kvm->ram_pagesize = blk_size;
-
 	fd = memfd_create(name, flags);
 	if (fd < 0)
 		die("Can't memfd_create for hugetlbfs map\n");
@@ -141,13 +131,23 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 {
-	if (htlbfs_path)
-		/*
-		 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
-		 * if the user specifies a hugetlbfs path.
-		 */
-		return mmap_hugetlbfs(kvm, htlbfs_path, size);
-	else {
+	u64 blk_size = 0;
+
+	/*
+	 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
+	 * if the user specifies a hugetlbfs path.
+	 */
+	if (htlbfs_path) {
+		blk_size = get_hugepage_blk_size(htlbfs_path);
+
+		if (blk_size == 0 || blk_size > size) {
+			die("Can't use hugetlbfs pagesize %lld for mem size %lld\n",
+				(unsigned long long)blk_size, (unsigned long long)size);
+		}
+
+		kvm->ram_pagesize = blk_size;
+		return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
+	} else {
 		kvm->ram_pagesize = getpagesize();
 		return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
 	}
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (6 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 07/17] Make blk_size a parameter and pass it to mmap_hugetlbfs() Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-24 11:01   ` Alexandru Elisei
  2022-11-15 11:15 ` [PATCH kvmtool v1 09/17] Allocate pvtime memory with memfd Fuad Tabba
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Allocate all guest RAM backed by memfd/ftruncate instead of
anonymous mmap. This will make it easier to use kvmtool with
fd-based KVM guest memory proposals [*]. It would also make it
easier to use IPC memory sharing should that be needed in the
future.

Signed-off-by: Fuad Tabba <tabba@google.com>

[*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
---
 include/kvm/kvm.h  |  1 +
 include/kvm/util.h |  3 +++
 kvm.c              |  4 ++++
 util/util.c        | 33 ++++++++++++++++++++-------------
 4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 3872dc6..d0d519b 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -87,6 +87,7 @@ struct kvm {
 	struct kvm_config	cfg;
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
 	int			vm_fd;		/* For VM ioctls() */
+	int			ram_fd;		/* For guest memory. */
 	timer_t			timerid;	/* Posix timer for interrupts */
 
 	int			nrcpus;		/* Number of cpus to run */
diff --git a/include/kvm/util.h b/include/kvm/util.h
index 61a205b..369603b 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
 }
 
 struct kvm;
+int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
+void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
+				   u64 size, u64 align);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 
 #endif /* KVM__UTIL_H */
diff --git a/kvm.c b/kvm.c
index 78bc0d8..ed29d68 100644
--- a/kvm.c
+++ b/kvm.c
@@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
 	mutex_init(&kvm->mem_banks_lock);
 	kvm->sys_fd = -1;
 	kvm->vm_fd = -1;
+	kvm->ram_fd = -1;
 
 #ifdef KVM_BRLOCK_DEBUG
 	kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
@@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
 
 	kvm__arch_delete_ram(kvm);
 
+	if (kvm->ram_fd >= 0)
+		close(kvm->ram_fd);
+
 	list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
 		list_del(&bank->list);
 		free(bank);
diff --git a/util/util.c b/util/util.c
index d3483d8..278bcc2 100644
--- a/util/util.c
+++ b/util/util.c
@@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
 	return sfs.f_bsize;
 }
 
-static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
+int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
 {
 	const char *name = "kvmtool";
 	unsigned int flags = 0;
 	int fd;
-	void *addr;
-	int htsize = __builtin_ctzl(blk_size);
 
-	if ((1ULL << htsize) != blk_size)
-		die("Hugepage size must be a power of 2.\n");
+	if (hugetlb) {
+		int htsize = __builtin_ctzl(blk_size);
 
-	flags |= MFD_HUGETLB;
-	flags |= htsize << MFD_HUGE_SHIFT;
+		if ((1ULL << htsize) != blk_size)
+			die("Hugepage size must be a power of 2.\n");
+
+		flags |= MFD_HUGETLB;
+		flags |= htsize << MFD_HUGE_SHIFT;
+	}
 
 	fd = memfd_create(name, flags);
 	if (fd < 0)
-		die("Can't memfd_create for hugetlbfs map\n");
+		die("Can't memfd_create for memory map\n");
+
 	if (ftruncate(fd, size) < 0)
 		die("Can't ftruncate for mem mapping size %lld\n",
 			(unsigned long long)size);
-	addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
-	close(fd);
 
-	return addr;
+	return fd;
 }
 
 /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 {
 	u64 blk_size = 0;
+	int fd;
 
 	/*
 	 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
@@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 		}
 
 		kvm->ram_pagesize = blk_size;
-		return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
 	} else {
 		kvm->ram_pagesize = getpagesize();
-		return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
 	}
+
+	fd = memfd_alloc(size, htlbfs_path, blk_size);
+	if (fd < 0)
+		return MAP_FAILED;
+
+	kvm->ram_fd = fd;
+	return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
 }
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 09/17] Allocate pvtime memory with memfd
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (7 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 10/17] Allocate vesa " Fuad Tabba
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Ensure that all guest memory is fd-based.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arm/aarch64/pvtime.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arm/aarch64/pvtime.c b/arm/aarch64/pvtime.c
index 2933ac7..a7ba03e 100644
--- a/arm/aarch64/pvtime.c
+++ b/arm/aarch64/pvtime.c
@@ -8,25 +8,35 @@
 #define ARM_PVTIME_STRUCT_SIZE		(64)
 
 static void *usr_mem;
+static int user_mem_fd = -1;
 
 static int pvtime__alloc_region(struct kvm *kvm)
 {
 	char *mem;
+	int mem_fd;
 	int ret = 0;
 
-	mem = mmap(NULL, ARM_PVTIME_SIZE, PROT_RW,
-		   MAP_ANON_NORESERVE, -1, 0);
-	if (mem == MAP_FAILED)
+	mem_fd = memfd_alloc(ARM_PVTIME_SIZE, false, 0);
+	if (mem_fd < 0)
 		return -errno;
 
+	mem = mmap(NULL, ARM_PVTIME_SIZE, PROT_RW, MAP_PRIVATE, mem_fd, 0);
+	if (mem == MAP_FAILED) {
+		ret = -errno;
+		close(mem_fd);
+		return ret;
+	}
+
 	ret = kvm__register_ram(kvm, ARM_PVTIME_BASE,
 				ARM_PVTIME_SIZE, mem);
 	if (ret) {
 		munmap(mem, ARM_PVTIME_SIZE);
+		close(mem_fd);
 		return ret;
 	}
 
 	usr_mem = mem;
+	user_mem_fd = mem_fd;
 	return ret;
 }
 
@@ -38,7 +48,9 @@ static int pvtime__teardown_region(struct kvm *kvm)
 	kvm__destroy_mem(kvm, ARM_PVTIME_BASE,
 			 ARM_PVTIME_SIZE, usr_mem);
 	munmap(usr_mem, ARM_PVTIME_SIZE);
+	close(user_mem_fd);
 	usr_mem = NULL;
+	user_mem_fd = -1;
 	return 0;
 }
 
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 10/17] Allocate vesa memory with memfd
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (8 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 09/17] Allocate pvtime memory with memfd Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 11/17] Add a function that allocates aligned memory if specified Fuad Tabba
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Ensure that all guest memory is fd-based.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 framebuffer.c             |  2 ++
 hw/vesa.c                 | 15 +++++++++++++--
 include/kvm/framebuffer.h |  1 +
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/framebuffer.c b/framebuffer.c
index fb8f51d..a025293 100644
--- a/framebuffer.c
+++ b/framebuffer.c
@@ -73,6 +73,8 @@ int fb__exit(struct kvm *kvm)
 				fb->targets[i]->stop(fb);
 
 		munmap(fb->mem, fb->mem_size);
+		if (fb->mem_fd >= 0)
+			close(fb->mem_fd);
 	}
 
 	return 0;
diff --git a/hw/vesa.c b/hw/vesa.c
index 7f82cdb..522ffa3 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -41,6 +41,7 @@ static struct framebuffer vesafb = {
 	.depth		= VESA_BPP,
 	.mem_addr	= VESA_MEM_ADDR,
 	.mem_size	= VESA_MEM_SIZE,
+	.mem_fd		= -1,
 };
 
 static void vesa_pci_io(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
@@ -66,6 +67,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 {
 	u16 vesa_base_addr;
 	char *mem;
+	int mem_fd;
 	int r;
 
 	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
@@ -88,22 +90,31 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	if (r < 0)
 		goto unregister_ioport;
 
-	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
-	if (mem == MAP_FAILED) {
+	mem_fd = memfd_alloc(VESA_MEM_SIZE, false, 0);
+	if (mem_fd < 0) {
 		r = -errno;
 		goto unregister_device;
 	}
 
+	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_PRIVATE, mem_fd, 0);
+	if (mem == MAP_FAILED) {
+		r = -errno;
+		goto close_memfd;
+	}
+
 	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
 	if (r < 0)
 		goto unmap_dev;
 
 	vesafb.mem = mem;
+	vesafb.mem_fd = mem_fd;
 	vesafb.kvm = kvm;
 	return fb__register(&vesafb);
 
 unmap_dev:
 	munmap(mem, VESA_MEM_SIZE);
+close_memfd:
+	close(mem_fd);
 unregister_device:
 	device__unregister(&vesa_device);
 unregister_ioport:
diff --git a/include/kvm/framebuffer.h b/include/kvm/framebuffer.h
index e3200e5..c340273 100644
--- a/include/kvm/framebuffer.h
+++ b/include/kvm/framebuffer.h
@@ -22,6 +22,7 @@ struct framebuffer {
 	char				*mem;
 	u64				mem_addr;
 	u64				mem_size;
+	int				mem_fd;
 	struct kvm			*kvm;
 
 	unsigned long			nr_targets;
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 11/17] Add a function that allocates aligned memory if specified
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (9 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 10/17] Allocate vesa " Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 12/17] Use new function to align memory Fuad Tabba
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Add a variant of mmap_anon_or_hugetlbfs() that allocates memory
aligned as specified. Unlike aligning at the caller, this
function doesn't leave more memory mapped or allocated than the
requested amount.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 util/util.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/util/util.c b/util/util.c
index 278bcc2..953e2d8 100644
--- a/util/util.c
+++ b/util/util.c
@@ -129,10 +129,17 @@ int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
 	return fd;
 }
 
-/* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
-void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+/*
+ * This function allocates memory aligned to align_sz.
+ * It also wraps the decision between hugetlbfs (if requested) or normal mmap.
+ */
+void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
+				   u64 size, u64 align_sz)
 {
 	u64 blk_size = 0;
+	u64 total_map = size + align_sz;
+	u64 start_off, end_off;
+	void *addr_map, *addr_align;
 	int fd;
 
 	/*
@@ -152,10 +159,38 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
 		kvm->ram_pagesize = getpagesize();
 	}
 
+	/* Create a mapping with room for alignment without allocating. */
+	addr_map = mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+			-1, 0);
+	if (addr_map == MAP_FAILED)
+		return MAP_FAILED;
+
 	fd = memfd_alloc(size, htlbfs_path, blk_size);
 	if (fd < 0)
 		return MAP_FAILED;
 
+	/* Map the allocated memory in the fd to the specified alignment. */
+	addr_align = (void *)ALIGN((u64)addr_map, align_sz);
+	if (mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0) ==
+	    MAP_FAILED) {
+		close(fd);
+		return MAP_FAILED;
+	}
+
+	/* Remove the mapping for unused address ranges. */
+	start_off = addr_align - addr_map;
+	if (start_off)
+		munmap(addr_map, start_off);
+
+	end_off = align_sz - start_off;
+	if (end_off)
+		munmap((void *)((u64)addr_align + size), end_off);
+
 	kvm->ram_fd = fd;
-	return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
+	return addr_align;
+}
+
+void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
+{
+	return mmap_anon_or_hugetlbfs_align(kvm, htlbfs_path, size, 0);
 }
-- 
2.38.1.431.g37b22c650d-goog
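The alignment trick in the patch above can be restated on its own: reserve size+align bytes of address space with PROT_NONE (nothing committed), map the fd at the first aligned address inside the reservation with MAP_FIXED, then unmap the unused head and tail. A sketch under those assumptions (hypothetical helper; `align` must be a nonzero power of two):

```c
/*
 * Sketch: map an fd's contents at an address aligned to `align`
 * without leaving any extra memory mapped. Illustrative only.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

static void *mmap_fd_aligned_sketch(int fd, size_t size, size_t align)
{
	uint8_t *map, *aligned;
	size_t head, tail;

	/* Reserve enough address space to guarantee an aligned sub-range. */
	map = mmap(NULL, size + align, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return MAP_FAILED;

	aligned = (uint8_t *)ALIGN_UP((uintptr_t)map, align);

	/* Replace the aligned part of the reservation with the fd mapping. */
	if (mmap(aligned, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_FIXED, fd, 0) == MAP_FAILED) {
		munmap(map, size + align);
		return MAP_FAILED;
	}

	/* Give back the unused head and tail of the reservation. */
	head = (size_t)(aligned - map);
	if (head)
		munmap(map, head);
	tail = align - head;
	if (tail)
		munmap(aligned + size, tail);

	return aligned;
}
```

Because the initial reservation is PROT_NONE, no backing memory is committed for the slack; only the fd-backed, aligned range remains mapped on return.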



* [PATCH kvmtool v1 12/17] Use new function to align memory
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (10 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 11/17] Add a function that allocates aligned memory if specified Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 13/17] Remove struct fields and code used for alignment Fuad Tabba
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Use the new mmap_anon_or_hugetlbfs_align() to allocate memory
with the required alignment, instead of having the callers
over-allocate and over-map and then align the result
themselves.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arm/kvm.c   | 10 +++++-----
 riscv/kvm.c | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index c84983e..0e5bfad 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -37,17 +37,17 @@ void kvm__init_ram(struct kvm *kvm)
 	 * 2M trumps 64K, so let's go with that.
 	 */
 	kvm->ram_size = kvm->cfg.ram_size;
-	kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
-	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm,
+	kvm->arch.ram_alloc_size = kvm->ram_size;
+	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs_align(kvm,
 						kvm->cfg.hugetlbfs_path,
-						kvm->arch.ram_alloc_size);
+						kvm->arch.ram_alloc_size,
+						SZ_2M);
 
 	if (kvm->arch.ram_alloc_start == MAP_FAILED)
 		die("Failed to map %lld bytes for guest memory (%d)",
 		    kvm->arch.ram_alloc_size, errno);
 
-	kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start,
-					SZ_2M);
+	kvm->ram_start = kvm->arch.ram_alloc_start;
 
 	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
 		MADV_MERGEABLE);
diff --git a/riscv/kvm.c b/riscv/kvm.c
index 4d6f5cb..e26b4f0 100644
--- a/riscv/kvm.c
+++ b/riscv/kvm.c
@@ -70,17 +70,17 @@ void kvm__arch_init(struct kvm *kvm)
 	 * 2M trumps 64K, so let's go with that.
 	 */
 	kvm->ram_size = min(kvm->cfg.ram_size, (u64)RISCV_MAX_MEMORY(kvm));
-	kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
-	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm,
+	kvm->arch.ram_alloc_size = kvm->ram_size;
+	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs_align(kvm,
 						kvm->cfg.hugetlbfs_path,
-						kvm->arch.ram_alloc_size);
+						kvm->arch.ram_alloc_size,
+						SZ_2M);
 
 	if (kvm->arch.ram_alloc_start == MAP_FAILED)
 		die("Failed to map %lld bytes for guest memory (%d)",
 		    kvm->arch.ram_alloc_size, errno);
 
-	kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start,
-					SZ_2M);
+	kvm->ram_start = kvm->arch.ram_alloc_start;
 
 	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
 		MADV_MERGEABLE);
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 13/17] Remove struct fields and code used for alignment
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (11 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 12/17] Use new function to align memory Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 14/17] Replace kvm_arch_delete_ram with kvm_delete_ram Fuad Tabba
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Now that the allocator allocates aligned memory, remove
arch-specific code and struct fields used for alignment.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arm/include/arm-common/kvm-arch.h |  7 -------
 arm/kvm.c                         | 31 +++++++++++--------------------
 riscv/include/kvm/kvm-arch.h      |  7 -------
 riscv/kvm.c                       | 21 +++++++--------------
 4 files changed, 18 insertions(+), 48 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b2ae373..654abc9 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -96,13 +96,6 @@ static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 }
 
 struct kvm_arch {
-	/*
-	 * We may have to align the guest memory for virtio, so keep the
-	 * original pointers here for munmap.
-	 */
-	void	*ram_alloc_start;
-	u64	ram_alloc_size;
-
 	/*
 	 * Guest addresses for memory layout.
 	 */
diff --git a/arm/kvm.c b/arm/kvm.c
index 0e5bfad..770075e 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -27,7 +27,6 @@ bool kvm__arch_cpu_supports_vm(void)
 void kvm__init_ram(struct kvm *kvm)
 {
 	u64 phys_start, phys_size;
-	void *host_mem;
 	int err;
 
 	/*
@@ -37,42 +36,34 @@ void kvm__init_ram(struct kvm *kvm)
 	 * 2M trumps 64K, so let's go with that.
 	 */
 	kvm->ram_size = kvm->cfg.ram_size;
-	kvm->arch.ram_alloc_size = kvm->ram_size;
-	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs_align(kvm,
-						kvm->cfg.hugetlbfs_path,
-						kvm->arch.ram_alloc_size,
-						SZ_2M);
+	kvm->ram_start = mmap_anon_or_hugetlbfs_align(kvm,
+						      kvm->cfg.hugetlbfs_path,
+						      kvm->ram_size, SZ_2M);
 
-	if (kvm->arch.ram_alloc_start == MAP_FAILED)
+	if (kvm->ram_start == MAP_FAILED)
 		die("Failed to map %lld bytes for guest memory (%d)",
-		    kvm->arch.ram_alloc_size, errno);
+		    kvm->ram_size, errno);
 
-	kvm->ram_start = kvm->arch.ram_alloc_start;
-
-	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-		MADV_MERGEABLE);
-
-	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-		MADV_HUGEPAGE);
+	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+	madvise(kvm->ram_start, kvm->ram_size, MADV_HUGEPAGE);
 
 	phys_start	= kvm->cfg.ram_addr;
 	phys_size	= kvm->ram_size;
-	host_mem	= kvm->ram_start;
 
-	err = kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+	err = kvm__register_ram(kvm, phys_start, phys_size, kvm->ram_start);
 	if (err)
 		die("Failed to register %lld bytes of memory at physical "
 		    "address 0x%llx [err %d]", phys_size, phys_start, err);
 
 	kvm->arch.memory_guest_start = phys_start;
 
-	pr_debug("RAM created at 0x%llx - 0x%llx (host_mem 0x%llx)",
-		 phys_start, phys_start + phys_size - 1, (u64)host_mem);
+	pr_debug("RAM created at 0x%llx - 0x%llx (host ram_start 0x%llx)",
+		 phys_start, phys_start + phys_size - 1, (u64)kvm->ram_start);
 }
 
 void kvm__arch_delete_ram(struct kvm *kvm)
 {
-	munmap(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size);
+	munmap(kvm->ram_start, kvm->ram_size);
 }
 
 void kvm__arch_read_term(struct kvm *kvm)
diff --git a/riscv/include/kvm/kvm-arch.h b/riscv/include/kvm/kvm-arch.h
index 1e130f5..5bb7eee 100644
--- a/riscv/include/kvm/kvm-arch.h
+++ b/riscv/include/kvm/kvm-arch.h
@@ -56,13 +56,6 @@
 struct kvm;
 
 struct kvm_arch {
-	/*
-	 * We may have to align the guest memory for virtio, so keep the
-	 * original pointers here for munmap.
-	 */
-	void	*ram_alloc_start;
-	u64	ram_alloc_size;
-
 	/*
 	 * Guest addresses for memory layout.
 	 */
diff --git a/riscv/kvm.c b/riscv/kvm.c
index e26b4f0..d05b8e4 100644
--- a/riscv/kvm.c
+++ b/riscv/kvm.c
@@ -48,7 +48,7 @@ void kvm__init_ram(struct kvm *kvm)
 
 void kvm__arch_delete_ram(struct kvm *kvm)
 {
-	munmap(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size);
+	munmap(kvm->ram_start, kvm->ram_size);
 }
 
 void kvm__arch_read_term(struct kvm *kvm)
@@ -70,23 +70,16 @@ void kvm__arch_init(struct kvm *kvm)
 	 * 2M trumps 64K, so let's go with that.
 	 */
 	kvm->ram_size = min(kvm->cfg.ram_size, (u64)RISCV_MAX_MEMORY(kvm));
-	kvm->arch.ram_alloc_size = kvm->ram_size;
-	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs_align(kvm,
-						kvm->cfg.hugetlbfs_path,
-						kvm->arch.ram_alloc_size,
-						SZ_2M);
+	kvm->ram_start = mmap_anon_or_hugetlbfs_align(kvm,
+						      kvm->cfg.hugetlbfs_path,
+						      kvm->ram_size, SZ_2M);
 
-	if (kvm->arch.ram_alloc_start == MAP_FAILED)
+	if (kvm->ram_start == MAP_FAILED)
 		die("Failed to map %lld bytes for guest memory (%d)",
 		    kvm->arch.ram_alloc_size, errno);
 
-	kvm->ram_start = kvm->arch.ram_alloc_start;
-
-	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-		MADV_MERGEABLE);
-
-	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-		MADV_HUGEPAGE);
+	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+	madvise(kvm->ram_start, kvm->ram_size, MADV_HUGEPAGE);
 }
 
 #define FDT_ALIGN	SZ_4M
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 14/17] Replace kvm_arch_delete_ram with kvm_delete_ram
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (12 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 13/17] Remove struct fields and code used for alignment Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 15/17] Remove no-longer used macro Fuad Tabba
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Now that deleting ram is the same across all architectures, no
need for arch-specific deletion of ram.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arm/kvm.c         |  5 -----
 include/kvm/kvm.h |  1 -
 kvm.c             | 13 +++++++++----
 mips/kvm.c        |  5 -----
 powerpc/kvm.c     |  5 -----
 riscv/kvm.c       |  5 -----
 x86/kvm.c         |  5 -----
 7 files changed, 9 insertions(+), 30 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index 770075e..5cceef8 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -61,11 +61,6 @@ void kvm__init_ram(struct kvm *kvm)
 		 phys_start, phys_start + phys_size - 1, (u64)kvm->ram_start);
 }
 
-void kvm__arch_delete_ram(struct kvm *kvm)
-{
-	munmap(kvm->ram_start, kvm->ram_size);
-}
-
 void kvm__arch_read_term(struct kvm *kvm)
 {
 	serial8250__update_consoles(kvm);
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index d0d519b..f0be524 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -198,7 +198,6 @@ void kvm__arch_validate_cfg(struct kvm *kvm);
 void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm);
 u64 kvm__arch_default_ram_address(void);
-void kvm__arch_delete_ram(struct kvm *kvm);
 int kvm__arch_setup_firmware(struct kvm *kvm);
 int kvm__arch_free_firmware(struct kvm *kvm);
 bool kvm__arch_cpu_supports_vm(void);
diff --git a/kvm.c b/kvm.c
index ed29d68..695c038 100644
--- a/kvm.c
+++ b/kvm.c
@@ -169,14 +169,19 @@ struct kvm *kvm__new(void)
 	return kvm;
 }
 
-int kvm__exit(struct kvm *kvm)
+static void kvm__delete_ram(struct kvm *kvm)
 {
-	struct kvm_mem_bank *bank, *tmp;
-
-	kvm__arch_delete_ram(kvm);
+	munmap(kvm->ram_start, kvm->ram_size);
 
 	if (kvm->ram_fd >= 0)
 		close(kvm->ram_fd);
+}
+
+int kvm__exit(struct kvm *kvm)
+{
+	struct kvm_mem_bank *bank, *tmp;
+
+	kvm__delete_ram(kvm);
 
 	list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
 		list_del(&bank->list);
diff --git a/mips/kvm.c b/mips/kvm.c
index 0faa03a..0a0d025 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -56,11 +56,6 @@ void kvm__init_ram(struct kvm *kvm)
 	}
 }
 
-void kvm__arch_delete_ram(struct kvm *kvm)
-{
-	munmap(kvm->ram_start, kvm->ram_size);
-}
-
 void kvm__arch_set_cmdline(char *cmdline, bool video)
 {
 
diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index 7b0d066..8d467e9 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -148,11 +148,6 @@ void kvm__arch_init(struct kvm *kvm)
 			 SPAPR_PCI_IO_WIN_SIZE);
 }
 
-void kvm__arch_delete_ram(struct kvm *kvm)
-{
-	munmap(kvm->ram_start, kvm->ram_size);
-}
-
 void kvm__irq_trigger(struct kvm *kvm, int irq)
 {
 	kvm__irq_line(kvm, irq, 1);
diff --git a/riscv/kvm.c b/riscv/kvm.c
index d05b8e4..4a2a3df 100644
--- a/riscv/kvm.c
+++ b/riscv/kvm.c
@@ -46,11 +46,6 @@ void kvm__init_ram(struct kvm *kvm)
 	kvm->arch.memory_guest_start = phys_start;
 }
 
-void kvm__arch_delete_ram(struct kvm *kvm)
-{
-	munmap(kvm->ram_start, kvm->ram_size);
-}
-
 void kvm__arch_read_term(struct kvm *kvm)
 {
 	serial8250__update_consoles(kvm);
diff --git a/x86/kvm.c b/x86/kvm.c
index 328fa75..8d29904 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -177,11 +177,6 @@ void kvm__arch_init(struct kvm *kvm)
 		die_perror("KVM_CREATE_IRQCHIP ioctl");
 }
 
-void kvm__arch_delete_ram(struct kvm *kvm)
-{
-	munmap(kvm->ram_start, kvm->ram_size);
-}
-
 void kvm__irq_line(struct kvm *kvm, int irq, int level)
 {
 	struct kvm_irq_level irq_level;
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 15/17] Remove no-longer used macro
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (13 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 14/17] Replace kvm_arch_delete_ram with kvm_delete_ram Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 16/17] Factor out set_user_memory_region code Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 17/17] Pass the memory file descriptor and offset when registering ram Fuad Tabba
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

The MAP_ANON_NORESERVE macro is no longer needed now that guest
memory is not allocated with anonymous mmap.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 include/kvm/util.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/kvm/util.h b/include/kvm/util.h
index 369603b..f807fcc 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -35,7 +35,6 @@
 extern bool do_debug_print;
 
 #define PROT_RW (PROT_READ|PROT_WRITE)
-#define MAP_ANON_NORESERVE (MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE)
 
 extern void die(const char *err, ...) NORETURN __attribute__((format (printf, 1, 2)));
 extern void die_perror(const char *s) NORETURN;
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 16/17] Factor out set_user_memory_region code
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (14 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 15/17] Remove no-longer used macro Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  2022-11-15 11:15 ` [PATCH kvmtool v1 17/17] Pass the memory file descriptor and offset when registering ram Fuad Tabba
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

This is common code, and will be reused in the future when
setting memory regions using file descriptors.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 kvm.c | 53 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 28 insertions(+), 25 deletions(-)

diff --git a/kvm.c b/kvm.c
index 695c038..a1eb365 100644
--- a/kvm.c
+++ b/kvm.c
@@ -193,10 +193,30 @@ int kvm__exit(struct kvm *kvm)
 }
 core_exit(kvm__exit);
 
+
+static int set_user_memory_region(int vm_fd, u32 slot, u32 flags,
+				  u64 guest_phys, u64 size,
+				  u64 userspace_addr)
+{
+	int ret = 0;
+	struct kvm_userspace_memory_region mem = {
+		.slot			= slot,
+		.flags			= flags,
+		.guest_phys_addr	= guest_phys,
+		.memory_size		= size,
+		.userspace_addr		= (unsigned long)userspace_addr,
+	};
+
+	ret = ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	if (ret < 0)
+		ret = -errno;
+
+	return ret;
+}
+
 int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		     void *userspace_addr)
 {
-	struct kvm_userspace_memory_region mem;
 	struct kvm_mem_bank *bank;
 	int ret;
 
@@ -220,18 +240,10 @@ int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		goto out;
 	}
 
-	mem = (struct kvm_userspace_memory_region) {
-		.slot			= bank->slot,
-		.guest_phys_addr	= guest_phys,
-		.memory_size		= 0,
-		.userspace_addr		= (unsigned long)userspace_addr,
-	};
-
-	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
-	if (ret < 0) {
-		ret = -errno;
+	ret = set_user_memory_region(kvm->vm_fd, bank->slot, 0, guest_phys, 0,
+				     (u64) userspace_addr);
+	if (ret < 0)
 		goto out;
-	}
 
 	list_del(&bank->list);
 	free(bank);
@@ -246,7 +258,6 @@ out:
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		      void *userspace_addr, enum kvm_mem_type type)
 {
-	struct kvm_userspace_memory_region mem;
 	struct kvm_mem_bank *merged = NULL;
 	struct kvm_mem_bank *bank;
 	struct list_head *prev_entry;
@@ -327,19 +338,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		flags |= KVM_MEM_READONLY;
 
 	if (type != KVM_MEM_TYPE_RESERVED) {
-		mem = (struct kvm_userspace_memory_region) {
-			.slot			= slot,
-			.flags			= flags,
-			.guest_phys_addr	= guest_phys,
-			.memory_size		= size,
-			.userspace_addr		= (unsigned long)userspace_addr,
-		};
-
-		ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
-		if (ret < 0) {
-			ret = -errno;
+		ret = set_user_memory_region(kvm->vm_fd, slot, flags,
+					     guest_phys, size,
+					     (u64) userspace_addr);
+		if (ret < 0)
 			goto out;
-		}
 	}
 
 	list_add(&bank->list, prev_entry);
-- 
2.38.1.431.g37b22c650d-goog



* [PATCH kvmtool v1 17/17] Pass the memory file descriptor and offset when registering ram
  2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
                   ` (15 preceding siblings ...)
  2022-11-15 11:15 ` [PATCH kvmtool v1 16/17] Factor out set_user_memory_region code Fuad Tabba
@ 2022-11-15 11:15 ` Fuad Tabba
  16 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-15 11:15 UTC (permalink / raw)
  To: kvm; +Cc: julien.thierry.kdev, andre.przywara, alexandru.elisei, will, tabba

Since the memory file descriptor is the canonical reference to guest
memory, pass that and the offset when registering guest memory.
Future fd-based kvm proposals might not even require a userspace
address [*].

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>

[*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
---
 arm/aarch64/pvtime.c |  2 +-
 arm/kvm.c            |  3 ++-
 hw/cfi_flash.c       |  4 +++-
 hw/vesa.c            |  2 +-
 include/kvm/kvm.h    | 17 +++++++++--------
 kvm.c                |  3 ++-
 mips/kvm.c           |  6 +++---
 powerpc/kvm.c        |  2 +-
 riscv/kvm.c          |  2 +-
 vfio/core.c          |  3 ++-
 x86/kvm.c            |  6 +++---
 11 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/arm/aarch64/pvtime.c b/arm/aarch64/pvtime.c
index a7ba03e..9b06ee4 100644
--- a/arm/aarch64/pvtime.c
+++ b/arm/aarch64/pvtime.c
@@ -28,7 +28,7 @@ static int pvtime__alloc_region(struct kvm *kvm)
 	}
 
 	ret = kvm__register_ram(kvm, ARM_PVTIME_BASE,
-				ARM_PVTIME_SIZE, mem);
+				ARM_PVTIME_SIZE, mem, mem_fd, 0);
 	if (ret) {
 		munmap(mem, ARM_PVTIME_SIZE);
 		close(mem_fd);
diff --git a/arm/kvm.c b/arm/kvm.c
index 5cceef8..8772a55 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -50,7 +50,8 @@ void kvm__init_ram(struct kvm *kvm)
 	phys_start	= kvm->cfg.ram_addr;
 	phys_size	= kvm->ram_size;
 
-	err = kvm__register_ram(kvm, phys_start, phys_size, kvm->ram_start);
+	err = kvm__register_ram(kvm, phys_start, phys_size, kvm->ram_start,
+		kvm->ram_fd, 0);
 	if (err)
 		die("Failed to register %lld bytes of memory at physical "
 		    "address 0x%llx [err %d]", phys_size, phys_start, err);
diff --git a/hw/cfi_flash.c b/hw/cfi_flash.c
index 7faecdf..92a6567 100644
--- a/hw/cfi_flash.c
+++ b/hw/cfi_flash.c
@@ -131,6 +131,7 @@ struct cfi_flash_device {
 	u32			size;
 
 	void			*flash_memory;
+	int			flash_fd;
 	u8			program_buffer[PROGRAM_BUFF_SIZE];
 	unsigned long		*lock_bm;
 	u64			block_address;
@@ -451,7 +452,7 @@ static int map_flash_memory(struct kvm *kvm, struct cfi_flash_device *sfdev)
 	int ret;
 
 	ret = kvm__register_mem(kvm, sfdev->base_addr, sfdev->size,
-				sfdev->flash_memory,
+				sfdev->flash_memory, sfdev->flash_fd, 0,
 				KVM_MEM_TYPE_RAM | KVM_MEM_TYPE_READONLY);
 	if (!ret)
 		sfdev->is_mapped = true;
@@ -583,6 +584,7 @@ static struct cfi_flash_device *create_flash_device_file(struct kvm *kvm,
 		ret = -errno;
 		goto out_free;
 	}
+	sfdev->flash_fd = fd;
 	sfdev->base_addr = KVM_FLASH_MMIO_BASE;
 	sfdev->state = READY;
 	sfdev->read_mode = READ_ARRAY;
diff --git a/hw/vesa.c b/hw/vesa.c
index 522ffa3..277d638 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -102,7 +102,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 		goto close_memfd;
 	}
 
-	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem, mem_fd, 0);
 	if (r < 0)
 		goto unmap_dev;
 
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index f0be524..33cae9d 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -135,24 +135,25 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
 int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
-		      enum kvm_mem_type type);
+		      int memfd, u64 offset, enum kvm_mem_type type);
 static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
-				    void *userspace_addr)
+				    void *userspace_addr, int memfd, u64 offset)
 {
-	return kvm__register_mem(kvm, guest_phys, size, userspace_addr,
-				 KVM_MEM_TYPE_RAM);
+	return kvm__register_mem(kvm, guest_phys, size, userspace_addr, memfd,
+				 offset, KVM_MEM_TYPE_RAM);
 }
 
 static inline int kvm__register_dev_mem(struct kvm *kvm, u64 guest_phys,
-					u64 size, void *userspace_addr)
+					u64 size, void *userspace_addr,
+					int memfd, u64 offset)
 {
-	return kvm__register_mem(kvm, guest_phys, size, userspace_addr,
-				 KVM_MEM_TYPE_DEVICE);
+	return kvm__register_mem(kvm, guest_phys, size, userspace_addr, memfd,
+				 offset, KVM_MEM_TYPE_DEVICE);
 }
 
 static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
 {
-	return kvm__register_mem(kvm, guest_phys, size, NULL,
+	return kvm__register_mem(kvm, guest_phys, size, NULL, -1, 0,
 				 KVM_MEM_TYPE_RESERVED);
 }
 
diff --git a/kvm.c b/kvm.c
index a1eb365..0c6ed3d 100644
--- a/kvm.c
+++ b/kvm.c
@@ -256,7 +256,8 @@ out:
 }
 
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
-		      void *userspace_addr, enum kvm_mem_type type)
+		      void *userspace_addr, int memfd, u64 offset,
+		      enum kvm_mem_type type)
 {
 	struct kvm_mem_bank *merged = NULL;
 	struct kvm_mem_bank *bank;
diff --git a/mips/kvm.c b/mips/kvm.c
index 0a0d025..ebb2b19 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -38,21 +38,21 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 	} else {
 		/* one region for memory that fits below MMIO range */
 		phys_start = 0;
 		phys_size  = KVM_MMIO_START;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 
 		/* one region for rest of memory */
 		phys_start = KVM_MMIO_START + KVM_MMIO_SIZE;
 		phys_size  = kvm->ram_size - KVM_MMIO_START;
 		host_mem   = kvm->ram_start + KVM_MMIO_START;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 	}
 }
 
diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index 8d467e9..c36c497 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -88,7 +88,7 @@ void kvm__init_ram(struct kvm *kvm)
 		    "overlaps MMIO!\n",
 		    phys_size);
 
-	kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+	kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 }
 
 void kvm__arch_set_cmdline(char *cmdline, bool video)
diff --git a/riscv/kvm.c b/riscv/kvm.c
index 4a2a3df..bb79c5d 100644
--- a/riscv/kvm.c
+++ b/riscv/kvm.c
@@ -38,7 +38,7 @@ void kvm__init_ram(struct kvm *kvm)
 	phys_size	= kvm->ram_size;
 	host_mem	= kvm->ram_start;
 
-	err = kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+	err = kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 	if (err)
 		die("Failed to register %lld bytes of memory at physical "
 		    "address 0x%llx [err %d]", phys_size, phys_start, err);
diff --git a/vfio/core.c b/vfio/core.c
index 3ff2c0b..ea189a0 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -255,7 +255,8 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 	region->host_addr = base;
 
 	ret = kvm__register_dev_mem(kvm, region->guest_phys_addr, map_size,
-				    region->host_addr);
+				    region->host_addr, vdev->fd,
+				    region->info.offset);
 	if (ret) {
 		vfio_dev_err(vdev, "failed to register region with KVM");
 		return ret;
diff --git a/x86/kvm.c b/x86/kvm.c
index 8d29904..cee82d3 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -107,7 +107,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 	} else {
 		/* First RAM range from zero to the PCI gap: */
 
@@ -115,7 +115,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = KVM_32BIT_GAP_START;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 
 		/* Second RAM range from 4GB to the end of RAM: */
 
@@ -123,7 +123,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size - phys_start;
 		host_mem   = kvm->ram_start + phys_start;
 
-		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem, kvm->ram_fd, 0);
 	}
 }
 
-- 
2.38.1.431.g37b22c650d-goog



* Re: [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank()
  2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
@ 2022-11-15 11:59   ` Andre Przywara
  2022-11-23 16:08   ` Alexandru Elisei
  1 sibling, 0 replies; 36+ messages in thread
From: Andre Przywara @ 2022-11-15 11:59 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, alexandru.elisei, will

On Tue, 15 Nov 2022 11:15:33 +0000
Fuad Tabba <tabba@google.com> wrote:

Hi,

> If none of the bank types match, the function would return an
> uninitialized value.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>

Indeed the comment promises to return 0 if there is no error.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kvm.c b/kvm.c
> index 42b8812..78bc0d8 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -387,7 +387,7 @@ int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
>  			   int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
>  			   void *data)
>  {
> -	int ret;
> +	int ret = 0;
>  	struct kvm_mem_bank *bank;
>  
>  	list_for_each_entry(bank, &kvm->mem_banks, list) {



* Re: [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static
  2022-11-15 11:15 ` [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static Fuad Tabba
@ 2022-11-15 17:58   ` Andre Przywara
  0 siblings, 0 replies; 36+ messages in thread
From: Andre Przywara @ 2022-11-15 17:58 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, alexandru.elisei, will

On Tue, 15 Nov 2022 11:15:34 +0000
Fuad Tabba <tabba@google.com> wrote:

> This function isn't used outside of util.c.
> 
> No functional change intended.

I am not sure if it was once intended to be public, but it's indeed not
used outside, and it looks like we lose this function anyway, so:

> Signed-off-by: Fuad Tabba <tabba@google.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  include/kvm/util.h | 1 -
>  util/util.c        | 2 +-
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/kvm/util.h b/include/kvm/util.h
> index b494548..b0c3684 100644
> --- a/include/kvm/util.h
> +++ b/include/kvm/util.h
> @@ -140,7 +140,6 @@ static inline int pow2_size(unsigned long x)
>  }
>  
>  struct kvm;
> -void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
>  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
>  
>  #endif /* KVM__UTIL_H */
> diff --git a/util/util.c b/util/util.c
> index 1877105..093bd3b 100644
> --- a/util/util.c
> +++ b/util/util.c
> @@ -81,7 +81,7 @@ void die_perror(const char *s)
>  	exit(1);
>  }
>  
> -void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> +static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  {
>  	char mpath[PATH_MAX];
>  	int fd;



* Re: [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank()
  2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
  2022-11-15 11:59   ` Andre Przywara
@ 2022-11-23 16:08   ` Alexandru Elisei
  2022-11-23 17:43     ` Fuad Tabba
  1 sibling, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-23 16:08 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Tue, Nov 15, 2022 at 11:15:33AM +0000, Fuad Tabba wrote:
> If none of the bank types match, the function would return an
> uninitialized value.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kvm.c b/kvm.c
> index 42b8812..78bc0d8 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -387,7 +387,7 @@ int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
>  			   int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
>  			   void *data)
>  {
> -	int ret;
> +	int ret = 0;

Would you consider moving the variable declaration after the 'bank'
variable?

>  	struct kvm_mem_bank *bank;
>  
>  	list_for_each_entry(bank, &kvm->mem_banks, list) {

Shouldn't the function return an error if no memory banks matched the type
specified (initialize ret to -EINVAL instead of 0)? I'm thinking that if
the caller expects a certain type of memory bank to be present, but that
memory type is not present, then somewhere an error occurred and the caller
should be made aware of it.

Case in point, kvm__for_each_mem_bank() is used vfio/core.c for
KVM_MEM_TYPE_RAM. If RAM hasn't been created by that point, then
VFIO_IOMMU_MAP_DMA will not be called for guest memory and the assigned
device will not work.

Thanks,
Alex

> -- 
> 2.38.1.431.g37b22c650d-goog
> 


* Re: [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs()
  2022-11-15 11:15 ` [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs() Fuad Tabba
@ 2022-11-23 16:40   ` Alexandru Elisei
  2022-11-23 17:44     ` Fuad Tabba
  0 siblings, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-23 16:40 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Tue, Nov 15, 2022 at 11:15:35AM +0000, Fuad Tabba wrote:
> For consistency with other similar functions in the same file and
> for brevity.
> 
> No functional change intended.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  include/kvm/util.h | 2 +-
>  util/util.c        | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/kvm/util.h b/include/kvm/util.h
> index b0c3684..61a205b 100644
> --- a/include/kvm/util.h
> +++ b/include/kvm/util.h
> @@ -140,6 +140,6 @@ static inline int pow2_size(unsigned long x)
>  }
>  
>  struct kvm;
> -void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
> +void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
>  
>  #endif /* KVM__UTIL_H */
> diff --git a/util/util.c b/util/util.c
> index 093bd3b..22b64b6 100644
> --- a/util/util.c
> +++ b/util/util.c
> @@ -118,14 +118,14 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  }
>  
>  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> -void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size)
> +void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)

All the functions that deal with hugetlbfs have "hugetlbfs" in the name,
and the kvm_config field is called hugetlbfs_path. Wouldn't it make more
sense to rename the htlbfs_path parameter to hugetlbfs_path instead of the
other way around?

Thanks,
Alex

>  {
> -	if (hugetlbfs_path)
> +	if (htlbfs_path)
>  		/*
>  		 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
>  		 * if the user specifies a hugetlbfs path.
>  		 */
> -		return mmap_hugetlbfs(kvm, hugetlbfs_path, size);
> +		return mmap_hugetlbfs(kvm, htlbfs_path, size);
>  	else {
>  		kvm->ram_pagesize = getpagesize();
>  		return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> -- 
> 2.38.1.431.g37b22c650d-goog
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank()
  2022-11-23 16:08   ` Alexandru Elisei
@ 2022-11-23 17:43     ` Fuad Tabba
  0 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-23 17:43 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Wed, Nov 23, 2022 at 4:08 PM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Tue, Nov 15, 2022 at 11:15:33AM +0000, Fuad Tabba wrote:
> > If none of the bank types match, the function would return an
> > uninitialized value.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  kvm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kvm.c b/kvm.c
> > index 42b8812..78bc0d8 100644
> > --- a/kvm.c
> > +++ b/kvm.c
> > @@ -387,7 +387,7 @@ int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
> >                          int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
> >                          void *data)
> >  {
> > -     int ret;
> > +     int ret = 0;
>
> Would you consider moving the variable declaration after the 'bank'
> variable?

Will do.

> >       struct kvm_mem_bank *bank;
> >
> >       list_for_each_entry(bank, &kvm->mem_banks, list) {
>
> Shouldn't the function return an error if no memory banks matched the type
> specified (initialize ret to -EINVAL instead of 0)? I'm thinking that if
> the caller expects a certain type of memory bank to be present, but that
> memory type is not present, then somewhere an error occurred and the caller
> should be made aware of it.
>
> Case in point, kvm__for_each_mem_bank() is used in vfio/core.c for
> KVM_MEM_TYPE_RAM. If RAM hasn't been created by that point, then
> VFIO_IOMMU_MAP_DMA will not be called for guest memory and the assigned
> device will not work.

I was following the behavior specified in the comment. That said, I
agree with you that returning an error should be the correct behavior.
I'll fix that and adjust the comment to reflect that.

Cheers,
/fuad

> Thanks,
> Alex
>
> > --
> > 2.38.1.431.g37b22c650d-goog
> >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs()
  2022-11-23 16:40   ` Alexandru Elisei
@ 2022-11-23 17:44     ` Fuad Tabba
  0 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-23 17:44 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Wed, Nov 23, 2022 at 4:40 PM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Tue, Nov 15, 2022 at 11:15:35AM +0000, Fuad Tabba wrote:
> > For consistency with other similar functions in the same file and
> > for brevity.
> >
> > No functional change intended.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  include/kvm/util.h | 2 +-
> >  util/util.c        | 6 +++---
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > index b0c3684..61a205b 100644
> > --- a/include/kvm/util.h
> > +++ b/include/kvm/util.h
> > @@ -140,6 +140,6 @@ static inline int pow2_size(unsigned long x)
> >  }
> >
> >  struct kvm;
> > -void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
> > +void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> >
> >  #endif /* KVM__UTIL_H */
> > diff --git a/util/util.c b/util/util.c
> > index 093bd3b..22b64b6 100644
> > --- a/util/util.c
> > +++ b/util/util.c
> > @@ -118,14 +118,14 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> >  }
> >
> >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > -void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size)
> > +void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>
> All the functions that deal with hugetlbfs have "hugetlbfs" in the name,
> and the kvm_config field is called hugetlbfs_path. Wouldn't it make more
> sense to rename the htlbfs_path parameter to hugetlbfs_path instead of the
> other way around?

I was going for brevity, but changing it the other way around is more
consistent, as you said.

I'll do that when I respin this.

Cheers,
/fuad

> Thanks,
> Alex
>
> >  {
> > -     if (hugetlbfs_path)
> > +     if (htlbfs_path)
> >               /*
> >                * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> >                * if the user specifies a hugetlbfs path.
> >                */
> > -             return mmap_hugetlbfs(kvm, hugetlbfs_path, size);
> > +             return mmap_hugetlbfs(kvm, htlbfs_path, size);
> >       else {
> >               kvm->ram_pagesize = getpagesize();
> >               return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > --
> > 2.38.1.431.g37b22c650d-goog
> >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram
  2022-11-15 11:15 ` [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram Fuad Tabba
@ 2022-11-24 10:19   ` Alexandru Elisei
  2022-11-24 10:45     ` Fuad Tabba
  0 siblings, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-24 10:19 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Tue, Nov 15, 2022 at 11:15:38AM +0000, Fuad Tabba wrote:
> This removes the need to use a temporary file for the fd.

I'm confused by this. The man page for memfd_create says that it creates an
anonymous file that lives in RAM.

> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  util/util.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/util/util.c b/util/util.c
> index e6c0951..d6ceb5d 100644
> --- a/util/util.c
> +++ b/util/util.c
> @@ -10,6 +10,14 @@
>  #include <sys/stat.h>
>  #include <sys/statfs.h>
>  
> +#ifndef MFD_HUGETLB
> +#define MFD_HUGETLB	0x0004U
> +#endif
> +
> +#ifndef MFD_HUGE_SHIFT
> +#define MFD_HUGE_SHIFT	26
> +#endif

Hm... on my machine these are defined in linux/memfd.h, maybe you are
missing the include?

> +
>  static void report(const char *prefix, const char *err, va_list params)
>  {
>  	char msg[1024];
> @@ -96,10 +104,12 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
>  
>  static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  {
> -	char mpath[PATH_MAX];
> +	const char *name = "kvmtool";
> +	unsigned int flags = 0;
>  	int fd;
>  	void *addr;
>  	u64 blk_size;
> +	int htsize;
>  
>  	blk_size = get_hugepage_blk_size(htlbfs_path);
>  	if (blk_size == 0 || blk_size > size) {
> @@ -107,13 +117,18 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  			(unsigned long long)blk_size, (unsigned long long)size);
>  	}
>  
> +	htsize = __builtin_ctzl(blk_size);
> +	if ((1ULL << htsize) != blk_size)
> +		die("Hugepage size must be a power of 2.\n");
> +
> +	flags |= MFD_HUGETLB;
> +	flags |= htsize << MFD_HUGE_SHIFT;

If I understand the intention correctly, this entire sequence can be
rewritten using is_power_of_two() from util.h:

	if (!is_power_of_two(blk_size))
		die("Hugepage size must be a power of 2");

	flags |= MFD_HUGETLB;
	flags |= __builtin_ctzl(blk_size) << MFD_HUGE_SHIFT;

Also, die() automatically adds the newline at the end of the string.
That's unless you specifically wanted two newline characters at the end of
the message.
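For reference, the MFD_HUGE_* values encode log2(page size) in the bits above MFD_HUGE_SHIFT, so it is the trailing-zero count of blk_size, not the raw size, that gets shifted in. A standalone sketch (the *_FLAG/*_N macros are local stand-ins for the uapi ones in linux/memfd.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the uapi macros in <linux/memfd.h>. */
#define MFD_HUGETLB_FLAG	0x0004U
#define MFD_HUGE_SHIFT_N	26

static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Build the memfd_create() flags for a hugetlb page size: the field
 * above MFD_HUGE_SHIFT holds log2(blk_size), i.e. its trailing-zero
 * count, not the raw size itself.
 */
static unsigned int hugetlb_memfd_flags(uint64_t blk_size)
{
	if (!is_pow2(blk_size))
		return 0;	/* real code would die() here */
	return MFD_HUGETLB_FLAG |
	       ((unsigned int)__builtin_ctzll(blk_size) << MFD_HUGE_SHIFT_N);
}
```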

> +
>  	kvm->ram_pagesize = blk_size;
>  
> -	snprintf(mpath, PATH_MAX, "%s/kvmtoolXXXXXX", htlbfs_path);
> -	fd = mkstemp(mpath);
> +	fd = memfd_create(name, flags);
>  	if (fd < 0)
> -		die("Can't open %s for hugetlbfs map\n", mpath);
> -	unlink(mpath);
> +		die("Can't memfd_create for hugetlbfs map\n");

die_perror("memfd_create")? That way you also print the error number and
the message associated with it. Same thing with the other die statements
here, replacing them with die_perror() looks like it would be helpful.

Thanks,
Alex

>  	if (ftruncate(fd, size) < 0)
>  		die("Can't ftruncate for mem mapping size %lld\n",
>  			(unsigned long long)size);
> -- 
> 2.38.1.431.g37b22c650d-goog
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram
  2022-11-24 10:19   ` Alexandru Elisei
@ 2022-11-24 10:45     ` Fuad Tabba
  0 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-24 10:45 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Thu, Nov 24, 2022 at 10:19 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Tue, Nov 15, 2022 at 11:15:38AM +0000, Fuad Tabba wrote:
> > This removes the need to use a temporary file for the fd.
>
> I'm confused by this. The man page for memfd_create says that it creates an
> anonymous file that lives in RAM.

I'm referring to the need to create a temporary file in the hugetlbfs path.
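In other words, the anonymous file needs no backing path at all. A minimal sketch of the pattern (plain memfd without MFD_HUGETLB, so it runs without reserved hugepages; the function name is mine, not kvmtool's):

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate size bytes of fd-backed memory. Unlike the old
 * mkstemp() + unlink() dance on a hugetlbfs mount, memfd_create()
 * takes no path: the file is anonymous and lives in RAM, and the
 * name is only a debugging label visible in /proc/<pid>/fd.
 */
static void *fd_backed_alloc(size_t size, int *out_fd)
{
	int fd = memfd_create("kvmtool-demo", 0);
	void *addr;

	if (fd < 0)
		return MAP_FAILED;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return MAP_FAILED;
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		close(fd);
	else
		*out_fd = fd;
	return addr;
}
```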

>
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  util/util.c | 25 ++++++++++++++++++++-----
> >  1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/util/util.c b/util/util.c
> > index e6c0951..d6ceb5d 100644
> > --- a/util/util.c
> > +++ b/util/util.c
> > @@ -10,6 +10,14 @@
> >  #include <sys/stat.h>
> >  #include <sys/statfs.h>
> >
> > +#ifndef MFD_HUGETLB
> > +#define MFD_HUGETLB  0x0004U
> > +#endif
> > +
> > +#ifndef MFD_HUGE_SHIFT
> > +#define MFD_HUGE_SHIFT       26
> > +#endif
>
> Hm... on my machine these are defined in linux/memfd.h, maybe you are
> missing the include?

Ack. I'll replace them with the include.

>
> > +
> >  static void report(const char *prefix, const char *err, va_list params)
> >  {
> >       char msg[1024];
> > @@ -96,10 +104,12 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> >
> >  static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> >  {
> > -     char mpath[PATH_MAX];
> > +     const char *name = "kvmtool";
> > +     unsigned int flags = 0;
> >       int fd;
> >       void *addr;
> >       u64 blk_size;
> > +     int htsize;
> >
> >       blk_size = get_hugepage_blk_size(htlbfs_path);
> >       if (blk_size == 0 || blk_size > size) {
> > @@ -107,13 +117,18 @@ static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> >                       (unsigned long long)blk_size, (unsigned long long)size);
> >       }
> >
> > +     htsize = __builtin_ctzl(blk_size);
> > +     if ((1ULL << htsize) != blk_size)
> > +             die("Hugepage size must be a power of 2.\n");
> > +
> > +     flags |= MFD_HUGETLB;
> > +     flags |= htsize << MFD_HUGE_SHIFT;
>
> If I understand the intention correctly, this entire sequence can be
> rewritten using is_power_of_two() from util.h:
>
>         if (!is_power_of_two(blk_size))
>                 die("Hugepage size must be a power of 2");

This is simpler, yes.

>
> Also, die() automatically adds the newline at the end of the string.
> That's unless you specifically wanted two newline characters at the end of
> the message.

Ack.

>
> > +
> >       kvm->ram_pagesize = blk_size;
> >
> > -     snprintf(mpath, PATH_MAX, "%s/kvmtoolXXXXXX", htlbfs_path);
> > -     fd = mkstemp(mpath);
> > +     fd = memfd_create(name, flags);
> >       if (fd < 0)
> > -             die("Can't open %s for hugetlbfs map\n", mpath);
> > -     unlink(mpath);
> > +             die("Can't memfd_create for hugetlbfs map\n");
>
> die_perror("memfd_create")? That way you also print the error number and
> the message associated with it. Same thing with the other die statements
> here, replacing them with die_perror() looks like it would be helpful.

Will do.

Cheers,
/fuad

>
> Thanks,
> Alex
>
> >       if (ftruncate(fd, size) < 0)
> >               die("Can't ftruncate for mem mapping size %lld\n",
> >                       (unsigned long long)size);
> > --
> > 2.38.1.431.g37b22c650d-goog
> >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-15 11:15 ` [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations Fuad Tabba
@ 2022-11-24 11:01   ` Alexandru Elisei
  2022-11-24 15:19     ` Fuad Tabba
  0 siblings, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-24 11:01 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Tue, Nov 15, 2022 at 11:15:40AM +0000, Fuad Tabba wrote:
> Allocate all guest ram backed by memfd/ftruncate instead of
> anonymous mmap. This will make it easier to use kvm with fd-based
> kvm guest memory proposals [*]. It also would make it easier to
> use ipc memory sharing should that be needed in the future.

Today, there are two memory allocation paths:

- One using hugetlbfs when --hugetlbfs is specified on the command line, which
  creates a file.

- One using mmap, when --hugetlbfs is not specified.

What would support in kvmtool for the secret memfd look like? I assume there
would need to be some kind of command line parameter to kvmtool to instruct it
to use the secret memfd, right? What I'm trying to figure out is why it is
necessary to make everything a memfd file instead of adding a third memory
allocation path just for that particular use case (or merging the hugetlbfs path
if they are similar enough). Right now, the anonymous memory path is a
mmap(MAP_ANONYMOUS) call, simple and straightforward, and I would like some more
convincing that this change is needed.

Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
that? If more work is needed for it, then wouldn't it make more sense to do
all the changes at once? This change might look sensible right now, but it
might turn out that it was the wrong way to go about it when someone
actually starts implementing memory sharing.

Thanks,
Alex

> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> 
> [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> ---
>  include/kvm/kvm.h  |  1 +
>  include/kvm/util.h |  3 +++
>  kvm.c              |  4 ++++
>  util/util.c        | 33 ++++++++++++++++++++-------------
>  4 files changed, 28 insertions(+), 13 deletions(-)
> 
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 3872dc6..d0d519b 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -87,6 +87,7 @@ struct kvm {
>  	struct kvm_config	cfg;
>  	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
>  	int			vm_fd;		/* For VM ioctls() */
> +	int			ram_fd;		/* For guest memory. */
>  	timer_t			timerid;	/* Posix timer for interrupts */
>  
>  	int			nrcpus;		/* Number of cpus to run */
> diff --git a/include/kvm/util.h b/include/kvm/util.h
> index 61a205b..369603b 100644
> --- a/include/kvm/util.h
> +++ b/include/kvm/util.h
> @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
>  }
>  
>  struct kvm;
> +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> +				   u64 size, u64 align);
>  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
>  
>  #endif /* KVM__UTIL_H */
> diff --git a/kvm.c b/kvm.c
> index 78bc0d8..ed29d68 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
>  	mutex_init(&kvm->mem_banks_lock);
>  	kvm->sys_fd = -1;
>  	kvm->vm_fd = -1;
> +	kvm->ram_fd = -1;
>  
>  #ifdef KVM_BRLOCK_DEBUG
>  	kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
>  
>  	kvm__arch_delete_ram(kvm);
>  
> +	if (kvm->ram_fd >= 0)
> +		close(kvm->ram_fd);
> +
>  	list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
>  		list_del(&bank->list);
>  		free(bank);
> diff --git a/util/util.c b/util/util.c
> index d3483d8..278bcc2 100644
> --- a/util/util.c
> +++ b/util/util.c
> @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
>  	return sfs.f_bsize;
>  }
>  
> -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
>  {
>  	const char *name = "kvmtool";
>  	unsigned int flags = 0;
>  	int fd;
> -	void *addr;
> -	int htsize = __builtin_ctzl(blk_size);
>  
> -	if ((1ULL << htsize) != blk_size)
> -		die("Hugepage size must be a power of 2.\n");
> +	if (hugetlb) {
> +		int htsize = __builtin_ctzl(blk_size);
>  
> -	flags |= MFD_HUGETLB;
> -	flags |= htsize << MFD_HUGE_SHIFT;
> +		if ((1ULL << htsize) != blk_size)
> +			die("Hugepage size must be a power of 2.\n");
> +
> +		flags |= MFD_HUGETLB;
> +		flags |= htsize << MFD_HUGE_SHIFT;
> +	}
>  
>  	fd = memfd_create(name, flags);
>  	if (fd < 0)
> -		die("Can't memfd_create for hugetlbfs map\n");
> +		die("Can't memfd_create for memory map\n");
> +
>  	if (ftruncate(fd, size) < 0)
>  		die("Can't ftruncate for mem mapping size %lld\n",
>  			(unsigned long long)size);
> -	addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> -	close(fd);
>  
> -	return addr;
> +	return fd;
>  }
>  
>  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
>  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  {
>  	u64 blk_size = 0;
> +	int fd;
>  
>  	/*
>  	 * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
>  		}
>  
>  		kvm->ram_pagesize = blk_size;
> -		return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
>  	} else {
>  		kvm->ram_pagesize = getpagesize();
> -		return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>  	}
> +
> +	fd = memfd_alloc(size, htlbfs_path, blk_size);
> +	if (fd < 0)
> +		return MAP_FAILED;
> +
> +	kvm->ram_fd = fd;
> +	return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
>  }
> -- 
> 2.38.1.431.g37b22c650d-goog
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-24 11:01   ` Alexandru Elisei
@ 2022-11-24 15:19     ` Fuad Tabba
  2022-11-24 17:14       ` Alexandru Elisei
  0 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-24 15:19 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Thu, Nov 24, 2022 at 11:01 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Tue, Nov 15, 2022 at 11:15:40AM +0000, Fuad Tabba wrote:
> > Allocate all guest ram backed by memfd/ftruncate instead of
> > anonymous mmap. This will make it easier to use kvm with fd-based
> > kvm guest memory proposals [*]. It also would make it easier to
> > use ipc memory sharing should that be needed in the future.
>
> Today, there are two memory allocation paths:
>
> - One using hugetlbfs when --hugetlbfs is specified on the command line, which
>   creates a file.
>
> - One using mmap, when --hugetlbfs is not specified.
>
> What would support in kvmtool for the secret memfd look like? I assume there
> would need to be some kind of command line parameter to kvmtool to instruct it
> to use the secret memfd, right? What I'm trying to figure out is why it is
> necessary to make everything a memfd file instead of adding a third memory
> allocation path just for that particular use case (or merging the hugetlbfs path
> if they are similar enough). Right now, the anonymous memory path is a
> mmap(MAP_ANONYMOUS) call, simple and straightforward, and I would like some more
> convincing that this change is needed.

This isn't about secret mem, but about the unified proposal for guest
private memory [1].  This proposal requires the use of a file
descriptor (fd) as the canonical reference to guest memory in the host
(i.e., VMM) and does not provide an alternative using a
straightforward anonymous mmap(). The idea is that guest memory
shouldn't have a mapping in the host by default, unless it is
explicitly shared and needed.

Moreover, using an fd would be more generic and flexible, which allows
for other use cases (such as IPC), or to map that memory in userspace
when appropriate. It also allows us to use the same interface for
hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
already back guest memory with memfd, and looking at how private
memory would work [4], it seemed to me that the best way to unify all
of these needs is to have the backend of guest memory be fd-based.

It would be possible to have that as a separate kvmtool option, where
fd-backed memory would be only for guests that use the new private
memory extensions. However, that would mean more code to maintain that
is essentially doing the same thing (allocating and mapping memory).

I thought that it would be worth having these patches in kvmtool now
rather than wait until the guest private memory has made it into kvm.
These patches simplify the code as an end result, make it easier to
allocate and map aligned memory without overallocating, and bring
kvmtool closer to a more consistent way of allocating guest memory, in
a similar manner to other VMMs.

Moreover, with the private memory proposal [1], whether fd-based
support is available can be queried via a KVM capability. If it's
available, kvmtool would use the fd; if it's not, it would
use the host-mapped address. Therefore, there isn't a need for a
command line option, except perhaps for testing.

I have implemented this all the way to support the private memory
proposal in kvmtool [5], but I haven’t posted these since the private
memory proposal itself is still in flux. If you're interested, you
could have a look at how I would go about building on these patches
for full support of private memory backed by an fd.

> Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> that? If more work is needed for it, then wouldn't it make more sense to do
> all the changes at once? This change might look sensible right now, but it
> might turn out that it was the wrong way to go about it when someone
> actually starts implementing memory sharing.

I don’t plan on supporting IPC memory sharing. I just mentioned that
as yet another use case that would benefit from guest memory being
fd-based, should kvmtool decide to support it in the future.
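For completeness, this is the kind of sharing an fd-based backing makes trivial; a standalone sketch using fork() (between unrelated processes the fd would instead travel over an SCM_RIGHTS message). Names are illustrative, not kvmtool code:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two processes share memory by mapping the same memfd. */
static int memfd_ipc_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd, status;
	char *parent_map;
	pid_t pid;

	fd = memfd_create("ipc-demo", 0);
	if (fd < 0 || ftruncate(fd, page) < 0)
		return -1;

	parent_map = mmap(NULL, page, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);
	if (parent_map == MAP_FAILED)
		return -1;
	strcpy(parent_map, "hello from parent");

	pid = fork();
	if (pid == 0) {
		/* Child: a second MAP_SHARED mapping of the inherited fd
		 * sees the data written by the parent. */
		char *child_map = mmap(NULL, page, PROT_READ,
				       MAP_SHARED, fd, 0);
		_exit(child_map != MAP_FAILED &&
		      !strcmp(child_map, "hello from parent") ? 0 : 1);
	}
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```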

Cheers,
/fuad

[1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
[2] https://github.com/qemu/qemu
[3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
[4] https://github.com/chao-p/qemu/tree/privmem-v9
[5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core



>
> Thanks,
> Alex
>
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> >
> > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > ---
> >  include/kvm/kvm.h  |  1 +
> >  include/kvm/util.h |  3 +++
> >  kvm.c              |  4 ++++
> >  util/util.c        | 33 ++++++++++++++++++++-------------
> >  4 files changed, 28 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > index 3872dc6..d0d519b 100644
> > --- a/include/kvm/kvm.h
> > +++ b/include/kvm/kvm.h
> > @@ -87,6 +87,7 @@ struct kvm {
> >       struct kvm_config       cfg;
> >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> >       int                     vm_fd;          /* For VM ioctls() */
> > +     int                     ram_fd;         /* For guest memory. */
> >       timer_t                 timerid;        /* Posix timer for interrupts */
> >
> >       int                     nrcpus;         /* Number of cpus to run */
> > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > index 61a205b..369603b 100644
> > --- a/include/kvm/util.h
> > +++ b/include/kvm/util.h
> > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> >  }
> >
> >  struct kvm;
> > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > +                                u64 size, u64 align);
> >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> >
> >  #endif /* KVM__UTIL_H */
> > diff --git a/kvm.c b/kvm.c
> > index 78bc0d8..ed29d68 100644
> > --- a/kvm.c
> > +++ b/kvm.c
> > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> >       mutex_init(&kvm->mem_banks_lock);
> >       kvm->sys_fd = -1;
> >       kvm->vm_fd = -1;
> > +     kvm->ram_fd = -1;
> >
> >  #ifdef KVM_BRLOCK_DEBUG
> >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> >
> >       kvm__arch_delete_ram(kvm);
> >
> > +     if (kvm->ram_fd >= 0)
> > +             close(kvm->ram_fd);
> > +
> >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> >               list_del(&bank->list);
> >               free(bank);
> > diff --git a/util/util.c b/util/util.c
> > index d3483d8..278bcc2 100644
> > --- a/util/util.c
> > +++ b/util/util.c
> > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> >       return sfs.f_bsize;
> >  }
> >
> > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> >  {
> >       const char *name = "kvmtool";
> >       unsigned int flags = 0;
> >       int fd;
> > -     void *addr;
> > -     int htsize = __builtin_ctzl(blk_size);
> >
> > -     if ((1ULL << htsize) != blk_size)
> > -             die("Hugepage size must be a power of 2.\n");
> > +     if (hugetlb) {
> > +             int htsize = __builtin_ctzl(blk_size);
> >
> > -     flags |= MFD_HUGETLB;
> > -     flags |= htsize << MFD_HUGE_SHIFT;
> > +             if ((1ULL << htsize) != blk_size)
> > +                     die("Hugepage size must be a power of 2.\n");
> > +
> > +             flags |= MFD_HUGETLB;
> > +             flags |= htsize << MFD_HUGE_SHIFT;
> > +     }
> >
> >       fd = memfd_create(name, flags);
> >       if (fd < 0)
> > -             die("Can't memfd_create for hugetlbfs map\n");
> > +             die("Can't memfd_create for memory map\n");
> > +
> >       if (ftruncate(fd, size) < 0)
> >               die("Can't ftruncate for mem mapping size %lld\n",
> >                       (unsigned long long)size);
> > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > -     close(fd);
> >
> > -     return addr;
> > +     return fd;
> >  }
> >
> >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> >  {
> >       u64 blk_size = 0;
> > +     int fd;
> >
> >       /*
> >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> >               }
> >
> >               kvm->ram_pagesize = blk_size;
> > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> >       } else {
> >               kvm->ram_pagesize = getpagesize();
> > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> >       }
> > +
> > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > +     if (fd < 0)
> > +             return MAP_FAILED;
> > +
> > +     kvm->ram_fd = fd;
> > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> >  }
> > --
> > 2.38.1.431.g37b22c650d-goog
> >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-24 15:19     ` Fuad Tabba
@ 2022-11-24 17:14       ` Alexandru Elisei
  2022-11-25 10:43         ` Alexandru Elisei
  2022-11-25 10:44         ` Fuad Tabba
  0 siblings, 2 replies; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-24 17:14 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Thu, Nov 24, 2022 at 03:19:34PM +0000, Fuad Tabba wrote:
> Hi,
> 
> On Thu, Nov 24, 2022 at 11:01 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >
> > On Tue, Nov 15, 2022 at 11:15:40AM +0000, Fuad Tabba wrote:
> > > Allocate all guest ram backed by memfd/ftruncate instead of
> > > anonymous mmap. This will make it easier to use kvm with fd-based
> > > kvm guest memory proposals [*]. It also would make it easier to
> > > use ipc memory sharing should that be needed in the future.
> >
> > Today, there are two memory allocation paths:
> >
> > - One using hugetlbfs when --hugetlbfs is specified on the command line, which
> >   creates a file.
> >
> > - One using mmap, when --hugetlbfs is not specified.
> >
> > What would support in kvmtool for the secret memfd look like? I assume there
> > would need to be some kind of command line parameter to kvmtool to instruct it
> > to use the secret memfd, right? What I'm trying to figure out is why it is
> > necessary to make everything a memfd file instead of adding a third memory
> > allocation path just for that particular use case (or merging the hugetlbfs path
> > if they are similar enough). Right now, the anonymous memory path is a
> > mmap(MAP_ANONYMOUS) call, simple and straightforward, and I would like some more
> > convincing that this change is needed.
> 
> This isn't about secret mem, but about the unified proposal for guest
> private memory [1].  This proposal requires the use of a file
> descriptor (fd) as the canonical reference to guest memory in the host
> (i.e., VMM) and does not provide an alternative using a
> straightforward anonymous mmap(). The idea is that guest memory
> shouldn't have a mapping in the host by default, unless it is
> explicitly shared and needed.

I think you misunderstood me. I wasn't asking why kvmtool should get
support for guest private memory, I was asking why kvmtool should allocate
**all types of memory** using memfd. Your series allocates **all** memory
using memfd. I never said that kvmtool should or should not get support for
private memory.

> 
> Moreover, using an fd would be more generic and flexible, which allows
> for other use cases (such as IPC), or to map that memory in userspace
> when appropriate. It also allows us to use the same interface for
> hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
> already back guest memory with memfd, and looking at how private
> memory would work [4], it seemed to me that the best way to unify all
> of these needs is to have the backend of guest memory be fd-based.
> 
> It would be possible to have that as a separate kvmtool option, where
> fd-backed memory would be only for guests that use the new private
> memory extensions. However, that would mean more code to maintain that
> is essentially doing the same thing (allocating and mapping memory).
> 
> I thought that it would be worth having these patches in kvmtool now
> rather than wait until the guest private memory has made it into kvm.
> These patches simplify the code as an end result, make it easier to

In the non-hugetlbfs case, before:

        kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
        kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, kvm->cfg.hugetlbfs_path, kvm->arch.ram_alloc_size);

	/*
	 * mmap_anon_or_hugetlbfs expands to:
	 * getpagesize()
	 * mmap()
	 */

        kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, SZ_2M);

After:
	/* mmap_anon_or_hugetlbfs: */
	getpagesize();
	mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	memfd_alloc(size, htlbfs_path, blk_size);

	/*
	 * memfd_alloc() expands to:
	 * memfd_create()
	 * ftruncate
	 */

	addr_align = (void *)ALIGN((u64)addr_map, align_sz);
	mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0);

I'm counting one extra mmap(), one memfd_create() and one ftruncate() that
this series adds (not to mention all the boilerplate code to check for
errors).

Let's use another metric, let's count the number of lines of code. Before:
9 lines of code, after: -3 lines removed from arm/kvm.c and 86 lines of
code for memfd_alloc() and mmap_anon_or_hugetlbfs_align().

I'm struggling to find a metric by which the resulting code is simpler, as
you suggest.


> allocate and map aligned memory without overallocating, and bring

If your goal is to remove the overallocating of memory, you can just munmap
the extra memory after alignment is performed. To do that you don't need to
allocate everything using a memfd.
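To illustrate, here is a minimal sketch of that trimming approach (a hypothetical helper, not kvmtool code; it assumes align is a power of 2):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Over-map anonymous memory by 'align' extra bytes, pick the first
 * aligned address inside the mapping, then munmap() the unused head
 * and tail. The final mapping is exactly 'size' bytes, aligned to
 * 'align', with no over-allocation left behind.
 */
static void *mmap_anon_aligned(size_t size, size_t align)
{
	size_t total = size + align;
	char *raw, *aligned;
	size_t head, tail;

	raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	aligned = (char *)(((uintptr_t)raw + align - 1) &
			   ~((uintptr_t)align - 1));
	head = aligned - raw;
	tail = total - head - size;

	if (head)
		munmap(raw, head);		/* trim unaligned prefix */
	if (tail)
		munmap(aligned + size, tail);	/* trim excess suffix */

	return aligned;
}
```

The two munmap() calls return the slack to the kernel, so nothing stays over-allocated after alignment.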

> kvmtool closer to a more consistent way of allocating guest memory, in
> a similar manner to other VMMs.

I would really appreciate pointing me to where qemu allocates memory using
memfd when invoked with -m <size>. I was able to follow the hostmem-ram
backend allocation function until g_malloc0(), but I couldn't find the
implementation for that.

> 
> > Moreover, with the private memory proposal [1], whether fd-based
> > support is available can be queried via a KVM capability. If it's
> > available, kvmtool would use the fd; if it's not available, it would
> > use the host-mapped address. Therefore, there isn’t a need for a
> > command line option, except for testing.

Why would anyone want to use private memory by default for a
non-confidential VM?

> 
> I have implemented this all the way to support the private memory
> proposal in kvmtool [5], but I haven’t posted these since the private
> memory proposal itself is still in flux. If you’re interested you

Are you saying that you are not really sure how the userspace API will end
up looking? If that's the case, wouldn't it make more sense to wait for the
API to stabilize and then send support for it as one nice series?

Thanks,
Alex

> could have a look on how I would go ahead building on these patches
> for full support of private memory backed by an fd.
> 
> > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > that? If more work is needed for it, then wouldn't it make more sense to do
> > all the changes at once? This change might look sensible right now, but it
> > might turn out that it was the wrong way to go about it when someone
> > actually starts implementing memory sharing.
> 
> I don’t plan on supporting IPC memory sharing. I just mentioned that
> as yet another use case that would benefit from guest memory being
> fd-based, should kvmtool decide to support it in the future.
> 
> Cheers,
> /fuad
> 
> [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> [2] https://github.com/qemu/qemu
> [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> [4] https://github.com/chao-p/qemu/tree/privmem-v9
> [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> 
> 
> 
> >
> > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > that? If more work is needed for it, then wouldn't it make more sense to do
> > all the changes at once? This change might look sensible right now, but it
> > might turn out that it was the wrong way to go about it when someone
> > actually starts implementing memory sharing.
> >
> > Thanks,
> > Alex
> >
> > >
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > >
> > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > ---
> > >  include/kvm/kvm.h  |  1 +
> > >  include/kvm/util.h |  3 +++
> > >  kvm.c              |  4 ++++
> > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > index 3872dc6..d0d519b 100644
> > > --- a/include/kvm/kvm.h
> > > +++ b/include/kvm/kvm.h
> > > @@ -87,6 +87,7 @@ struct kvm {
> > >       struct kvm_config       cfg;
> > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > >       int                     vm_fd;          /* For VM ioctls() */
> > > +     int                     ram_fd;         /* For guest memory. */
> > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > >
> > >       int                     nrcpus;         /* Number of cpus to run */
> > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > index 61a205b..369603b 100644
> > > --- a/include/kvm/util.h
> > > +++ b/include/kvm/util.h
> > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > >  }
> > >
> > >  struct kvm;
> > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > +                                u64 size, u64 align);
> > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > >
> > >  #endif /* KVM__UTIL_H */
> > > diff --git a/kvm.c b/kvm.c
> > > index 78bc0d8..ed29d68 100644
> > > --- a/kvm.c
> > > +++ b/kvm.c
> > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > >       mutex_init(&kvm->mem_banks_lock);
> > >       kvm->sys_fd = -1;
> > >       kvm->vm_fd = -1;
> > > +     kvm->ram_fd = -1;
> > >
> > >  #ifdef KVM_BRLOCK_DEBUG
> > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > >
> > >       kvm__arch_delete_ram(kvm);
> > >
> > > +     if (kvm->ram_fd >= 0)
> > > +             close(kvm->ram_fd);
> > > +
> > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > >               list_del(&bank->list);
> > >               free(bank);
> > > diff --git a/util/util.c b/util/util.c
> > > index d3483d8..278bcc2 100644
> > > --- a/util/util.c
> > > +++ b/util/util.c
> > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > >       return sfs.f_bsize;
> > >  }
> > >
> > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > >  {
> > >       const char *name = "kvmtool";
> > >       unsigned int flags = 0;
> > >       int fd;
> > > -     void *addr;
> > > -     int htsize = __builtin_ctzl(blk_size);
> > >
> > > -     if ((1ULL << htsize) != blk_size)
> > > -             die("Hugepage size must be a power of 2.\n");
> > > +     if (hugetlb) {
> > > +             int htsize = __builtin_ctzl(blk_size);
> > >
> > > -     flags |= MFD_HUGETLB;
> > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > +             if ((1ULL << htsize) != blk_size)
> > > +                     die("Hugepage size must be a power of 2.\n");
> > > +
> > > +             flags |= MFD_HUGETLB;
> > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > +     }
> > >
> > >       fd = memfd_create(name, flags);
> > >       if (fd < 0)
> > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > +             die("Can't memfd_create for memory map\n");
> > > +
> > >       if (ftruncate(fd, size) < 0)
> > >               die("Can't ftruncate for mem mapping size %lld\n",
> > >                       (unsigned long long)size);
> > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > -     close(fd);
> > >
> > > -     return addr;
> > > +     return fd;
> > >  }
> > >
> > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > >  {
> > >       u64 blk_size = 0;
> > > +     int fd;
> > >
> > >       /*
> > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > >               }
> > >
> > >               kvm->ram_pagesize = blk_size;
> > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > >       } else {
> > >               kvm->ram_pagesize = getpagesize();
> > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > >       }
> > > +
> > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > +     if (fd < 0)
> > > +             return MAP_FAILED;
> > > +
> > > +     kvm->ram_fd = fd;
> > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > >  }
> > > --
> > > 2.38.1.431.g37b22c650d-goog
> > >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-24 17:14       ` Alexandru Elisei
@ 2022-11-25 10:43         ` Alexandru Elisei
  2022-11-25 10:58           ` Fuad Tabba
  2022-11-25 10:44         ` Fuad Tabba
  1 sibling, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-25 10:43 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

Did some digging, correction(s) below.

On Thu, Nov 24, 2022 at 05:14:33PM +0000, Alexandru Elisei wrote:
> Hi,
> 
> On Thu, Nov 24, 2022 at 03:19:34PM +0000, Fuad Tabba wrote:
> > [..]
> > kvmtool closer to a more consistent way of allocating guest memory, in
> > a similar manner to other VMMs.
> 
> I would really appreciate pointing me to where qemu allocates memory using
> memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> backend allocation function until g_malloc0(), but I couldn't find the
> implementation for that.

As far as I can tell, qemu allocates memory without backing storage (so by
specifying only -m on the command line) like this:

main -> qemu_init -> qmp_x_exit_preconfig -> qemu_init_board ->
create_default_memdev, which creates a TYPE_MEMORY_BACKEND_RAM object.

When creating the VM ram, the object's alloc function is called in:

create_default_memdev -> user_creatable_complete ->
host_memory_backend_complete -> ram_backend_memory_alloc ->
memory_region_init_ram_flags_nomigrate -> qemu_ram_alloc ->
qemu_ram_alloc_internal -> ram_block_add -> qemu_anon_ram_alloc ->
qemu_ram_mmap(fd=-1,..) -> mmap_activate(..,fd=-1,..) ->
mmap(..,MAP_ANONYMOUS,fd=-1,..)

Unless I'm mistaken with the above (it was quite convoluted to unwrap all
of this), qemu doesn't allocate RAM for the VM using a backing file, unless
specifically requested by the user.

On the other hand, for crosvm:

main -> crosvm_main -> run_vm -> run_config (from src/crosvm/sys/unix.rs),
creates the memory layout in GuestMemory::new ->
MemoryMappingBuilder::new -> from_shared_memory -> offset -> build.

I couldn't find the implementation for MemoryMappingBuilder::build; if it's
anything like build_fixed, then indeed it looks like it uses the memfd
created with GuestMemory::create_shm and passed to
MemoryMappingBuilder::from_shared_offset.

Thanks,
Alex


* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-24 17:14       ` Alexandru Elisei
  2022-11-25 10:43         ` Alexandru Elisei
@ 2022-11-25 10:44         ` Fuad Tabba
  2022-11-25 11:31           ` Alexandru Elisei
  1 sibling, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-25 10:44 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Thu, Nov 24, 2022 at 5:14 PM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Thu, Nov 24, 2022 at 03:19:34PM +0000, Fuad Tabba wrote:
> > Hi,
> >
> > On Thu, Nov 24, 2022 at 11:01 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Nov 15, 2022 at 11:15:40AM +0000, Fuad Tabba wrote:
> > > > Allocate all guest ram backed by memfd/ftruncate instead of
> > > > anonymous mmap. This will make it easier to use kvm with fd-based
> > > > kvm guest memory proposals [*]. It also would make it easier to
> > > > use ipc memory sharing should that be needed in the future.
> > >
> > > Today, there are two memory allocation paths:
> > >
> > > - One using hugetlbfs when --hugetlbfs is specified on the command line, which
> > >   creates a file.
> > >
> > > - One using mmap, when --hugetlbfs is not specified.
> > >
> > > What would support in kvmtool for the secret memfd look like? I assume there
> > > would need to be some kind of command line parameter to kvmtool to instruct it
> > > to use the secret memfd, right? What I'm trying to figure out is why it is
> > > necessary to make everything a memfd file instead of adding a third memory
> > > allocation path just for that particular usecase (or merging the hugetlbfs path
> > > if they are similar enough). Right now, the anonymous memory path is a
> > > mmap(MAP_ANONYMOUS) call, simple and straightforward, and I would like some more
> > > convincing that this change is needed.
> >
> > This isn't about secret mem, but about the unified proposal for guest
> > private memory [1].  This proposal requires the use of a file
> > descriptor (fd) as the canonical reference to guest memory in the host
> > (i.e., VMM) and does not provide an alternative using a
> > straightforward anonymous mmap(). The idea is that guest memory
> > shouldn’t have a mapping in the host by default, unless explicitly
> > shared and needed.
>
> I think you misunderstood me. I wasn't asking why kvmtool should get
> support for guest private memory, I was asking why kvmtool should allocate
> **all types of memory** using memfd. Your series allocates **all** memory
> using memfd. I never said that kvmtool should or should not get support for
> private memory.

My reasoning for allocating all memory with memfd is that it's one
ring to rule them all :) By that I mean, with memfd, we can allocate
normal memory, hugetlb memory, in the future guest private memory, and
easily expand it to support things like IPC memory sharing in the
future.
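Condensed, the common path boils down to something like this (a sketch only; error handling and naming are simplified relative to the actual patch):

```c
#define _GNU_SOURCE
#include <linux/memfd.h>	/* MFD_HUGETLB, MFD_HUGE_SHIFT */
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * One allocator for both cases: a plain memfd for normal memory, or a
 * hugetlb-backed memfd when 'huge_blk_size' (a power-of-2 hugepage
 * size) is non-zero. Either way the caller gets back an fd that can be
 * mmap()ed, handed to KVM, or shared over IPC.
 */
static int guest_memfd_alloc(size_t size, size_t huge_blk_size)
{
	unsigned int flags = 0;
	int fd;

	if (huge_blk_size)
		flags |= MFD_HUGETLB |
			 (__builtin_ctzl(huge_blk_size) << MFD_HUGE_SHIFT);

	fd = memfd_create("guest-ram", flags);
	if (fd < 0)
		return -1;

	if (ftruncate(fd, size) < 0) {	/* size the backing object */
		close(fd);
		return -1;
	}
	return fd;
}
```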


> >
> > Moreover, using an fd would be more generic and flexible, which allows
> > for other use cases (such as IPC), or to map that memory in userspace
> > when appropriate. It also allows us to use the same interface for
> > hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
> > already back guest memory with memfd, and looking at how private
> > memory would work [4], it seemed to me that the best way to unify all
> > of these needs is to have the backend of guest memory be fd-based.
> >
> > It would be possible to have that as a separate kvmtool option, where
> > fd-backed memory would be only for guests that use the new private
> > memory extensions. However, that would mean more code to maintain that
> > is essentially doing the same thing (allocating and mapping memory).
> >
> > I thought that it would be worth having these patches in kvmtool now
> > rather than wait until the guest private memory has made it into kvm.
> > These patches simplify the code as an end result, make it easier to
>
> In the non-hugetlbfs case, before:
>
>         kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
>         kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, kvm->cfg.hugetlbfs_path, kvm->arch.ram_alloc_size);
>
>         /*
>          * mmap_anon_or_hugetlbfs expands to:
>          * getpagesize()
>          * mmap()
>          */
>
>         kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, SZ_2M);
>
> After:
>         /* mmap_anon_or_hugetlbfs: */
>         getpagesize();
>         mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>         memfd_alloc(size, htlbfs_path, blk_size);
>
>         /*
>          * memfd_alloc() expands to:
>          * memfd_create()
>          * ftruncate
>          */
>
>         addr_align = (void *)ALIGN((u64)addr_map, align_sz);
>         mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0);
>
> I'm counting one extra mmap(), one memfd_create() and one ftruncate() that
> this series adds (not to mention all the boilerplate code to check for
> errors).
>
> Let's use another metric, let's count the number of lines of code. Before:
> 9 lines of code, after: -3 lines removed from arm/kvm.c and 86 lines of
> code for memfd_alloc() and mmap_anon_or_hugetlbfs_align().
>
> I'm struggling to find a metric by which the resulting code is simpler, as
> you suggest.

By simpler I didn't mean fewer lines of code, but rather that it's
easier to reason about, with more code shared. With this series, hugetlb
and normal memory creation follow the same path, and with the
refactoring a lot of arch-specific code is gone.

>
> > allocate and map aligned memory without overallocating, and bring
>
> If your goal is to remove the overallocating of memory, you can just munmap
> the extra memory after alignment is performed. To do that you don't need to
> allocate everything using a memfd.
>
> > kvmtool closer to a more consistent way of allocating guest memory, in
> > a similar manner to other VMMs.
>
> I would really appreciate pointing me to where qemu allocates memory using
> memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> backend allocation function until g_malloc0(), but I couldn't find the
> implementation for that.

You're right. I thought that the memfd backend was the default, but
looking again it's not.

> >
> > Moreover, with the private memory proposal [1], whether fd-based
> > support is available can be queried via a KVM capability. If it's
> > available, kvmtool would use the fd; if it's not available, it would
> > use the host-mapped address. Therefore, there isn’t a need for a
> > command line option, except for testing.
>
> Why would anyone want to use private memory by default for a
> non-confidential VM?

The idea is that, at least when pKVM is enabled, we would use the
proposed extensions for all guests, i.e., memory via a file
descriptor, regardless of whether the guest is protected (thus the memory
would be private) or not.


> >
> > I have implemented this all the way to support the private memory
> > proposal in kvmtool [5], but I haven’t posted these since the private
> > memory proposal itself is still in flux. If you’re interested you
>
> Are you saying that you are not really sure how the userspace API will end
> up looking? If that's the case, wouldn't it make more sense to wait for the
> API to stabilize and then send support for it as one nice series?

Yes, I'm not sure how it will end up looking. We know that it will be
fd-based though, which is why I thought it might be good to start with
that.

Cheers,
/fuad



> Thanks,
> Alex
>
> > could have a look on how I would go ahead building on these patches
> > for full support of private memory backed by an fd.
> >
> > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > all the changes at once? This change might look sensible right now, but it
> > > might turn out that it was the wrong way to go about it when someone
> > > actually starts implementing memory sharing.
> >
> > I don’t plan on supporting IPC memory sharing. I just mentioned that
> > as yet another use case that would benefit from guest memory being
> > fd-based, should kvmtool decide to support it in the future.
> >
> > Cheers,
> > /fuad
> >
> > [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > [2] https://github.com/qemu/qemu
> > [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> > [4] https://github.com/chao-p/qemu/tree/privmem-v9
> > [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> >
> >
> >
> > >
> > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > all the changes at once? This change might look sensible right now, but it
> > > might turn out that it was the wrong way to go about it when someone
> > > actually starts implementing memory sharing.
> > >
> > > Thanks,
> > > Alex
> > >
> > > >
> > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > >
> > > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > ---
> > > >  include/kvm/kvm.h  |  1 +
> > > >  include/kvm/util.h |  3 +++
> > > >  kvm.c              |  4 ++++
> > > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > > index 3872dc6..d0d519b 100644
> > > > --- a/include/kvm/kvm.h
> > > > +++ b/include/kvm/kvm.h
> > > > @@ -87,6 +87,7 @@ struct kvm {
> > > >       struct kvm_config       cfg;
> > > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > > >       int                     vm_fd;          /* For VM ioctls() */
> > > > +     int                     ram_fd;         /* For guest memory. */
> > > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > > >
> > > >       int                     nrcpus;         /* Number of cpus to run */
> > > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > > index 61a205b..369603b 100644
> > > > --- a/include/kvm/util.h
> > > > +++ b/include/kvm/util.h
> > > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > > >  }
> > > >
> > > >  struct kvm;
> > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > > +                                u64 size, u64 align);
> > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > > >
> > > >  #endif /* KVM__UTIL_H */
> > > > diff --git a/kvm.c b/kvm.c
> > > > index 78bc0d8..ed29d68 100644
> > > > --- a/kvm.c
> > > > +++ b/kvm.c
> > > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > > >       mutex_init(&kvm->mem_banks_lock);
> > > >       kvm->sys_fd = -1;
> > > >       kvm->vm_fd = -1;
> > > > +     kvm->ram_fd = -1;
> > > >
> > > >  #ifdef KVM_BRLOCK_DEBUG
> > > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > > >
> > > >       kvm__arch_delete_ram(kvm);
> > > >
> > > > +     if (kvm->ram_fd >= 0)
> > > > +             close(kvm->ram_fd);
> > > > +
> > > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > > >               list_del(&bank->list);
> > > >               free(bank);
> > > > diff --git a/util/util.c b/util/util.c
> > > > index d3483d8..278bcc2 100644
> > > > --- a/util/util.c
> > > > +++ b/util/util.c
> > > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > > >       return sfs.f_bsize;
> > > >  }
> > > >
> > > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > > >  {
> > > >       const char *name = "kvmtool";
> > > >       unsigned int flags = 0;
> > > >       int fd;
> > > > -     void *addr;
> > > > -     int htsize = __builtin_ctzl(blk_size);
> > > >
> > > > -     if ((1ULL << htsize) != blk_size)
> > > > -             die("Hugepage size must be a power of 2.\n");
> > > > +     if (hugetlb) {
> > > > +             int htsize = __builtin_ctzl(blk_size);
> > > >
> > > > -     flags |= MFD_HUGETLB;
> > > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > > +             if ((1ULL << htsize) != blk_size)
> > > > +                     die("Hugepage size must be a power of 2.\n");
> > > > +
> > > > +             flags |= MFD_HUGETLB;
> > > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > > +     }
> > > >
> > > >       fd = memfd_create(name, flags);
> > > >       if (fd < 0)
> > > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > > +             die("Can't memfd_create for memory map\n");
> > > > +
> > > >       if (ftruncate(fd, size) < 0)
> > > >               die("Can't ftruncate for mem mapping size %lld\n",
> > > >                       (unsigned long long)size);
> > > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > > -     close(fd);
> > > >
> > > > -     return addr;
> > > > +     return fd;
> > > >  }
> > > >
> > > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > >  {
> > > >       u64 blk_size = 0;
> > > > +     int fd;
> > > >
> > > >       /*
> > > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > >               }
> > > >
> > > >               kvm->ram_pagesize = blk_size;
> > > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > > >       } else {
> > > >               kvm->ram_pagesize = getpagesize();
> > > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > > >       }
> > > > +
> > > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > > +     if (fd < 0)
> > > > +             return MAP_FAILED;
> > > > +
> > > > +     kvm->ram_fd = fd;
> > > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > > >  }
> > > > --
> > > > 2.38.1.431.g37b22c650d-goog
> > > >


* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-25 10:43         ` Alexandru Elisei
@ 2022-11-25 10:58           ` Fuad Tabba
  0 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-25 10:58 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Fri, Nov 25, 2022 at 10:43 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> Did some digging, correction(s) below.
>
> On Thu, Nov 24, 2022 at 05:14:33PM +0000, Alexandru Elisei wrote:
> > Hi,
> >
> > On Thu, Nov 24, 2022 at 03:19:34PM +0000, Fuad Tabba wrote:
> > > [..]
> > > kvmtool closer to a more consistent way of allocating guest memory, in
> > > a similar manner to other VMMs.
> >
> > I would really appreciate pointing me to where qemu allocates memory using
> > memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> > backend allocation function until g_malloc0(), but I couldn't find the
> > implementation for that.
>
> As far as I can tell, qemu allocates memory without backing storage (so by
> specifying only -m on the command line) like this:
>
> main -> qemu_init -> qmp_x_exit_preconfig -> qemu_init_board ->
> create_default_memdev, which creates a TYPE_MEMORY_BACKEND_RAM object.
>
> When creating the VM ram, the object's alloc function is called in:
>
> create_default_memdev -> user_creatable_complete ->
> host_memory_backend_complete -> ram_backend_memory_alloc ->
> memory_region_init_ram_flags_nomigrate -> qemu_ram_alloc ->
> qemu_ram_alloc_internal -> ram_block_add -> qemu_anon_ram_alloc ->
> qemu_ram_mmap(fd=-1,..) -> mmap_activate(..,fd=-1,..) ->
> mmap(..,MAP_ANONYMOUS,fd=-1,..)
>
> Unless I'm mistaken with the above (it was quite convoluted to unwrap all
> of this), qemu doesn't allocate RAM for the VM using a backing file, unless
> specifically requested by the user.

Thanks and sorry that you had to do some digging because of my mistake
(thinking that the memfd backend was the default one).

Cheers,
/fuad

> On the other hand, for crosvm:
>
> main -> crosvm_main -> run_vm -> run_config (from src/crosvm/sys/unix.rs),
> creates the memory layout in GuestMemory::new ->
> MemoryMappingBuilder::new -> from_shared_memory -> offset -> build.
>
> I couldn't find the implementation for MemoryMappingBuilder::build; if it's
> anything like build_fixed, then indeed it looks like it uses the memfd
> created with GuestMemory::create_shm and passed to
> MemoryMappingBuilder::from_shared_offset.
>
> Thanks,
> Alex


* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-25 10:44         ` Fuad Tabba
@ 2022-11-25 11:31           ` Alexandru Elisei
  2022-11-28  8:49             ` Fuad Tabba
  0 siblings, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-25 11:31 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Fri, Nov 25, 2022 at 10:44:39AM +0000, Fuad Tabba wrote:
> Hi,
> 
> On Thu, Nov 24, 2022 at 5:14 PM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >
> > On Thu, Nov 24, 2022 at 03:19:34PM +0000, Fuad Tabba wrote:
> > > Hi,
> > >
> > > On Thu, Nov 24, 2022 at 11:01 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Nov 15, 2022 at 11:15:40AM +0000, Fuad Tabba wrote:
> > > > > Allocate all guest ram backed by memfd/ftruncate instead of
> > > > > anonymous mmap. This will make it easier to use kvm with fd-based
> > > > > kvm guest memory proposals [*]. It also would make it easier to
> > > > > use ipc memory sharing should that be needed in the future.
> > > >
> > > > Today, there are two memory allocation paths:
> > > >
> > > > - One using hugetlbfs when --hugetlbfs is specified on the command line, which
> > > >   creates a file.
> > > >
> > > > - One using mmap, when --hugetlbfs is not specified.
> > > >
> > > > What would support in kvmtool for the secret memfd look like? I assume there
> > > > would need to be some kind of command line parameter to kvmtool to instruct it
> > > > to use the secret memfd, right? What I'm trying to figure out is why it is
> > > > necessary to make everything a memfd file instead of adding a third memory
> > > > allocation path just for that particular usecase (or merging the hugetlbfs path
> > > > if they are similar enough). Right now, the anonymous memory path is a
> > > > mmap(MAP_ANONYMOUS) call, simple and straightforward, and I would like some more
> > > > convincing that this change is needed.
> > >
> > > This isn't about secret mem, but about the unified proposal for guest
> > > private memory [1].  This proposal requires the use of a file
> > > descriptor (fd) as the canonical reference to guest memory in the host
> > > (i.e., VMM) and does not provide an alternative using a
> > > straightforward anonymous mmap(). The idea is that guest memory
> > > shouldn’t have a mapping in the host by default, unless it is
> > > explicitly shared and needed.
> >
> > I think you misunderstood me. I wasn't asking why kvmtool should get
> > support for guest private memory, I was asking why kvmtool should allocate
> > **all types of memory** using memfd. Your series allocates **all** memory
> > using memfd. I never said that kvmtool should or should not get support for
> > private memory.
> 
> My reasoning for allocating all memory with memfd is that it's one
> ring to rule them all :) By that I mean, with memfd, we can allocate
> normal memory, hugetlb memory, in the future guest private memory, and
> easily expand it to support things like IPC memory sharing in the
> future.

Allocating anonymous memory is more complex now. And I could argue that the
hugetlbfs case is also more complex, because there are now two branches that
do different things based on whether it's hugetlbfs or not, instead of one.

As it stands right now, my opinion is that using memfd for anonymous RAM
only adds complexity for zero benefits.
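For concreteness, the two anonymous-RAM paths being compared reduce to roughly the following. This is an illustrative sketch, not the exact kvmtool code (MAP_ANON_NORESERVE in kvmtool expands to MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/* Old path: a single call; the kernel picks anonymous backing pages. */
static void *alloc_ram_anon(size_t size)
{
	return mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}

/* New path: the memory is named by a file descriptor first. */
static void *alloc_ram_memfd(size_t size, int *out_fd)
{
	int fd = memfd_create("kvmtool", 0);

	if (fd < 0)
		return MAP_FAILED;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return MAP_FAILED;
	}
	*out_fd = fd;	/* kept open: the fd is the canonical handle */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
}
```

The extra memfd_create()/ftruncate() calls (and their error handling) are the added complexity under discussion; the fd in return is what the fd-based guest memory proposal needs.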

> 
> 
> > >
> > > Moreover, using an fd would be more generic and flexible, which allows
> > > for other use cases (such as IPC), or to map that memory in userspace
> > > when appropriate. It also allows us to use the same interface for
> > > hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
> > > already back guest memory with memfd, and looking at how private
> > > memory would work [4], it seemed to me that the best way to unify all
> > > of these needs is to have the backend of guest memory be fd-based.
> > >
> > > It would be possible to have that as a separate kvmtool option, where
> > > fd-backed memory would be only for guests that use the new private
> > > memory extensions. However, that would mean more code to maintain that
> > > is essentially doing the same thing (allocating and mapping memory).
> > >
> > > I thought that it would be worth having these patches in kvmtool now
> > > rather than wait until the guest private memory has made it into kvm.
> > > These patches simplify the code as an end result, make it easier to
> >
> > In the non-hugetlbfs case, before:
> >
> >         kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
> >         kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, kvm->cfg.hugetlbfs_path, kvm->arch.ram_alloc_size);
> >
> >         /*
> >          * mmap_anon_or_hugetlbfs expands to:
> >          * getpagesize()
> >          * mmap()
> >          */
> >
> >         kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, SZ_2M);
> >
> > After:
> >         /* mmap_anon_or_hugetlbfs: */
> >         getpagesize();
> >         mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >         memfd_alloc(size, htlbfs_path, blk_size);
> >
> >         /*
> >          * memfd_alloc() expands to:
> >          * memfd_create()
> >          * ftruncate
> >          */
> >
> >         addr_align = (void *)ALIGN((u64)addr_map, align_sz);
> >         mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0);
> >
> > I'm counting one extra mmap(), one memfd_create() and one ftruncate() that
> > this series adds (not to mention all the boilerplate code to check for
> > errors).
> >
> > Let's use another metric, let's count the number of lines of code. Before:
> > 9 lines of code, after: -3 lines removed from arm/kvm.c and 86 lines of
> > code for memfd_alloc() and mmap_anon_or_hugetlbfs_align().
> >
> > I'm struggling to find a metric by which the resulting code is simpler, as
> > you suggest.
> 
> With simpler I didn't mean fewer lines of code, rather that it's
> easier to reason about, more shared code. With this series, hugetlb

How is all of the code that has been added easier to reason about than one
single mmap call?

> and normal memory creation follow the same path, and with the
> refactoring a lot of arch-specific code is gone.

Can you point me to the arch-specific code that this series removes? As far
> > as I can tell, the only arch-specific change is replacing
kvm_arch_delete_ram with kvm_delete_ram, which can be done independently of
this series. If it's only that function, I wouldn't call that "a lot" of
arch-specific code.

> 
> >
> > > allocate and map aligned memory without overallocating, and bring
> >
> > If your goal is to remove the overallocating of memory, you can just munmap
> > the extra memory after alignment is performed. To do that you don't need to
> > allocate everything using a memfd.
> >
> > > kvmtool closer to a more consistent way of allocating guest memory, in
> > > a similar manner to other VMMs.
> >
> > I would really appreciate it if you pointed me to where qemu allocates memory using
> > memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> > backend allocation function until g_malloc0(), but I couldn't find the
> > implementation for that.
> 
> You're right. I thought that the memfd backend was the default, but
> looking again it's not.
> 
> > >
> > > Moreover, with the private memory proposal [1], whether the fd-based
> > > support is available can be queried by a KVM capability. If it's
> > > available kvmtool would use the fd, if it's not available, it would
> > > use the host-mapped address. Therefore, there isn’t a need for a
> > > command line option, unless for testing.
> >
> > Why would anyone want to use private memory by default for a
> > non-confidential VM?
> 
> The idea is that, at least when pKVM is enabled, we would use the
> proposed extensions for all guests, i.e., memory via a file
> descriptor, regardless of whether the guest is protected (thus the memory
> would be private), or not.

kvmtool can be used to run virtual machines when pKVM is not enabled. In
fact, it has been used that way for way longer than pKVM has existed. What
about those users?

> 
> 
> > >
> > > I have implemented this all the way to support the private memory
> > > proposal in kvmtool [5], but I haven’t posted these since the private
> > > memory proposal itself is still in flux. If you’re interested you
> >
> > Are you saying that you are not really sure how the userspace API will end
> > up looking? If that's the case, wouldn't it make more sense to wait for the
> > API to stabilize and then send support for it as one nice series?
> 
> Yes, I'm not sure how it will end up looking. We know that it will be
> fd-based though, which is why I thought it might be good to start with
> that.

If you're not sure how it will end up looking, then why change kvmtool now?

Thanks,
Alex

> 
> Cheers,
> /fuad
> 
> 
> 
> > Thanks,
> > Alex
> >
> > > could have a look on how I would go ahead building on these patches
> > > for full support of private memory backed by an fd.
> > >
> > > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > all the changes at once? This change might look sensible right now, but it
> > > > might turn out that it was the wrong way to go about it when someone
> > > > actually starts implementing memory sharing.
> > >
> > > I don’t plan on supporting IPC memory sharing. I just mentioned that
> > > as yet another use case that would benefit from guest memory being
> > > fd-based, should kvmtool decide to support it in the future.
> > >
> > > Cheers,
> > > /fuad
> > >
> > > [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > [2] https://github.com/qemu/qemu
> > > [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> > > [4] https://github.com/chao-p/qemu/tree/privmem-v9
> > > [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> > >
> > >
> > >
> > > >
> > > > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > all the changes at once? This change might look sensible right now, but it
> > > > might turn out that it was the wrong way to go about it when someone
> > > > actually starts implementing memory sharing.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > >
> > > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > > >
> > > > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > ---
> > > > >  include/kvm/kvm.h  |  1 +
> > > > >  include/kvm/util.h |  3 +++
> > > > >  kvm.c              |  4 ++++
> > > > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > > > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > > > index 3872dc6..d0d519b 100644
> > > > > --- a/include/kvm/kvm.h
> > > > > +++ b/include/kvm/kvm.h
> > > > > @@ -87,6 +87,7 @@ struct kvm {
> > > > >       struct kvm_config       cfg;
> > > > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > > > >       int                     vm_fd;          /* For VM ioctls() */
> > > > > +     int                     ram_fd;         /* For guest memory. */
> > > > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > > > >
> > > > >       int                     nrcpus;         /* Number of cpus to run */
> > > > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > > > index 61a205b..369603b 100644
> > > > > --- a/include/kvm/util.h
> > > > > +++ b/include/kvm/util.h
> > > > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > > > >  }
> > > > >
> > > > >  struct kvm;
> > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > > > +                                u64 size, u64 align);
> > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > > > >
> > > > >  #endif /* KVM__UTIL_H */
> > > > > diff --git a/kvm.c b/kvm.c
> > > > > index 78bc0d8..ed29d68 100644
> > > > > --- a/kvm.c
> > > > > +++ b/kvm.c
> > > > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > > > >       mutex_init(&kvm->mem_banks_lock);
> > > > >       kvm->sys_fd = -1;
> > > > >       kvm->vm_fd = -1;
> > > > > +     kvm->ram_fd = -1;
> > > > >
> > > > >  #ifdef KVM_BRLOCK_DEBUG
> > > > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > > > >
> > > > >       kvm__arch_delete_ram(kvm);
> > > > >
> > > > > +     if (kvm->ram_fd >= 0)
> > > > > +             close(kvm->ram_fd);
> > > > > +
> > > > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > > > >               list_del(&bank->list);
> > > > >               free(bank);
> > > > > diff --git a/util/util.c b/util/util.c
> > > > > index d3483d8..278bcc2 100644
> > > > > --- a/util/util.c
> > > > > +++ b/util/util.c
> > > > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > > > >       return sfs.f_bsize;
> > > > >  }
> > > > >
> > > > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > > > >  {
> > > > >       const char *name = "kvmtool";
> > > > >       unsigned int flags = 0;
> > > > >       int fd;
> > > > > -     void *addr;
> > > > > -     int htsize = __builtin_ctzl(blk_size);
> > > > >
> > > > > -     if ((1ULL << htsize) != blk_size)
> > > > > -             die("Hugepage size must be a power of 2.\n");
> > > > > +     if (hugetlb) {
> > > > > +             int htsize = __builtin_ctzl(blk_size);
> > > > >
> > > > > -     flags |= MFD_HUGETLB;
> > > > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > > > +             if ((1ULL << htsize) != blk_size)
> > > > > +                     die("Hugepage size must be a power of 2.\n");
> > > > > +
> > > > > +             flags |= MFD_HUGETLB;
> > > > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > > > +     }
> > > > >
> > > > >       fd = memfd_create(name, flags);
> > > > >       if (fd < 0)
> > > > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > > > +             die("Can't memfd_create for memory map\n");
> > > > > +
> > > > >       if (ftruncate(fd, size) < 0)
> > > > >               die("Can't ftruncate for mem mapping size %lld\n",
> > > > >                       (unsigned long long)size);
> > > > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > > > -     close(fd);
> > > > >
> > > > > -     return addr;
> > > > > +     return fd;
> > > > >  }
> > > > >
> > > > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > >  {
> > > > >       u64 blk_size = 0;
> > > > > +     int fd;
> > > > >
> > > > >       /*
> > > > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > >               }
> > > > >
> > > > >               kvm->ram_pagesize = blk_size;
> > > > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > > > >       } else {
> > > > >               kvm->ram_pagesize = getpagesize();
> > > > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > > > >       }
> > > > > +
> > > > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > > > +     if (fd < 0)
> > > > > +             return MAP_FAILED;
> > > > > +
> > > > > +     kvm->ram_fd = fd;
> > > > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > > > >  }
> > > > > --
> > > > > 2.38.1.431.g37b22c650d-goog
> > > > >


* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-25 11:31           ` Alexandru Elisei
@ 2022-11-28  8:49             ` Fuad Tabba
  2022-11-29 18:09               ` Alexandru Elisei
  0 siblings, 1 reply; 36+ messages in thread
From: Fuad Tabba @ 2022-11-28  8:49 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

First I want to mention that I really appreciate your feedback, which
has already been quite helpful. I would like you to please consider
this series to be an RFC, and let's use these patches as a basis for
discussing how they can be improved when I respin them, even if that
means waiting until the kvm fd-based proposal is finalized.

Now to answer your question...

<snip>

> > My reasoning for allocating all memory with memfd is that it's one
> > ring to rule them all :) By that I mean, with memfd, we can allocate
> > normal memory, hugetlb memory, in the future guest private memory, and
> > easily expand it to support things like IPC memory sharing in the
> > future.
>
> > Allocating anonymous memory is more complex now. And I could argue that the
> > hugetlbfs case is also more complex, because there are now two branches that
> > do different things based on whether it's hugetlbfs or not, instead of one.

The additional complexity now comes not from using memfd, but from
the unmapping and aligning code, which I think does benefit kvmtool in
general.

Hugetlbfs already had a different path before; now at least the
difference is just setting flags for memfd_create(), rather than
allocating memory differently.
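The flag computation referred to here can be sketched as follows. This is illustrative and mirrors what the patch does for the hugetlb case (the helper name is mine, not the series'):

```c
#include <assert.h>
#include <linux/memfd.h>	/* MFD_HUGETLB, MFD_HUGE_SHIFT, MFD_HUGE_* */

/*
 * Encode a hugepage block size into memfd_create() flags.
 * blk_size must be a power of 2; its log2 is placed in the
 * MFD_HUGE_SHIFT field, which selects the hugetlb page size.
 */
static unsigned int hugetlb_memfd_flags(unsigned long blk_size)
{
	int htsize = __builtin_ctzl(blk_size);

	assert((1UL << htsize) == blk_size);
	return MFD_HUGETLB | ((unsigned int)htsize << MFD_HUGE_SHIFT);
}
```

For example, a 2MB block size encodes to MFD_HUGETLB | MFD_HUGE_2MB, so the non-hugetlb path differs only in passing 0 instead of these flags.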


> As it stands right now, my opinion is that using memfd for anonymous RAM
> only adds complexity for zero benefits.
> >
> >
> > > >
> > > > Moreover, using an fd would be more generic and flexible, which allows
> > > > for other use cases (such as IPC), or to map that memory in userspace
> > > > when appropriate. It also allows us to use the same interface for
> > > > hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
> > > > already back guest memory with memfd, and looking at how private
> > > > memory would work [4], it seemed to me that the best way to unify all
> > > > of these needs is to have the backend of guest memory be fd-based.
> > > >
> > > > It would be possible to have that as a separate kvmtool option, where
> > > > fd-backed memory would be only for guests that use the new private
> > > > memory extensions. However, that would mean more code to maintain that
> > > > is essentially doing the same thing (allocating and mapping memory).
> > > >
> > > > I thought that it would be worth having these patches in kvmtool now
> > > > rather than wait until the guest private memory has made it into kvm.
> > > > These patches simplify the code as an end result, make it easier to
> > >
> > > In the non-hugetlbfs case, before:
> > >
> > >         kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
> > >         kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, kvm->cfg.hugetlbfs_path, kvm->arch.ram_alloc_size);
> > >
> > >         /*
> > >          * mmap_anon_or_hugetlbfs expands to:
> > >          * getpagesize()
> > >          * mmap()
> > >          */
> > >
> > >         kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, SZ_2M);
> > >
> > > After:
> > >         /* mmap_anon_or_hugetlbfs: */
> > >         getpagesize();
> > >         mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > >         memfd_alloc(size, htlbfs_path, blk_size);
> > >
> > >         /*
> > >          * memfd_alloc() expands to:
> > >          * memfd_create()
> > >          * ftruncate
> > >          */
> > >
> > >         addr_align = (void *)ALIGN((u64)addr_map, align_sz);
> > >         mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0);
> > >
> > > I'm counting one extra mmap(), one memfd_create() and one ftruncate() that
> > > this series adds (not to mention all the boilerplate code to check for
> > > errors).
> > >
> > > Let's use another metric, let's count the number of lines of code. Before:
> > > 9 lines of code, after: -3 lines removed from arm/kvm.c and 86 lines of
> > > code for memfd_alloc() and mmap_anon_or_hugetlbfs_align().
> > >
> > > I'm struggling to find a metric by which the resulting code is simpler, as
> > > you suggest.
> >
> > With simpler I didn't mean fewer lines of code, rather that it's
> > easier to reason about, more shared code. With this series, hugetlb
>
> How is all of the code that has been added easier to reason about than one
> single mmap call?
>
> > and normal memory creation follow the same path, and with the
> > refactoring a lot of arch-specific code is gone.
>
> Can you point me to the arch-specific code that this series removes? As far
> as I can tell, the only arch-specific change is replacing
> kvm_arch_delete_ram with kvm_delete_ram, which can be done independently of
> this series. If it's only that function, I wouldn't call that "a lot" of
> arch-specific code.

kvmtool is an old and well-established project. So I think that being
able to remove the memory-alignment code from the arm and riscv kvm.c,
two fields from the arm and riscv struct kvm_arch, as well as
kvm__arch_delete_ram from all architectures, is not that little for a
mature project such as this one. I agree that this could have been
done without using memfd, but at least for me, as a person who has
just posted their first contribution to kvmtool, it was easier to make
these changes when the tracking of the memory is based on an fd rather
than a userspace address (it makes aligning and unmapping unused memory
much easier).
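As an illustration of the alignment technique (reserve an over-sized PROT_NONE window, round up, map the fd with MAP_FIXED, trim the slack), the approach can be condensed to something like this sketch; the function name is illustrative, not the series' exact code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map @size bytes of @fd at an @align-aligned address, without leaving
 * the over-reserved slack mapped afterwards. @align must be a power of
 * 2 and a multiple of the page size.
 */
static void *mmap_fd_aligned(int fd, size_t size, size_t align)
{
	size_t total = size + align;
	uintptr_t base, start, end;

	/* Reserve a window big enough to contain an aligned range. */
	void *resv = mmap(NULL, total, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (resv == MAP_FAILED)
		return MAP_FAILED;

	base = (uintptr_t)resv;
	start = (base + align - 1) & ~(align - 1);
	end = base + total;

	/* Trim the unaligned head and the unused tail. */
	if (start > base)
		munmap(resv, start - base);
	if (end > start + size)
		munmap((void *)(start + size), end - (start + size));

	/* Replace the remaining reservation with the fd's pages. */
	return mmap((void *)start, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_FIXED, fd, 0);
}
```

Because the fd names the memory independently of any mapping, the unused parts of the reservation can simply be unmapped; with plain anonymous mmap the over-allocated region has to be kept (or separately munmap'ed) around the aligned pointer.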

>
> >
> > >
> > > > allocate and map aligned memory without overallocating, and bring
> > >
> > > > If your goal is to remove the overallocating of memory, you can just munmap
> > > the extra memory after alignment is performed. To do that you don't need to
> > > allocate everything using a memfd.
> > >
> > > > kvmtool closer to a more consistent way of allocating guest memory, in
> > > > a similar manner to other VMMs.
> > >
> > > I would really appreciate it if you pointed me to where qemu allocates memory using
> > > memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> > > backend allocation function until g_malloc0(), but I couldn't find the
> > > implementation for that.
> >
> > You're right. I thought that the memfd backend was the default, but
> > looking again it's not.
> >
> > > >
> > > > Moreover, with the private memory proposal [1], whether the fd-based
> > > > support is available can be queried by a KVM capability. If it's
> > > > available kvmtool would use the fd, if it's not available, it would
> > > > use the host-mapped address. Therefore, there isn’t a need for a
> > > > command line option, unless for testing.
> > >
> > > Why would anyone want to use private memory by default for a
> > > non-confidential VM?
> >
> > The idea is that, at least when pKVM is enabled, we would use the
> > proposed extensions for all guests, i.e., memory via a file
> > descriptor, regardless of whether the guest is protected (thus the memory
> > would be private), or not.
>
> kvmtool can be used to run virtual machines when pKVM is not enabled. In
> fact, it has been used that way for way longer than pKVM has existed. What
> about those users?

This does not affect those users, which is the point of these patches:
the existing use cases keep working, while new ones are enabled, both
now and potentially in the future.

> >
> >
> > > >
> > > > I have implemented this all the way to support the private memory
> > > > proposal in kvmtool [5], but I haven’t posted these since the private
> > > > memory proposal itself is still in flux. If you’re interested you
> > >
> > > Are you saying that you are not really sure how the userspace API will end
> > > up looking? If that's the case, wouldn't it make more sense to wait for the
> > > API to stabilize and then send support for it as one nice series?
> >
> > Yes, I'm not sure how it will end up looking. We know that it will be
> > fd-based though, which is why I thought it might be good to start with
> > that.
>
> If you're not sure how it will end up looking, then why change kvmtool now?

Because we are sure that it will be fd-based, and because I thought
that getting a head start to set the scene would be helpful. The part
that is uncertain is the kvm capabilities, flags, and names of the new
memory region extensions, none of which I address in these patches.

Cheers,
/fuad

> Thanks,
> Alex
>
> >
> > Cheers,
> > /fuad
> >
> >
> >
> > > Thanks,
> > > Alex
> > >
> > > > could have a look on how I would go ahead building on these patches
> > > > for full support of private memory backed by an fd.
> > > >
> > > > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > all the changes at once? This change might look sensible right now, but it
> > > > > might turn out that it was the wrong way to go about it when someone
> > > > > actually starts implementing memory sharing.
> > > >
> > > > I don’t plan on supporting IPC memory sharing. I just mentioned that
> > > > as yet another use case that would benefit from guest memory being
> > > > fd-based, should kvmtool decide to support it in the future.
> > > >
> > > > Cheers,
> > > > /fuad
> > > >
> > > > [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > [2] https://github.com/qemu/qemu
> > > > [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> > > > [4] https://github.com/chao-p/qemu/tree/privmem-v9
> > > > [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> > > >
> > > >
> > > >
> > > > >
> > > > > > Regarding IPC memory sharing, is mmap'ing a memfd file enough to enable
> > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > all the changes at once? This change might look sensible right now, but it
> > > > > might turn out that it was the wrong way to go about it when someone
> > > > > actually starts implementing memory sharing.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > >
> > > > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > > > >
> > > > > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > > ---
> > > > > >  include/kvm/kvm.h  |  1 +
> > > > > >  include/kvm/util.h |  3 +++
> > > > > >  kvm.c              |  4 ++++
> > > > > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > > > > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > > > >
> > > > > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > > > > index 3872dc6..d0d519b 100644
> > > > > > --- a/include/kvm/kvm.h
> > > > > > +++ b/include/kvm/kvm.h
> > > > > > @@ -87,6 +87,7 @@ struct kvm {
> > > > > >       struct kvm_config       cfg;
> > > > > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > > > > >       int                     vm_fd;          /* For VM ioctls() */
> > > > > > +     int                     ram_fd;         /* For guest memory. */
> > > > > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > > > > >
> > > > > >       int                     nrcpus;         /* Number of cpus to run */
> > > > > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > > > > index 61a205b..369603b 100644
> > > > > > --- a/include/kvm/util.h
> > > > > > +++ b/include/kvm/util.h
> > > > > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > > > > >  }
> > > > > >
> > > > > >  struct kvm;
> > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > > > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > > > > +                                u64 size, u64 align);
> > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > > > > >
> > > > > >  #endif /* KVM__UTIL_H */
> > > > > > diff --git a/kvm.c b/kvm.c
> > > > > > index 78bc0d8..ed29d68 100644
> > > > > > --- a/kvm.c
> > > > > > +++ b/kvm.c
> > > > > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > > > > >       mutex_init(&kvm->mem_banks_lock);
> > > > > >       kvm->sys_fd = -1;
> > > > > >       kvm->vm_fd = -1;
> > > > > > +     kvm->ram_fd = -1;
> > > > > >
> > > > > >  #ifdef KVM_BRLOCK_DEBUG
> > > > > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > > > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > > > > >
> > > > > >       kvm__arch_delete_ram(kvm);
> > > > > >
> > > > > > +     if (kvm->ram_fd >= 0)
> > > > > > +             close(kvm->ram_fd);
> > > > > > +
> > > > > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > > > > >               list_del(&bank->list);
> > > > > >               free(bank);
> > > > > > diff --git a/util/util.c b/util/util.c
> > > > > > index d3483d8..278bcc2 100644
> > > > > > --- a/util/util.c
> > > > > > +++ b/util/util.c
> > > > > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > > > > >       return sfs.f_bsize;
> > > > > >  }
> > > > > >
> > > > > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > > > > >  {
> > > > > >       const char *name = "kvmtool";
> > > > > >       unsigned int flags = 0;
> > > > > >       int fd;
> > > > > > -     void *addr;
> > > > > > -     int htsize = __builtin_ctzl(blk_size);
> > > > > >
> > > > > > -     if ((1ULL << htsize) != blk_size)
> > > > > > -             die("Hugepage size must be a power of 2.\n");
> > > > > > +     if (hugetlb) {
> > > > > > +             int htsize = __builtin_ctzl(blk_size);
> > > > > >
> > > > > > -     flags |= MFD_HUGETLB;
> > > > > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > +             if ((1ULL << htsize) != blk_size)
> > > > > > +                     die("Hugepage size must be a power of 2.\n");
> > > > > > +
> > > > > > +             flags |= MFD_HUGETLB;
> > > > > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > +     }
> > > > > >
> > > > > >       fd = memfd_create(name, flags);
> > > > > >       if (fd < 0)
> > > > > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > > > > +             die("Can't memfd_create for memory map\n");
> > > > > > +
> > > > > >       if (ftruncate(fd, size) < 0)
> > > > > >               die("Can't ftruncate for mem mapping size %lld\n",
> > > > > >                       (unsigned long long)size);
> > > > > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > > > > -     close(fd);
> > > > > >
> > > > > > -     return addr;
> > > > > > +     return fd;
> > > > > >  }
> > > > > >
> > > > > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > >  {
> > > > > >       u64 blk_size = 0;
> > > > > > +     int fd;
> > > > > >
> > > > > >       /*
> > > > > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > > > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > >               }
> > > > > >
> > > > > >               kvm->ram_pagesize = blk_size;
> > > > > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > > > > >       } else {
> > > > > >               kvm->ram_pagesize = getpagesize();
> > > > > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > > > > >       }
> > > > > > +
> > > > > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > > > > +     if (fd < 0)
> > > > > > +             return MAP_FAILED;
> > > > > > +
> > > > > > +     kvm->ram_fd = fd;
> > > > > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > > > > >  }
> > > > > > --
> > > > > > 2.38.1.431.g37b22c650d-goog
> > > > > >


* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-28  8:49             ` Fuad Tabba
@ 2022-11-29 18:09               ` Alexandru Elisei
  2022-11-30 17:54                 ` Fuad Tabba
  0 siblings, 1 reply; 36+ messages in thread
From: Alexandru Elisei @ 2022-11-29 18:09 UTC (permalink / raw)
  To: Fuad Tabba; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Mon, Nov 28, 2022 at 08:49:29AM +0000, Fuad Tabba wrote:
> Hi,
> 
> First I want to mention that I really appreciate your feedback, which
> has already been quite helpful. I would like you to please consider
> this to be an RFC, and let's use these patches as a basis for
> discussion and how they can be improved when I respin them, even if
> that means waiting until the kvm fd-based proposal is finalized.

For that it's probably best if you add RFC to the subject prefix. That's
very helpful to let the reviewers know what to focus on, more on the
approach than on the finer details.

> 
> Now to answer your question...
> 
> <snip>
> 
> > > My reasoning for allocating all memory with memfd is that it's one
> > > ring to rule them all :) By that I mean, with memfd, we can allocate
> > > normal memory, hugetlb memory, in the future guest private memory, and
> > > easily expand it to support things like IPC memory sharing in the
> > > future.
> >
> > Allocating anonymous memory is more complex now. And I could argue that the
> > hugetlbfs case is also more complex because there are now two branches that
> > do different things based on whether it's hugetlbfs or not, instead of one.
> 
> The additional complexity now comes not from using memfd, but from
> unmapping and aligning code, which I think does benefit kvmtool in
> general.

I wasn't referring to the unmapping/aligning part because that can be
implemented without using memfd.

> 
> Hugetlbfs already had a different path before; now its separate path
> only amounts to setting flags for memfd_create(), rather than
> allocating memory in a different way.

Conceptually, both allocate memory the same way, by creating a temporary
file. I do agree though that using memfd is easier than fiddling with the
temporary file name.

> 
> 
> > As it stands right now, my opinion is that using memfd for anonymous RAM
> > only adds complexity for zero benefits.
> > >
> > >
> > > > >
> > > > > Moreover, using an fd would be more generic and flexible, which allows
> > > > > for other use cases (such as IPC), or to map that memory in userspace
> > > > > when appropriate. It also allows us to use the same interface for
> > > > > hugetlb. Considering that other VMMs (e.g., qemu [2], crosvm [3])
> > > > > already back guest memory with memfd, and looking at how private
> > > > > memory would work [4], it seemed to me that the best way to unify all
> > > > > of these needs is to have the backend of guest memory be fd-based.
> > > > >
> > > > > It would be possible to have that as a separate kvmtool option, where
> > > > > fd-backed memory would be only for guests that use the new private
> > > > > memory extensions. However, that would mean more code to maintain that
> > > > > is essentially doing the same thing (allocating and mapping memory).
> > > > >
> > > > > I thought that it would be worth having these patches in kvmtool now
> > > > > rather than wait until the guest private memory has made it into kvm.
> > > > > These patches simplify the code as an end result, make it easier to
> > > >
> > > > In the non-hugetlbfs case, before:
> > > >
> > > >         kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
> > > >         kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, kvm->cfg.hugetlbfs_path, kvm->arch.ram_alloc_size);
> > > >
> > > >         /*
> > > >          * mmap_anon_or_hugetlbfs expands to:
> > > >          * getpagesize()
> > > >          * mmap()
> > > >          */
> > > >
> > > >         kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, SZ_2M);
> > > >
> > > > After:
> > > >         /* mmap_anon_or_hugetlbfs: */
> > > >         getpagesize();
> > > >         mmap(NULL, total_map, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > >         memfd_alloc(size, htlbfs_path, blk_size);
> > > >
> > > >         /*
> > > >          * memfd_alloc() expands to:
> > > >          * memfd_create()
> > > >          * ftruncate
> > > >          */
> > > >
> > > >         addr_align = (void *)ALIGN((u64)addr_map, align_sz);
> > > >         mmap(addr_align, size, PROT_RW, MAP_PRIVATE | MAP_FIXED, fd, 0);
> > > >
> > > > I'm counting one extra mmap(), one memfd_create() and one ftruncate() that
> > > > this series adds (not to mention all the boilerplate code to check for
> > > > errors).
> > > >
> > > > Let's use another metric, let's count the number of lines of code. Before:
> > > > 9 lines of code, after: -3 lines removed from arm/kvm.c and 86 lines of
> > > > code for memfd_alloc() and mmap_anon_or_hugetlbfs_align().
> > > >
> > > > I'm struggling to find a metric by which the resulting code is simpler, as
> > > > you suggest.
> > >
> > > With simpler I didn't mean fewer lines of code, rather that it's
> > > easier to reason about, more shared code. With this series, hugetlb
> >
> > How is all of the code that has been added easier to reason about than one
> > single mmap call?

Would be nice if this would be answered.

> >
> > > and normal memory creation follow the same path, and with the
> > > refactoring a lot of arch-specific code is gone.
> >
> > Can you point me to the arch-specific code that this series removes? As far
> > as I can tell, the only arch-specific change is replacing
> > kvm_arch_delete_ram with kvm_delete_ram, which can be done independently of
> > this series. If it's only that function, I wouldn't call that "a lot" of
> > arch-specific code.
> 
> kvmtool is an old and well-established project. So I think that being
> able to remove the memory-alignment code from the arm and riscv kvm.c,
> two fields from the arm and riscv struct kvm_arch, as well as
> kvm__arch_delete_ram from all architectures, is not that little for a
> mature project such as this one. I agree that this could have been
> done without using memfd, but at least for me, as a person who has
  ^^^^^^^^^^^^^^^^^^^^^^^^
Good to see we're in agreement.

> just posted their first contribution to kvmtool, it was easier to make
> these changes when the tracking of the memory is based on an fd rather
> than a userspace address (makes alignment and unmapping unused memory
> much easier).

How does this look:

diff --git a/arm/kvm.c b/arm/kvm.c
index d51cc15d8b1c..13b0d10c9cd1 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -27,6 +27,7 @@ bool kvm__arch_cpu_supports_vm(void)
 void kvm__init_ram(struct kvm *kvm)
 {
        u64 phys_start, phys_size;
+       unsigned long extra_memory;
        void *host_mem;
        int err;

@@ -48,12 +49,16 @@ void kvm__init_ram(struct kvm *kvm)

        kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start,
                                        SZ_2M);
+       extra_memory = (unsigned long)kvm->ram_start - (unsigned long)kvm->arch.ram_alloc_start;
+       if (extra_memory) {
+               munmap(kvm->arch.ram_alloc_start,  extra_memory);
+               /* Only here for kvm__arch_delete_ram, the fields should be removed */
+               kvm->arch.ram_alloc_start = kvm->ram_start;
+               kvm->arch.ram_alloc_size = kvm->ram_size;
+       }

-       madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-               MADV_MERGEABLE);
-
-       madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-               MADV_HUGEPAGE);
+       madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+       madvise(kvm->ram_start, kvm->ram_size, MADV_HUGEPAGE);

        phys_start      = kvm->cfg.ram_addr;
        phys_size       = kvm->ram_size;

Warning, untested code.

Then you can fold this into mmap_anon_or_hugetlbfs_aligned(). You can do
whatever you want with the code without giving me any credit.

> 
> >
> > >
> > > >
> > > > > allocate and map aligned memory without overallocating, and bring
> > > >
> > > > If your goal is to remove the overallocating of memory, you can just munmap
> > > > the extra memory after alignment is performed. To do that you don't need to
> > > > allocate everything using a memfd.
> > > >
> > > > > kvmtool closer to a more consistent way of allocating guest memory, in
> > > > > a similar manner to other VMMs.
> > > >
> > > > I would really appreciate it if you could point me to where qemu
> > > > allocates memory using
> > > > memfd when invoked with -m <size>. I was able to follow the hostmem-ram
> > > > backend allocation function until g_malloc0(), but I couldn't find the
> > > > implementation for that.
> > >
> > > You're right. I thought that the memfd backend was the default, but
> > > looking again it's not.
> > >
> > > > >
> > > > > Moreover, with the private memory proposal [1], whether the fd-based
> > > > > support available can be queried by a KVM capability. If it's
> > > > > available kvmtool would use the fd, if it's not available, it would
> > > > > use the host-mapped address. Therefore, there isn’t a need for a
> > > > > command line option, unless for testing.
> > > >
> > > > Why would anyone want to use private memory by default for a
> > > > non-confidential VM?
> > >
> > > The idea is that, at least when pKVM is enabled, we would use the
> > > proposed extensions for all guests, i.e., memory via a file
> > > descriptor, regardless of whether the guest is protected (thus the memory
> > > would be private), or not.
> >
> > kvmtool can be used to run virtual machines when pKVM is not enabled. In
> > fact, it has been used that way for way longer than pKVM has existed. What
> > about those users?
> 
> This does not affect those users, which is the point of these patches.
> They maintain the existing behavior for current users while enabling
> potentially new uses in the future.

On the contrary, this affects people that don't use pKVM, because you are
changing how allocating anonymous memory works.

> 
> > >
> > >
> > > > >
> > > > > I have implemented this all the way to support the private memory
> > > > > proposal in kvmtool [5], but I haven’t posted these since the private
> > > > > memory proposal itself is still in flux. If you’re interested you
> > > >
> > > > Are you saying that you are not really sure how the userspace API will end
> > > > up looking? If that's the case, wouldn't it make more sense to wait for the
> > > > API to stabilize and then send support for it as one nice series?
> > >
> > > Yes, I'm not sure how it will end up looking. We know that it will be
> > > fd-based though, which is why I thought it might be good to start with
> > > that.
> >
> > If you're not sure how it will end up looking, then why change kvmtool now?
> 
> Because we are sure that it will be fd-based, and because I thought
> that getting a head start to set the scene would be helpful. The part
> that is uncertain is the kvm capabilities, flags, and names of the new
> memory region extensions, none of which I address in these patches.

I see, that makes sense. My feedback so far is that you haven't provided a
good reason why this change to anonymous memory makes sense right now.

Thanks,
Alex

> 
> Cheers,
> /fuad
> 
> > Thanks,
> > Alex
> >
> > >
> > > Cheers,
> > > /fuad
> > >
> > >
> > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > > could have a look on how I would go ahead building on these patches
> > > > > for full support of private memory backed by an fd.
> > > > >
> > > > > > Regarding IPC memory sharing, is mmap'ing an memfd file enough to enable
> > > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > > all the changes at once? This change might look sensible right now, but it
> > > > > > might turn out that it was the wrong way to go about it when someone
> > > > > > actually starts implementing memory sharing.
> > > > >
> > > > > I don’t plan on supporting IPC memory sharing. I just mentioned that
> > > > > as yet another use case that would benefit from guest memory being
> > > > > fd-based, should kvmtool decide to support it in the future.
> > > > >
> > > > > Cheers,
> > > > > /fuad
> > > > >
> > > > > [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > [2] https://github.com/qemu/qemu
> > > > > [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> > > > > [4] https://github.com/chao-p/qemu/tree/privmem-v9
> > > > > [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Regarding IPC memory sharing, is mmap'ing an memfd file enough to enable
> > > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > > all the changes at once? This change might look sensible right now, but it
> > > > > > might turn out that it was the wrong way to go about it when someone
> > > > > > actually starts implementing memory sharing.
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > > > > >
> > > > > > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > > > ---
> > > > > > >  include/kvm/kvm.h  |  1 +
> > > > > > >  include/kvm/util.h |  3 +++
> > > > > > >  kvm.c              |  4 ++++
> > > > > > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > > > > > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > > > > > index 3872dc6..d0d519b 100644
> > > > > > > --- a/include/kvm/kvm.h
> > > > > > > +++ b/include/kvm/kvm.h
> > > > > > > @@ -87,6 +87,7 @@ struct kvm {
> > > > > > >       struct kvm_config       cfg;
> > > > > > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > > > > > >       int                     vm_fd;          /* For VM ioctls() */
> > > > > > > +     int                     ram_fd;         /* For guest memory. */
> > > > > > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > > > > > >
> > > > > > >       int                     nrcpus;         /* Number of cpus to run */
> > > > > > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > > > > > index 61a205b..369603b 100644
> > > > > > > --- a/include/kvm/util.h
> > > > > > > +++ b/include/kvm/util.h
> > > > > > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > > > > > >  }
> > > > > > >
> > > > > > >  struct kvm;
> > > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > > > > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > > > > > +                                u64 size, u64 align);
> > > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > > > > > >
> > > > > > >  #endif /* KVM__UTIL_H */
> > > > > > > diff --git a/kvm.c b/kvm.c
> > > > > > > index 78bc0d8..ed29d68 100644
> > > > > > > --- a/kvm.c
> > > > > > > +++ b/kvm.c
> > > > > > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > > > > > >       mutex_init(&kvm->mem_banks_lock);
> > > > > > >       kvm->sys_fd = -1;
> > > > > > >       kvm->vm_fd = -1;
> > > > > > > +     kvm->ram_fd = -1;
> > > > > > >
> > > > > > >  #ifdef KVM_BRLOCK_DEBUG
> > > > > > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > > > > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > > > > > >
> > > > > > >       kvm__arch_delete_ram(kvm);
> > > > > > >
> > > > > > > +     if (kvm->ram_fd >= 0)
> > > > > > > +             close(kvm->ram_fd);
> > > > > > > +
> > > > > > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > > > > > >               list_del(&bank->list);
> > > > > > >               free(bank);
> > > > > > > diff --git a/util/util.c b/util/util.c
> > > > > > > index d3483d8..278bcc2 100644
> > > > > > > --- a/util/util.c
> > > > > > > +++ b/util/util.c
> > > > > > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > > > > > >       return sfs.f_bsize;
> > > > > > >  }
> > > > > > >
> > > > > > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > > > > > >  {
> > > > > > >       const char *name = "kvmtool";
> > > > > > >       unsigned int flags = 0;
> > > > > > >       int fd;
> > > > > > > -     void *addr;
> > > > > > > -     int htsize = __builtin_ctzl(blk_size);
> > > > > > >
> > > > > > > -     if ((1ULL << htsize) != blk_size)
> > > > > > > -             die("Hugepage size must be a power of 2.\n");
> > > > > > > +     if (hugetlb) {
> > > > > > > +             int htsize = __builtin_ctzl(blk_size);
> > > > > > >
> > > > > > > -     flags |= MFD_HUGETLB;
> > > > > > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > > +             if ((1ULL << htsize) != blk_size)
> > > > > > > +                     die("Hugepage size must be a power of 2.\n");
> > > > > > > +
> > > > > > > +             flags |= MFD_HUGETLB;
> > > > > > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > > +     }
> > > > > > >
> > > > > > >       fd = memfd_create(name, flags);
> > > > > > >       if (fd < 0)
> > > > > > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > > > > > +             die("Can't memfd_create for memory map\n");
> > > > > > > +
> > > > > > >       if (ftruncate(fd, size) < 0)
> > > > > > >               die("Can't ftruncate for mem mapping size %lld\n",
> > > > > > >                       (unsigned long long)size);
> > > > > > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > > > > > -     close(fd);
> > > > > > >
> > > > > > > -     return addr;
> > > > > > > +     return fd;
> > > > > > >  }
> > > > > > >
> > > > > > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > > >  {
> > > > > > >       u64 blk_size = 0;
> > > > > > > +     int fd;
> > > > > > >
> > > > > > >       /*
> > > > > > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > > > > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > > >               }
> > > > > > >
> > > > > > >               kvm->ram_pagesize = blk_size;
> > > > > > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > > > > > >       } else {
> > > > > > >               kvm->ram_pagesize = getpagesize();
> > > > > > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > > > > > >       }
> > > > > > > +
> > > > > > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > > > > > +     if (fd < 0)
> > > > > > > +             return MAP_FAILED;
> > > > > > > +
> > > > > > > +     kvm->ram_fd = fd;
> > > > > > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > > > > > >  }
> > > > > > > --
> > > > > > > 2.38.1.431.g37b22c650d-goog
> > > > > > >

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations
  2022-11-29 18:09               ` Alexandru Elisei
@ 2022-11-30 17:54                 ` Fuad Tabba
  0 siblings, 0 replies; 36+ messages in thread
From: Fuad Tabba @ 2022-11-30 17:54 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, julien.thierry.kdev, andre.przywara, will

Hi,

On Tue, Nov 29, 2022 at 6:10 PM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Mon, Nov 28, 2022 at 08:49:29AM +0000, Fuad Tabba wrote:
> > Hi,
> >
> > First I want to mention that I really appreciate your feedback, which
> > has already been quite helpful. I would like you to please consider
> > this to be an RFC, and let's use these patches as a basis for
> > discussion and how they can be improved when I respin them, even if
> > that means waiting until the kvm fd-based proposal is finalized.
>
> For that it's probably best if you add RFC to the subject prefix. That's
> very helpful to let the reviewers know what to focus on, more on the
> approach than on the finer details.

I will respin this as an RFC, and I will include the patches I have
that support the restricted memory proposal [*] for pKVM as it stands
now. I hope that will help show where I was thinking this would be
heading.

[*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/

<snip>

> > > > With simpler I didn't mean fewer lines of code, rather that it's
> > > > easier to reason about, more shared code. With this series, hugetlb
> > >
> > > How is all of the code that has been added easier to reason about than one
> > > single mmap call?
>
> Would be nice if this would be answered.

As I said in a reply to a different comment, for me personally, as a
first-time kvmtool contributor, it was easier to reason about the
memory when the canonical reference to it is a file descriptor that
does not change, rather than a userspace memory address which can
change as it is aligned and trimmed.

I use the word simpler subjectively, that is, in my opinion.

<snip>

> >
> > Because we are sure that it will be fd-based, and because I thought
> > that getting a head start to set the scene would be helpful. The part
> > that is uncertain is the kvm capabilities, flags, and names of the new
> > memory region extensions, none of which I address in these patches.
>
> I see, that makes sense. My feedback so far is that you haven't provided a
> good reason why this change to anonymous memory makes sense right now.

I appreciate your feedback, and I hope we can continue this discussion
when I respin this as an RFC.



Cheers,
/fuad

> Thanks,
> Alex
>
> >
> > Cheers,
> > /fuad
> >
> > > Thanks,
> > > Alex
> > >
> > > >
> > > > Cheers,
> > > > /fuad
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > > could have a look on how I would go ahead building on these patches
> > > > > > for full support of private memory backed by an fd.
> > > > > >
> > > > > > > Regarding IPC memory sharing, is mmap'ing an memfd file enough to enable
> > > > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > > > all the changes at once? This change might look sensible right now, but it
> > > > > > > might turn out that it was the wrong way to go about it when someone
> > > > > > > actually starts implementing memory sharing.
> > > > > >
> > > > > > I don’t plan on supporting IPC memory sharing. I just mentioned that
> > > > > > as yet another use case that would benefit from guest memory being
> > > > > > fd-based, should kvmtool decide to support it in the future.
> > > > > >
> > > > > > Cheers,
> > > > > > /fuad
> > > > > >
> > > > > > [1] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > > [2] https://github.com/qemu/qemu
> > > > > > [3] https://chromium.googlesource.com/chromiumos/platform/crosvm/
> > > > > > [4] https://github.com/chao-p/qemu/tree/privmem-v9
> > > > > > [5] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/fdmem-v9-core
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Regarding IPC memory sharing, is mmap'ing an memfd file enough to enable
> > > > > > > that? If more work is needed for it, then wouldn't it make more sense to do
> > > > > > > all the changes at once? This change might look sensible right now, but it
> > > > > > > might turn out that it was the wrong way to go about it when someone
> > > > > > > actually starts implementing memory sharing.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > > > > > >
> > > > > > > > [*] https://lore.kernel.org/all/20221025151344.3784230-1-chao.p.peng@linux.intel.com/
> > > > > > > > ---
> > > > > > > >  include/kvm/kvm.h  |  1 +
> > > > > > > >  include/kvm/util.h |  3 +++
> > > > > > > >  kvm.c              |  4 ++++
> > > > > > > >  util/util.c        | 33 ++++++++++++++++++++-------------
> > > > > > > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > > > > > > index 3872dc6..d0d519b 100644
> > > > > > > > --- a/include/kvm/kvm.h
> > > > > > > > +++ b/include/kvm/kvm.h
> > > > > > > > @@ -87,6 +87,7 @@ struct kvm {
> > > > > > > >       struct kvm_config       cfg;
> > > > > > > >       int                     sys_fd;         /* For system ioctls(), i.e. /dev/kvm */
> > > > > > > >       int                     vm_fd;          /* For VM ioctls() */
> > > > > > > > +     int                     ram_fd;         /* For guest memory. */
> > > > > > > >       timer_t                 timerid;        /* Posix timer for interrupts */
> > > > > > > >
> > > > > > > >       int                     nrcpus;         /* Number of cpus to run */
> > > > > > > > diff --git a/include/kvm/util.h b/include/kvm/util.h
> > > > > > > > index 61a205b..369603b 100644
> > > > > > > > --- a/include/kvm/util.h
> > > > > > > > +++ b/include/kvm/util.h
> > > > > > > > @@ -140,6 +140,9 @@ static inline int pow2_size(unsigned long x)
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  struct kvm;
> > > > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size);
> > > > > > > > +void *mmap_anon_or_hugetlbfs_align(struct kvm *kvm, const char *htlbfs_path,
> > > > > > > > +                                u64 size, u64 align);
> > > > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
> > > > > > > >
> > > > > > > >  #endif /* KVM__UTIL_H */
> > > > > > > > diff --git a/kvm.c b/kvm.c
> > > > > > > > index 78bc0d8..ed29d68 100644
> > > > > > > > --- a/kvm.c
> > > > > > > > +++ b/kvm.c
> > > > > > > > @@ -160,6 +160,7 @@ struct kvm *kvm__new(void)
> > > > > > > >       mutex_init(&kvm->mem_banks_lock);
> > > > > > > >       kvm->sys_fd = -1;
> > > > > > > >       kvm->vm_fd = -1;
> > > > > > > > +     kvm->ram_fd = -1;
> > > > > > > >
> > > > > > > >  #ifdef KVM_BRLOCK_DEBUG
> > > > > > > >       kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> > > > > > > > @@ -174,6 +175,9 @@ int kvm__exit(struct kvm *kvm)
> > > > > > > >
> > > > > > > >       kvm__arch_delete_ram(kvm);
> > > > > > > >
> > > > > > > > +     if (kvm->ram_fd >= 0)
> > > > > > > > +             close(kvm->ram_fd);
> > > > > > > > +
> > > > > > > >       list_for_each_entry_safe(bank, tmp, &kvm->mem_banks, list) {
> > > > > > > >               list_del(&bank->list);
> > > > > > > >               free(bank);
> > > > > > > > diff --git a/util/util.c b/util/util.c
> > > > > > > > index d3483d8..278bcc2 100644
> > > > > > > > --- a/util/util.c
> > > > > > > > +++ b/util/util.c
> > > > > > > > @@ -102,36 +102,38 @@ static u64 get_hugepage_blk_size(const char *htlbfs_path)
> > > > > > > >       return sfs.f_bsize;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -static void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size, u64 blk_size)
> > > > > > > > +int memfd_alloc(u64 size, bool hugetlb, u64 blk_size)
> > > > > > > >  {
> > > > > > > >       const char *name = "kvmtool";
> > > > > > > >       unsigned int flags = 0;
> > > > > > > >       int fd;
> > > > > > > > -     void *addr;
> > > > > > > > -     int htsize = __builtin_ctzl(blk_size);
> > > > > > > >
> > > > > > > > -     if ((1ULL << htsize) != blk_size)
> > > > > > > > -             die("Hugepage size must be a power of 2.\n");
> > > > > > > > +     if (hugetlb) {
> > > > > > > > +             int htsize = __builtin_ctzl(blk_size);
> > > > > > > >
> > > > > > > > -     flags |= MFD_HUGETLB;
> > > > > > > > -     flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > > > +             if ((1ULL << htsize) != blk_size)
> > > > > > > > +                     die("Hugepage size must be a power of 2.\n");
> > > > > > > > +
> > > > > > > > +             flags |= MFD_HUGETLB;
> > > > > > > > +             flags |= htsize << MFD_HUGE_SHIFT;
> > > > > > > > +     }
> > > > > > > >
> > > > > > > >       fd = memfd_create(name, flags);
> > > > > > > >       if (fd < 0)
> > > > > > > > -             die("Can't memfd_create for hugetlbfs map\n");
> > > > > > > > +             die("Can't memfd_create for memory map\n");
> > > > > > > > +
> > > > > > > >       if (ftruncate(fd, size) < 0)
> > > > > > > >               die("Can't ftruncate for mem mapping size %lld\n",
> > > > > > > >                       (unsigned long long)size);
> > > > > > > > -     addr = mmap(NULL, size, PROT_RW, MAP_PRIVATE, fd, 0);
> > > > > > > > -     close(fd);
> > > > > > > >
> > > > > > > > -     return addr;
> > > > > > > > +     return fd;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /* This function wraps the decision between hugetlbfs map (if requested) or normal mmap */
> > > > > > > >  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > > > >  {
> > > > > > > >       u64 blk_size = 0;
> > > > > > > > +     int fd;
> > > > > > > >
> > > > > > > >       /*
> > > > > > > >        * We don't /need/ to map guest RAM from hugetlbfs, but we do so
> > > > > > > > @@ -146,9 +148,14 @@ void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size)
> > > > > > > >               }
> > > > > > > >
> > > > > > > >               kvm->ram_pagesize = blk_size;
> > > > > > > > -             return mmap_hugetlbfs(kvm, htlbfs_path, size, blk_size);
> > > > > > > >       } else {
> > > > > > > >               kvm->ram_pagesize = getpagesize();
> > > > > > > > -             return mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> > > > > > > >       }
> > > > > > > > +
> > > > > > > > +     fd = memfd_alloc(size, htlbfs_path, blk_size);
> > > > > > > > +     if (fd < 0)
> > > > > > > > +             return MAP_FAILED;
> > > > > > > > +
> > > > > > > > +     kvm->ram_fd = fd;
> > > > > > > > +     return mmap(NULL, size, PROT_RW, MAP_PRIVATE, kvm->ram_fd, 0);
> > > > > > > >  }
> > > > > > > > --
> > > > > > > > 2.38.1.431.g37b22c650d-goog
> > > > > > > >

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-11-30 17:55 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-15 11:15 [PATCH kvmtool v1 00/17] Use memfd for guest vm memory allocation Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 01/17] Initialize the return value in kvm__for_each_mem_bank() Fuad Tabba
2022-11-15 11:59   ` Andre Przywara
2022-11-23 16:08   ` Alexandru Elisei
2022-11-23 17:43     ` Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 02/17] Make mmap_hugetlbfs() static Fuad Tabba
2022-11-15 17:58   ` Andre Przywara
2022-11-15 11:15 ` [PATCH kvmtool v1 03/17] Rename parameter in mmap_anon_or_hugetlbfs() Fuad Tabba
2022-11-23 16:40   ` Alexandru Elisei
2022-11-23 17:44     ` Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 04/17] Add hostmem va to debug print Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 05/17] Factor out getting the hugetlb block size Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 06/17] Use memfd for hugetlbfs when allocating guest ram Fuad Tabba
2022-11-24 10:19   ` Alexandru Elisei
2022-11-24 10:45     ` Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 07/17] Make blk_size a parameter and pass it to mmap_hugetlbfs() Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 08/17] Use memfd for all guest ram allocations Fuad Tabba
2022-11-24 11:01   ` Alexandru Elisei
2022-11-24 15:19     ` Fuad Tabba
2022-11-24 17:14       ` Alexandru Elisei
2022-11-25 10:43         ` Alexandru Elisei
2022-11-25 10:58           ` Fuad Tabba
2022-11-25 10:44         ` Fuad Tabba
2022-11-25 11:31           ` Alexandru Elisei
2022-11-28  8:49             ` Fuad Tabba
2022-11-29 18:09               ` Alexandru Elisei
2022-11-30 17:54                 ` Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 09/17] Allocate pvtime memory with memfd Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 10/17] Allocate vesa " Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 11/17] Add a function that allocates aligned memory if specified Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 12/17] Use new function to align memory Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 13/17] Remove struct fields and code used for alignment Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 14/17] Replace kvm_arch_delete_ram with kvm_delete_ram Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 15/17] Remove no-longer unused macro Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 16/17] Factor out set_user_memory_region code Fuad Tabba
2022-11-15 11:15 ` [PATCH kvmtool v1 17/17] Pass the memory file descriptor and offset when registering ram Fuad Tabba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.