* [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
@ 2023-01-08 15:44 Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 1/6] vfio/mlx5: Fix UBSAN note Yishai Hadas
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

This series changes vfio and its sub-drivers to use
GFP_KERNEL_ACCOUNT for userspace persistent allocations.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism. [1]
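
For context, GFP_KERNEL_ACCOUNT is GFP_KERNEL with __GFP_ACCOUNT set,
so the change applied throughout the series follows a single pattern
(minimal sketch, using the migf allocation from the mlx5 patches as
the example):

  /* before: the allocation is invisible to the memcg controller */
  migf = kzalloc(sizeof(*migf), GFP_KERNEL);

  /* after: the allocation is charged to the caller's memory cgroup,
   * so the cgroup's memory.max limit now bounds it
   */
  migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);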

As part of this change, we allow the mlx5 driver to load images larger
than 512 MB by dropping the arbitrary hard-coded value that we have
today and moving to the max device loading value, which is currently 4GB.

In addition, the first patch in the series fixes a UBSAN warning in mlx5
that was reported once the kernel was compiled with UBSAN enabled.

[1] https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html

Changes from V0: https://www.spinics.net/lists/kvm/msg299508.html
Patch #2 - Fix MAX_LOAD_SIZE to use BIT_ULL instead of BIT, as
           reported by the kernel test robot.
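
For reference, the difference between the two macros, simplified from
include/linux/bits.h (treating the size field as 32 bits wide is an
assumption based on the ~4GB limit above):

  #define BIT(nr)      (1UL << (nr))   /* unsigned long: 32 bits on 32-bit builds */
  #define BIT_ULL(nr)  (1ULL << (nr))  /* unsigned long long: always 64 bits */

  /* BIT(32) shifts past the width of unsigned long on a 32-bit build,
   * which is undefined behaviour; BIT_ULL(32) - 1 yields the intended
   * 0xffffffff.
   */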

Yishai

Jason Gunthorpe (1):
  vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations

Yishai Hadas (5):
  vfio/mlx5: Fix UBSAN note
  vfio/mlx5: Allow loading of larger images than 512 MB
  vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent
    allocations
  vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent
    allocations

 drivers/vfio/container.c                      |  2 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  2 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c        |  4 ++--
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  4 ++--
 drivers/vfio/pci/mlx5/cmd.c                   | 17 +++++++++--------
 drivers/vfio/pci/mlx5/main.c                  | 19 ++++++++++---------
 drivers/vfio/pci/vfio_pci_config.c            |  6 +++---
 drivers/vfio/pci/vfio_pci_core.c              |  7 ++++---
 drivers/vfio/pci/vfio_pci_igd.c               |  2 +-
 drivers/vfio/pci/vfio_pci_intrs.c             | 10 ++++++----
 drivers/vfio/pci/vfio_pci_rdwr.c              |  2 +-
 drivers/vfio/platform/vfio_platform_common.c  |  2 +-
 drivers/vfio/platform/vfio_platform_irq.c     |  8 ++++----
 drivers/vfio/virqfd.c                         |  2 +-
 14 files changed, 46 insertions(+), 41 deletions(-)

-- 
2.18.1



* [PATCH V1 vfio 1/6] vfio/mlx5: Fix UBSAN note
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 2/6] vfio/mlx5: Allow loading of larger images than 512 MB Yishai Hadas
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

Prevent calling roundup_pow_of_two() with a value of 0, as it triggers
the UBSAN warning below.

Move this code and the few related lines so that they run only when
max_recv_wr is non-zero, i.e. when they are actually applicable.

UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 15 PID: 1639 Comm: live_migration Not tainted 6.1.0-rc4 #1116
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl+0x45/0x59
ubsan_epilogue+0x5/0x36
 __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef
? lock_is_held_type+0x98/0x110
? rcu_read_lock_sched_held+0x3f/0x70
mlx5vf_create_rc_qp.cold+0xe4/0xf2 [mlx5_vfio_pci]
mlx5vf_start_page_tracker+0x769/0xcd0 [mlx5_vfio_pci]
 vfio_device_fops_unl_ioctl+0x63f/0x700 [vfio]
__x64_sys_ioctl+0x433/0x9a0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>
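
The shift exponent of 64 matches the generic __roundup_pow_of_two() in
include/linux/log2.h, which computes 1UL << fls_long(n - 1); a minimal
sketch of the n == 0 case:

  unsigned long n = 0;
  /* n - 1 wraps to ULONG_MAX, fls_long() returns 64 on a 64-bit
   * machine, and shifting a 64-bit type by 64 is undefined behaviour
   */
  unsigned long wqe_cnt = 1UL << fls_long(n - 1);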

Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 64e68d13cb98..c5dcddbc4126 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -1036,14 +1036,14 @@ mlx5vf_create_rc_qp(struct mlx5_core_dev *mdev,
 	if (!qp)
 		return ERR_PTR(-ENOMEM);
 
-	qp->rq.wqe_cnt = roundup_pow_of_two(max_recv_wr);
-	log_rq_stride = ilog2(MLX5_SEND_WQE_DS);
-	log_rq_sz = ilog2(qp->rq.wqe_cnt);
 	err = mlx5_db_alloc_node(mdev, &qp->db, mdev->priv.numa_node);
 	if (err)
 		goto err_free;
 
 	if (max_recv_wr) {
+		qp->rq.wqe_cnt = roundup_pow_of_two(max_recv_wr);
+		log_rq_stride = ilog2(MLX5_SEND_WQE_DS);
+		log_rq_sz = ilog2(qp->rq.wqe_cnt);
 		err = mlx5_frag_buf_alloc_node(mdev,
 			wq_get_byte_sz(log_rq_sz, log_rq_stride),
 			&qp->buf, mdev->priv.numa_node);
-- 
2.18.1



* [PATCH V1 vfio 2/6] vfio/mlx5: Allow loading of larger images than 512 MB
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 1/6] vfio/mlx5: Fix UBSAN note Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations Yishai Hadas
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

Allow loading of images larger than 512 MB by dropping the arbitrary
hard-coded value that we have today and moving to the max device
loading value, which is currently 4GB.

As part of that, move to the GFP_KERNEL_ACCOUNT option when allocating
the persistent data of mlx5 and rely on the cgroup to provide the
memory limit for the given user.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism.
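
Assuming the load_vhca_state_in size field is 32 bits wide (which the
~4GB figure above implies), the new limit works out to:

  MAX_LOAD_SIZE = BIT_ULL(32) - 1 = 2^32 - 1 bytes (just under 4GB)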

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 11 ++++++-----
 drivers/vfio/pci/mlx5/main.c | 19 ++++++++++---------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index c5dcddbc4126..0586f09c69af 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -373,7 +373,7 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 	struct mlx5_vhca_data_buffer *buf;
 	int ret;
 
-	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL_ACCOUNT);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
@@ -1032,7 +1032,7 @@ mlx5vf_create_rc_qp(struct mlx5_core_dev *mdev,
 	void *in;
 	int err;
 
-	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+	qp = kzalloc(sizeof(*qp), GFP_KERNEL_ACCOUNT);
 	if (!qp)
 		return ERR_PTR(-ENOMEM);
 
@@ -1213,12 +1213,13 @@ static int alloc_recv_pages(struct mlx5_vhca_recv_buf *recv_buf,
 	int i;
 
 	recv_buf->page_list = kvcalloc(npages, sizeof(*recv_buf->page_list),
-				       GFP_KERNEL);
+				       GFP_KERNEL_ACCOUNT);
 	if (!recv_buf->page_list)
 		return -ENOMEM;
 
 	for (;;) {
-		filled = alloc_pages_bulk_array(GFP_KERNEL, npages - done,
+		filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT,
+						npages - done,
 						recv_buf->page_list + done);
 		if (!filled)
 			goto err;
@@ -1248,7 +1249,7 @@ static int register_dma_recv_pages(struct mlx5_core_dev *mdev,
 
 	recv_buf->dma_addrs = kvcalloc(recv_buf->npages,
 				       sizeof(*recv_buf->dma_addrs),
-				       GFP_KERNEL);
+				       GFP_KERNEL_ACCOUNT);
 	if (!recv_buf->dma_addrs)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 9feb89c6d939..7ba127d8889a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -21,8 +21,8 @@
 
 #include "cmd.h"
 
-/* Arbitrary to prevent userspace from consuming endless memory */
-#define MAX_MIGRATION_SIZE (512*1024*1024)
+/* Device specification max LOAD size */
+#define MAX_LOAD_SIZE (BIT_ULL(__mlx5_bit_sz(load_vhca_state_in, size)) - 1)
 
 static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
 {
@@ -73,12 +73,13 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 	int ret;
 
 	to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list));
-	page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL);
+	page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL_ACCOUNT);
 	if (!page_list)
 		return -ENOMEM;
 
 	do {
-		filled = alloc_pages_bulk_array(GFP_KERNEL, to_fill, page_list);
+		filled = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, to_fill,
+						page_list);
 		if (!filled) {
 			ret = -ENOMEM;
 			goto err;
@@ -87,7 +88,7 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 		ret = sg_alloc_append_table_from_pages(
 			&buf->table, page_list, filled, 0,
 			filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
-			GFP_KERNEL);
+			GFP_KERNEL_ACCOUNT);
 
 		if (ret)
 			goto err;
@@ -467,7 +468,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	size_t length;
 	int ret;
 
-	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+	migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);
 	if (!migf)
 		return ERR_PTR(-ENOMEM);
 
@@ -564,7 +565,7 @@ mlx5vf_resume_read_image_no_header(struct mlx5_vhca_data_buffer *vhca_buf,
 {
 	int ret;
 
-	if (requested_length > MAX_MIGRATION_SIZE)
+	if (requested_length > MAX_LOAD_SIZE)
 		return -ENOMEM;
 
 	if (vhca_buf->allocated_length < requested_length) {
@@ -648,7 +649,7 @@ mlx5vf_resume_read_header(struct mlx5_vf_migration_file *migf,
 		u64 flags;
 
 		vhca_buf->header_image_size = le64_to_cpup((__le64 *)to_buff);
-		if (vhca_buf->header_image_size > MAX_MIGRATION_SIZE) {
+		if (vhca_buf->header_image_size > MAX_LOAD_SIZE) {
 			ret = -ENOMEM;
 			goto end;
 		}
@@ -781,7 +782,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	struct mlx5_vhca_data_buffer *buf;
 	int ret;
 
-	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+	migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);
 	if (!migf)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.18.1



* [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 1/6] vfio/mlx5: Fix UBSAN note Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 2/6] vfio/mlx5: Allow loading of larger images than 512 MB Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-09 15:56   ` Joao Martins
  2023-01-08 15:44 ` [PATCH V1 vfio 4/6] vfio/hisi: " Yishai Hadas
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

From: Jason Gunthorpe <jgg@nvidia.com>

Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism.

The relevant allocations were found by, for example, looking at the
close_device function and tracing each kfree back to its matching
allocation.
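
As a sketch of that audit (simplified from the container.c hunk below,
with error handling omitted), an allocation counts as persistent when
its matching kfree() sits on the release path rather than in the
syscall that made it:

  static int vfio_fops_open(struct inode *inode, struct file *filep)
  {
  	struct vfio_container *container;

  	/* lives until userspace closes the fd, so account it */
  	container = kzalloc(sizeof(*container), GFP_KERNEL_ACCOUNT);
  	if (!container)
  		return -ENOMEM;
  	filep->private_data = container;
  	return 0;
  }

  /* the matching kfree(container) happens only in vfio_fops_release(),
   * which is the trace-back starting point
   */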

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/container.c           |  2 +-
 drivers/vfio/pci/vfio_pci_config.c |  6 +++---
 drivers/vfio/pci/vfio_pci_core.c   |  7 ++++---
 drivers/vfio/pci/vfio_pci_igd.c    |  2 +-
 drivers/vfio/pci/vfio_pci_intrs.c  | 10 ++++++----
 drivers/vfio/pci/vfio_pci_rdwr.c   |  2 +-
 drivers/vfio/virqfd.c              |  2 +-
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index b7a9560ab25e..5f398c493a1b 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -367,7 +367,7 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_container *container;
 
-	container = kzalloc(sizeof(*container), GFP_KERNEL);
+	container = kzalloc(sizeof(*container), GFP_KERNEL_ACCOUNT);
 	if (!container)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 4a350421c5f6..523e0144c86f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1244,7 +1244,7 @@ static int vfio_msi_cap_len(struct vfio_pci_core_device *vdev, u8 pos)
 	if (vdev->msi_perm)
 		return len;
 
-	vdev->msi_perm = kmalloc(sizeof(struct perm_bits), GFP_KERNEL);
+	vdev->msi_perm = kmalloc(sizeof(struct perm_bits), GFP_KERNEL_ACCOUNT);
 	if (!vdev->msi_perm)
 		return -ENOMEM;
 
@@ -1731,11 +1731,11 @@ int vfio_config_init(struct vfio_pci_core_device *vdev)
 	 * no requirements on the length of a capability, so the gap between
 	 * capabilities needs byte granularity.
 	 */
-	map = kmalloc(pdev->cfg_size, GFP_KERNEL);
+	map = kmalloc(pdev->cfg_size, GFP_KERNEL_ACCOUNT);
 	if (!map)
 		return -ENOMEM;
 
-	vconfig = kmalloc(pdev->cfg_size, GFP_KERNEL);
+	vconfig = kmalloc(pdev->cfg_size, GFP_KERNEL_ACCOUNT);
 	if (!vconfig) {
 		kfree(map);
 		return -ENOMEM;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 26a541cc64d1..a6492a25ff6a 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -144,7 +144,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 			 * of the exclusive page in case that hot-add
 			 * device's bar is assigned into it.
 			 */
-			dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
+			dummy_res =
+				kzalloc(sizeof(*dummy_res), GFP_KERNEL_ACCOUNT);
 			if (dummy_res == NULL)
 				goto no_mmap;
 
@@ -863,7 +864,7 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev,
 
 	region = krealloc(vdev->region,
 			  (vdev->num_regions + 1) * sizeof(*region),
-			  GFP_KERNEL);
+			  GFP_KERNEL_ACCOUNT);
 	if (!region)
 		return -ENOMEM;
 
@@ -1644,7 +1645,7 @@ static int __vfio_pci_add_vma(struct vfio_pci_core_device *vdev,
 {
 	struct vfio_pci_mmap_vma *mmap_vma;
 
-	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
+	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL_ACCOUNT);
 	if (!mmap_vma)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 5e6ca5926954..dd70e2431bd7 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -180,7 +180,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_core_device *vdev)
 	if (!addr || !(~addr))
 		return -ENODEV;
 
-	opregionvbt = kzalloc(sizeof(*opregionvbt), GFP_KERNEL);
+	opregionvbt = kzalloc(sizeof(*opregionvbt), GFP_KERNEL_ACCOUNT);
 	if (!opregionvbt)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 40c3d7cf163f..bffb0741518b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -177,7 +177,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
+	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL_ACCOUNT);
 	if (!vdev->ctx)
 		return -ENOMEM;
 
@@ -216,7 +216,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	vdev->ctx[0].name = kasprintf(GFP_KERNEL, "vfio-intx(%s)",
+	vdev->ctx[0].name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
 				      pci_name(pdev));
 	if (!vdev->ctx[0].name)
 		return -ENOMEM;
@@ -284,7 +284,8 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
-	vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
+	vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx),
+			    GFP_KERNEL_ACCOUNT);
 	if (!vdev->ctx)
 		return -ENOMEM;
 
@@ -343,7 +344,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	if (fd < 0)
 		return 0;
 
-	vdev->ctx[vector].name = kasprintf(GFP_KERNEL, "vfio-msi%s[%d](%s)",
+	vdev->ctx[vector].name = kasprintf(GFP_KERNEL_ACCOUNT,
+					   "vfio-msi%s[%d](%s)",
 					   msix ? "x" : "", vector,
 					   pci_name(pdev));
 	if (!vdev->ctx[vector].name)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e352a033b4ae..e27de61ac9fe 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -470,7 +470,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 		goto out_unlock;
 	}
 
-	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
+	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL_ACCOUNT);
 	if (!ioeventfd) {
 		ret = -ENOMEM;
 		goto out_unlock;
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 497a17b37865..29c564b7a6e1 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -112,7 +112,7 @@ int vfio_virqfd_enable(void *opaque,
 	int ret = 0;
 	__poll_t events;
 
-	virqfd = kzalloc(sizeof(*virqfd), GFP_KERNEL);
+	virqfd = kzalloc(sizeof(*virqfd), GFP_KERNEL_ACCOUNT);
 	if (!virqfd)
 		return -ENOMEM;
 
-- 
2.18.1



* [PATCH V1 vfio 4/6] vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (2 preceding siblings ...)
  2023-01-08 15:44 ` [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 5/6] vfio/fsl-mc: " Yishai Hadas
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 0bba3b05c6c7..a117eaf21c14 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -744,7 +744,7 @@ hisi_acc_vf_pci_resume(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 {
 	struct hisi_acc_vf_migration_file *migf;
 
-	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+	migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);
 	if (!migf)
 		return ERR_PTR(-ENOMEM);
 
@@ -863,7 +863,7 @@ hisi_acc_open_saving_migf(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 	struct hisi_acc_vf_migration_file *migf;
 	int ret;
 
-	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+	migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);
 	if (!migf)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.18.1



* [PATCH V1 vfio 5/6] vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (3 preceding siblings ...)
  2023-01-08 15:44 ` [PATCH V1 vfio 4/6] vfio/hisi: " Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-08 15:44 ` [PATCH V1 vfio 6/6] vfio/platform: " Yishai Hadas
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c      | 2 +-
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index defeb8510ace..c89a047a4cd8 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -28,7 +28,7 @@ static int vfio_fsl_mc_open_device(struct vfio_device *core_vdev)
 	int i;
 
 	vdev->regions = kcalloc(count, sizeof(struct vfio_fsl_mc_region),
-				GFP_KERNEL);
+				GFP_KERNEL_ACCOUNT);
 	if (!vdev->regions)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
index 64d01f3fb13d..c51229fccbd6 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
@@ -29,7 +29,7 @@ static int vfio_fsl_mc_irqs_allocate(struct vfio_fsl_mc_device *vdev)
 
 	irq_count = mc_dev->obj_desc.irq_count;
 
-	mc_irq = kcalloc(irq_count, sizeof(*mc_irq), GFP_KERNEL);
+	mc_irq = kcalloc(irq_count, sizeof(*mc_irq), GFP_KERNEL_ACCOUNT);
 	if (!mc_irq)
 		return -ENOMEM;
 
@@ -77,7 +77,7 @@ static int vfio_set_trigger(struct vfio_fsl_mc_device *vdev,
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	irq->name = kasprintf(GFP_KERNEL, "vfio-irq[%d](%s)",
+	irq->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-irq[%d](%s)",
 			    hwirq, dev_name(&vdev->mc_dev->dev));
 	if (!irq->name)
 		return -ENOMEM;
-- 
2.18.1



* [PATCH V1 vfio 6/6] vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (4 preceding siblings ...)
  2023-01-08 15:44 ` [PATCH V1 vfio 5/6] vfio/fsl-mc: " Yishai Hadas
@ 2023-01-08 15:44 ` Yishai Hadas
  2023-01-09 13:11 ` [PATCH V1 vfio 0/6] Move to use cgroups " Jason Gunthorpe
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-08 15:44 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, yishaih, maorg, cohuck, shameerali.kolothum.thodi

Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.

The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
is an untrusted allocation triggered from userspace and should be
subject to kmem accounting, and as such it is controlled by the cgroup
mechanism.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/platform/vfio_platform_common.c | 2 +-
 drivers/vfio/platform/vfio_platform_irq.c    | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 1a0a238ffa35..278c92cde555 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -142,7 +142,7 @@ static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 		cnt++;
 
 	vdev->regions = kcalloc(cnt, sizeof(struct vfio_platform_region),
-				GFP_KERNEL);
+				GFP_KERNEL_ACCOUNT);
 	if (!vdev->regions)
 		return -ENOMEM;
 
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index c5b09ec0a3c9..665197caed89 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -186,9 +186,8 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 
 	if (fd < 0) /* Disable only */
 		return 0;
-
-	irq->name = kasprintf(GFP_KERNEL, "vfio-irq[%d](%s)",
-						irq->hwirq, vdev->name);
+	irq->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-irq[%d](%s)",
+			      irq->hwirq, vdev->name);
 	if (!irq->name)
 		return -ENOMEM;
 
@@ -286,7 +285,8 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 	while (vdev->get_irq(vdev, cnt) >= 0)
 		cnt++;
 
-	vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq), GFP_KERNEL);
+	vdev->irqs = kcalloc(cnt, sizeof(struct vfio_platform_irq),
+			     GFP_KERNEL_ACCOUNT);
 	if (!vdev->irqs)
 		return -ENOMEM;
 
-- 
2.18.1



* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (5 preceding siblings ...)
  2023-01-08 15:44 ` [PATCH V1 vfio 6/6] vfio/platform: " Yishai Hadas
@ 2023-01-09 13:11 ` Jason Gunthorpe
  2023-01-17 23:38 ` Alex Williamson
  2023-01-23 19:37 ` Alex Williamson
  8 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2023-01-09 13:11 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, kvm, kevin.tian, joao.m.martins, leonro,
	diana.craciun, eric.auger, maorg, cohuck,
	shameerali.kolothum.thodi

On Sun, Jan 08, 2023 at 05:44:21PM +0200, Yishai Hadas wrote:
> This series changes vfio and its sub-drivers to use
> GFP_KERNEL_ACCOUNT for userspace persistent allocations.
> 
> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
> is an untrusted allocation triggered from userspace and should be
> subject to kmem accounting, and as such it is controlled by the cgroup
> mechanism. [1]
> 
> As part of this change, we allow the mlx5 driver to load images larger
> than 512 MB by dropping the arbitrary hard-coded value that we have
> today and moving to the max device loading value, which is currently 4GB.
> 
> In addition, the first patch in the series fixes a UBSAN warning in mlx5
> that was reported once the kernel was compiled with UBSAN enabled.
> 
> [1] https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
> 
> Changes from V0: https://www.spinics.net/lists/kvm/msg299508.html
> Patch #2 - Fix MAX_LOAD_SIZE to use BIT_ULL instead of BIT, as
>            reported by the kernel test robot.
> 
> Yishai
> 
> Jason Gunthorpe (1):
>   vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
> 
> Yishai Hadas (5):
>   vfio/mlx5: Fix UBSAN note
>   vfio/mlx5: Allow loading of larger images than 512 MB
>   vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
>   vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations
>   vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-08 15:44 ` [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations Yishai Hadas
@ 2023-01-09 15:56   ` Joao Martins
  2023-01-09 16:09     ` Yishai Hadas
  0 siblings, 1 reply; 16+ messages in thread
From: Joao Martins @ 2023-01-09 15:56 UTC (permalink / raw)
  To: Yishai Hadas, jgg
  Cc: kvm, kevin.tian, leonro, diana.craciun, eric.auger, maorg,
	cohuck, shameerali.kolothum.thodi, alex.williamson

On 08/01/2023 15:44, Yishai Hadas wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.
> 
> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
> is an untrusted allocation triggered from userspace and should be
> subject to kmem accounting, and as such it is controlled by the cgroup
> mechanism.
> 
> The relevant allocations were found by, for example, looking at the
> close_device function and tracing each kfree back to its matching
> allocation.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/container.c           |  2 +-
>  drivers/vfio/pci/vfio_pci_config.c |  6 +++---
>  drivers/vfio/pci/vfio_pci_core.c   |  7 ++++---
>  drivers/vfio/pci/vfio_pci_igd.c    |  2 +-
>  drivers/vfio/pci/vfio_pci_intrs.c  | 10 ++++++----
>  drivers/vfio/pci/vfio_pci_rdwr.c   |  2 +-
>  drivers/vfio/virqfd.c              |  2 +-
>  7 files changed, 17 insertions(+), 14 deletions(-)
>

I am not sure, but should we add the flag to the kzalloc done in
iova_bitmap_init() too? It is called from DMA_LOGGING_REPORT | FEATURE_GET. It
is not persistent though, but it is userspace triggerable.

	Joao


* Re: [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-09 15:56   ` Joao Martins
@ 2023-01-09 16:09     ` Yishai Hadas
  2023-01-09 16:21       ` Joao Martins
  0 siblings, 1 reply; 16+ messages in thread
From: Yishai Hadas @ 2023-01-09 16:09 UTC (permalink / raw)
  To: Joao Martins, jgg
  Cc: kvm, kevin.tian, leonro, diana.craciun, eric.auger, maorg,
	cohuck, shameerali.kolothum.thodi, alex.williamson

On 09/01/2023 17:56, Joao Martins wrote:
> On 08/01/2023 15:44, Yishai Hadas wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>>
>> Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.
>>
>> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
>> is an untrusted allocation triggered from userspace and should be
>> subject to kmem accounting, and as such it is controlled by the cgroup
>> mechanism.
>>
>> The relevant allocations were found by, for example, looking at the
>> close_device function and tracing each kfree back to its matching
>> allocation.
>>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>>   drivers/vfio/container.c           |  2 +-
>>   drivers/vfio/pci/vfio_pci_config.c |  6 +++---
>>   drivers/vfio/pci/vfio_pci_core.c   |  7 ++++---
>>   drivers/vfio/pci/vfio_pci_igd.c    |  2 +-
>>   drivers/vfio/pci/vfio_pci_intrs.c  | 10 ++++++----
>>   drivers/vfio/pci/vfio_pci_rdwr.c   |  2 +-
>>   drivers/vfio/virqfd.c              |  2 +-
>>   7 files changed, 17 insertions(+), 14 deletions(-)
>>
> I am not sure, but should we add the flag to the kzalloc done in
> iova_bitmap_init() too? It is called from DMA_LOGGING_REPORT | FEATURE_GET. It
> is not persistent though, but it is userspace triggerable.
>
> 	Joao

You referred to the allocation inside iova_bitmap_alloc(), I assume, right?

In any case, we count persistent allocations, so there is no need.

Yishai



* Re: [PATCH V1 vfio 3/6] vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
  2023-01-09 16:09     ` Yishai Hadas
@ 2023-01-09 16:21       ` Joao Martins
  0 siblings, 0 replies; 16+ messages in thread
From: Joao Martins @ 2023-01-09 16:21 UTC (permalink / raw)
  To: Yishai Hadas, jgg
  Cc: kvm, kevin.tian, leonro, diana.craciun, eric.auger, maorg,
	cohuck, shameerali.kolothum.thodi, alex.williamson



On 09/01/2023 16:09, Yishai Hadas wrote:
> On 09/01/2023 17:56, Joao Martins wrote:
>> On 08/01/2023 15:44, Yishai Hadas wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>
>>> Use GFP_KERNEL_ACCOUNT for userspace persistent allocations.
>>>
>>> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
>>> is an untrusted allocation triggered from userspace and should be
>>> subject to kmem accounting, and as such it is controlled by the cgroup
>>> mechanism.
>>>
>>> The relevant allocations were found by, for example, looking at the
>>> close_device function and tracing each kfree back to its matching
>>> allocation.
>>>
>>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>>> ---
>>>   drivers/vfio/container.c           |  2 +-
>>>   drivers/vfio/pci/vfio_pci_config.c |  6 +++---
>>>   drivers/vfio/pci/vfio_pci_core.c   |  7 ++++---
>>>   drivers/vfio/pci/vfio_pci_igd.c    |  2 +-
>>>   drivers/vfio/pci/vfio_pci_intrs.c  | 10 ++++++----
>>>   drivers/vfio/pci/vfio_pci_rdwr.c   |  2 +-
>>>   drivers/vfio/virqfd.c              |  2 +-
>>>   7 files changed, 17 insertions(+), 14 deletions(-)
>>>
>> I am not sure, but should we add the flag to the kzalloc done in
>> iova_bitmap_init() too? It is called from DMA_LOGGING_REPORT | FEATURE_GET. It
>> is not persistent though, but it is userspace triggerable.
>>
>>     Joao
> 
> You referred to the allocation inside iova_bitmap_alloc(), I assume, right?
> 
Yes.

> In any case, we count persistent allocations, so there is no need.

OK


* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (6 preceding siblings ...)
  2023-01-09 13:11 ` [PATCH V1 vfio 0/6] Move to use cgroups " Jason Gunthorpe
@ 2023-01-17 23:38 ` Alex Williamson
  2023-01-18 15:15   ` Jason Gunthorpe
  2023-01-23 19:37 ` Alex Williamson
  8 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2023-01-17 23:38 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, maorg, cohuck, shameerali.kolothum.thodi

On Sun, 8 Jan 2023 17:44:21 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> This series changes vfio and its sub-drivers to use
> GFP_KERNEL_ACCOUNT for userspace persistent allocations.
> 
> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
> is an untrusted allocation triggered from userspace and should be
> subject to kmem accounting, and as such it is controlled by the cgroup
> mechanism. [1]
> 
> As part of this change, we allow the mlx5 driver to load images larger
> than 512 MB by dropping the arbitrary hard-coded value that we have
> today and moving to the max device loading value, which is currently 4GB.
> 
> In addition, the first patch in the series fixes a UBSAN warning in mlx5
> that was reported once the kernel was compiled with UBSAN enabled.
> 
> [1] https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
> 
> Changes from V0: https://www.spinics.net/lists/kvm/msg299508.html
> Patch #2 - Fix MAX_LOAD_SIZE to use BIT_ULL instead of BIT, as
>            reported by the kernel test robot.
> 
> Yishai
> 
> Jason Gunthorpe (1):
>   vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
> 
> Yishai Hadas (5):
>   vfio/mlx5: Fix UBSAN note
>   vfio/mlx5: Allow loading of larger images than 512 MB
>   vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
>   vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations
>   vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations
> 
>  drivers/vfio/container.c                      |  2 +-
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  2 +-
>  drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c        |  4 ++--
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  4 ++--
>  drivers/vfio/pci/mlx5/cmd.c                   | 17 +++++++++--------
>  drivers/vfio/pci/mlx5/main.c                  | 19 ++++++++++---------
>  drivers/vfio/pci/vfio_pci_config.c            |  6 +++---
>  drivers/vfio/pci/vfio_pci_core.c              |  7 ++++---
>  drivers/vfio/pci/vfio_pci_igd.c               |  2 +-
>  drivers/vfio/pci/vfio_pci_intrs.c             | 10 ++++++----
>  drivers/vfio/pci/vfio_pci_rdwr.c              |  2 +-
>  drivers/vfio/platform/vfio_platform_common.c  |  2 +-
>  drivers/vfio/platform/vfio_platform_irq.c     |  8 ++++----
>  drivers/vfio/virqfd.c                         |  2 +-
>  14 files changed, 46 insertions(+), 41 deletions(-)

The type1 IOMMU backend is notably absent here among the core files, any
reason?  Potentially this removes the dma_avail issue as a means to
prevent userspace from creating an arbitrarily large number of DMA
mappings, right?  For compatibility, this might mean setting the DMA
entry limit to an excessive number instead, as some use cases can
already hit the U16_MAX limit now.
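
(dma_avail here refers to the type1 dma_entry_limit module parameter,
which caps the number of DMA mappings per container and defaults to
U16_MAX.)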

Are there any compatibility issues we should expect with this change to
accounting otherwise?

Also a nit, the commit logs have a persistent typo throughout,
"accountingis".  I can fix it on commit or, if we decide on a respin,
please fix it.  Thanks,

Alex



* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-17 23:38 ` Alex Williamson
@ 2023-01-18 15:15   ` Jason Gunthorpe
  2023-01-18 17:40     ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2023-01-18 15:15 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, kvm, kevin.tian, joao.m.martins, leonro,
	diana.craciun, eric.auger, maorg, cohuck,
	shameerali.kolothum.thodi

On Tue, Jan 17, 2023 at 04:38:11PM -0700, Alex Williamson wrote:

> The type1 IOMMU backend is notably absent here among the core files, any
> reason?

Mostly fear that charging a lot of memory to the cgroup will become a
compatibility problem. I'd be happier if we go along the iommufd path
some more and get some field results from cgroup enablement before we
think about tackling type 1.

With this series in normal non-abusive cases this is probably only a
few pages of ram so it shouldn't be a problem. mlx5 needs huge amounts
of memory but it is new so anyone deploying a new mlx5 configuration
can set their cgroup accordingly.

If we fully do type 1 we are looking at a lot of memory, e.g. a 1TB
guest will charge 2GB just in IOPTE structures at 4k page size. Maybe
that is enough to be a problem, I don't know.
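
(1TB / 4k = 2^28 pages, so 2^28 IOPTEs at an assumed 8 bytes per entry
is 2GB, matching the figure above.)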

> Potentially this removes the dma_avail issue as a means to
> prevent userspace from creating an arbitrarily large number of DMA
> mappings, right?

Yes, it is what iommufd did
 
> Are there any compatibility issues we should expect with this change to
> accounting otherwise?

Other than things might start hitting the limit, I don't think so?

Jason


* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-18 15:15   ` Jason Gunthorpe
@ 2023-01-18 17:40     ` Alex Williamson
  2023-01-23 15:19       ` Yishai Hadas
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2023-01-18 17:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, kvm, kevin.tian, joao.m.martins, leonro,
	diana.craciun, eric.auger, maorg, cohuck,
	shameerali.kolothum.thodi

On Wed, 18 Jan 2023 11:15:07 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jan 17, 2023 at 04:38:11PM -0700, Alex Williamson wrote:
> 
> > The type1 IOMMU backend is notably absent here among the core files, any
> > reason?  
> 
> Mostly fear that charging a lot of memory to the cgroup will become a
> compatibility problem. I'd be happier if we go along the iommufd path
> some more and get some field results from cgroup enablement before we
> think about tackling type 1.
> 
> With this series in normal non-abusive cases this is probably only a
> few pages of ram so it shouldn't be a problem. mlx5 needs huge amounts
> of memory but it is new so anyone deploying a new mlx5 configuration
> can set their cgroup accordingly.
> 
> If we fully do type 1 we are looking at a lot of memory, e.g. a 1TB
> guest will charge 2GB just in IOPTE structures at 4k page size. Maybe
> that is enough to be a problem, I don't know.
> 
> > Potentially this removes the dma_avail issue as a means to
> > prevent userspace from creating an arbitrarily large number of DMA
> > mappings, right?  
> 
> Yes, it is what iommufd did
>  
> > Are there any compatibility issues we should expect with this change to
> > accounting otherwise?  
> 
> Other than things might start hitting the limit, I don't think so?

Ok, seems like enough FUD to limit the scope for now.  We'll need to
monitor this for native and compat mode iommufd use.  I'll fix the
commit log typo, assuming there are no further comments.  Thanks,

Alex



* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-18 17:40     ` Alex Williamson
@ 2023-01-23 15:19       ` Yishai Hadas
  0 siblings, 0 replies; 16+ messages in thread
From: Yishai Hadas @ 2023-01-23 15:19 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, maorg, cohuck, shameerali.kolothum.thodi

On 18/01/2023 19:40, Alex Williamson wrote:
> On Wed, 18 Jan 2023 11:15:07 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>> On Tue, Jan 17, 2023 at 04:38:11PM -0700, Alex Williamson wrote:
>>
>>> The type1 IOMMU backend is notably absent here among the core files, any
>>> reason?
>> Mostly fear that charging a lot of memory to the cgroup will become a
>> compatibility problem. I'd be happier if we go along the iommufd path
>> some more and get some field results from cgroup enablement before we
>> think about tackling type 1.
>>
>> With this series in normal non-abusive cases this is probably only a
>> few pages of ram so it shouldn't be a problem. mlx5 needs huge amounts
>> of memory but it is new so anyone deploying a new mlx5 configuration
>> can set their cgroup accordingly.
>>
>> If we fully do type 1 we are looking at a lot of memory, e.g. a 1TB
>> guest will charge 2GB just in IOPTE structures at 4k page size. Maybe
>> that is enough to be a problem, I don't know.
>>
>>> Potentially this removes the dma_avail issue as a means to
>>> prevent userspace from creating an arbitrarily large number of DMA
>>> mappings, right?
>> Yes, it is what iommufd did
>>   
>>> Are there any compatibility issues we should expect with this change to
>>> accounting otherwise?
>> Other than things might start hitting the limit, I don't think so?
> Ok, seems like enough FUD to limit the scope for now.  We'll need to
> monitor this for native and compat mode iommufd use.  I'll fix the
> commit log typo, assuming there are no further comments.  Thanks,
>
> Alex
>
Alex,

Based on the above, can you please take it into your next branch?

I would like to soon send a small series for the mlx5 driver which
improves some flows in pre_copy to reduce the downtime of stop_copy.

Some code there should be sent on top of the current series.

Thanks,
Yishai



* Re: [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations
  2023-01-08 15:44 [PATCH V1 vfio 0/6] Move to use cgroups for userspace persistent allocations Yishai Hadas
                   ` (7 preceding siblings ...)
  2023-01-17 23:38 ` Alex Williamson
@ 2023-01-23 19:37 ` Alex Williamson
  8 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2023-01-23 19:37 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, diana.craciun,
	eric.auger, maorg, cohuck, shameerali.kolothum.thodi

On Sun, 8 Jan 2023 17:44:21 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:

> This series changes vfio and its sub-drivers to use
> GFP_KERNEL_ACCOUNT for userspace persistent allocations.
> 
> The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this
> is an untrusted allocation triggered from userspace and should be
> subject to kmem accounting, and as such it is controlled by the cgroup
> mechanism. [1]
> 
> As part of this change, we allow the mlx5 driver to load images larger
> than 512 MB by dropping the arbitrary hard-coded value that we have
> today and moving to the max device loading value, which is currently 4GB.
> 
> In addition, the first patch in the series fixes a UBSAN warning in mlx5
> that was reported once the kernel was compiled with UBSAN enabled.
> 
> [1] https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
> 
> Changes from V0: https://www.spinics.net/lists/kvm/msg299508.html
> Patch #2 - Fix MAX_LOAD_SIZE to use BIT_ULL instead of BIT, as
>            reported by the kernel test robot.
> 
> Yishai
> 
> Jason Gunthorpe (1):
>   vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
> 
> Yishai Hadas (5):
>   vfio/mlx5: Fix UBSAN note
>   vfio/mlx5: Allow loading of larger images than 512 MB
>   vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocations
>   vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations
>   vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent
>     allocations
> 
>  drivers/vfio/container.c                      |  2 +-
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  2 +-
>  drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c        |  4 ++--
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  4 ++--
>  drivers/vfio/pci/mlx5/cmd.c                   | 17 +++++++++--------
>  drivers/vfio/pci/mlx5/main.c                  | 19 ++++++++++---------
>  drivers/vfio/pci/vfio_pci_config.c            |  6 +++---
>  drivers/vfio/pci/vfio_pci_core.c              |  7 ++++---
>  drivers/vfio/pci/vfio_pci_igd.c               |  2 +-
>  drivers/vfio/pci/vfio_pci_intrs.c             | 10 ++++++----
>  drivers/vfio/pci/vfio_pci_rdwr.c              |  2 +-
>  drivers/vfio/platform/vfio_platform_common.c  |  2 +-
>  drivers/vfio/platform/vfio_platform_irq.c     |  8 ++++----
>  drivers/vfio/virqfd.c                         |  2 +-
>  14 files changed, 46 insertions(+), 41 deletions(-)
> 

Applied to vfio next branch for v6.3.  Thanks,

Alex


