linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory
@ 2023-01-24  5:42 Alistair Popple
  2023-01-24  5:42 ` [RFC PATCH 01/19] mm: Introduce vm_account Alistair Popple
                   ` (20 more replies)
  0 siblings, 21 replies; 56+ messages in thread
From: Alistair Popple @ 2023-01-24  5:42 UTC (permalink / raw)
  To: linux-mm, cgroups
  Cc: linux-kernel, jgg, jhubbard, tjmercier, hannes, surenb, mkoutny,
	daniel, Alistair Popple

Having large amounts of unmovable or unreclaimable memory in a system
can lead to system instability due to increasing the likelihood of
encountering out-of-memory conditions. Therefore it is desirable to
limit the amount of memory users can lock or pin.

From userspace such limits can be enforced by setting
RLIMIT_MEMLOCK. However there is no standard method that drivers and
other in-kernel users can use to check and enforce this limit.

This has lead to a large number of inconsistencies in how limits are
enforced. For example some drivers will use mm->locked_mm while others
will use mm->pinned_mm or user->locked_mm. It is therefore possible to
have up to three times RLIMIT_MEMLOCKED pinned.

Having pinned memory limited per-task also makes it easy for users to
exceed the limit. For example drivers that pin memory with
pin_user_pages() it tends to remain pinned after fork. To deal with
this and other issues this series introduces a cgroup for tracking and
limiting the number of pages pinned or locked by tasks in the group.

However the existing behaviour with regards to the rlimit needs to be
maintained. Therefore the lesser of the two limits is
enforced. Furthermore having CAP_IPC_LOCK usually bypasses the rlimit,
but this bypass is not allowed for the cgroup.

The first part of this series converts existing drivers which
open-code the use of locked_mm/pinned_mm over to a common interface
which manages the refcounts of the associated task/mm/user
structs. This ensures accounting of pages is consistent and makes it
easier to add charging of the cgroup.

The second part of the series adds the cgroup and converts core mm
code such as mlock over to charging the cgroup before finally
introducing some selftests.

As I don't have access to systems with all the various devices I
haven't been able to test all driver changes. Any help there would be
appreciated.

Alistair Popple (19):
  mm: Introduce vm_account
  drivers/vhost: Convert to use vm_account
  drivers/vdpa: Convert vdpa to use the new vm_structure
  infiniband/umem: Convert to use vm_account
  RMDA/siw: Convert to use vm_account
  RDMA/usnic: convert to use vm_account
  vfio/type1: Charge pinned pages to pinned_vm instead of locked_vm
  vfio/spapr_tce: Convert accounting to pinned_vm
  io_uring: convert to use vm_account
  net: skb: Switch to using vm_account
  xdp: convert to use vm_account
  kvm/book3s_64_vio: Convert account_locked_vm() to vm_account_pinned()
  fpga: dfl: afu: convert to use vm_account
  mm: Introduce a cgroup for pinned memory
  mm/util: Extend vm_account to charge pages against the pin cgroup
  mm/util: Refactor account_locked_vm
  mm: Convert mmap and mlock to use account_locked_vm
  mm/mmap: Charge locked memory to pins cgroup
  selftests/vm: Add pins-cgroup selftest for mlock/mmap

 MAINTAINERS                              |   8 +-
 arch/powerpc/kvm/book3s_64_vio.c         |  10 +-
 arch/powerpc/mm/book3s64/iommu_api.c     |  29 +--
 drivers/fpga/dfl-afu-dma-region.c        |  11 +-
 drivers/fpga/dfl-afu.h                   |   1 +-
 drivers/infiniband/core/umem.c           |  16 +-
 drivers/infiniband/core/umem_odp.c       |   6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c |  13 +-
 drivers/infiniband/hw/usnic/usnic_uiom.h |   1 +-
 drivers/infiniband/sw/siw/siw.h          |   2 +-
 drivers/infiniband/sw/siw/siw_mem.c      |  20 +--
 drivers/infiniband/sw/siw/siw_verbs.c    |  15 +-
 drivers/vdpa/vdpa_user/vduse_dev.c       |  20 +--
 drivers/vfio/vfio_iommu_spapr_tce.c      |  15 +-
 drivers/vfio/vfio_iommu_type1.c          |  59 +----
 drivers/vhost/vdpa.c                     |   9 +-
 drivers/vhost/vhost.c                    |   2 +-
 drivers/vhost/vhost.h                    |   1 +-
 include/linux/cgroup.h                   |  20 ++-
 include/linux/cgroup_subsys.h            |   4 +-
 include/linux/io_uring_types.h           |   3 +-
 include/linux/kvm_host.h                 |   1 +-
 include/linux/mm.h                       |   5 +-
 include/linux/mm_types.h                 |  88 ++++++++-
 include/linux/skbuff.h                   |   6 +-
 include/net/sock.h                       |   2 +-
 include/net/xdp_sock.h                   |   2 +-
 include/rdma/ib_umem.h                   |   1 +-
 io_uring/io_uring.c                      |  20 +--
 io_uring/notif.c                         |   4 +-
 io_uring/notif.h                         |  10 +-
 io_uring/rsrc.c                          |  38 +---
 io_uring/rsrc.h                          |   9 +-
 mm/Kconfig                               |  11 +-
 mm/Makefile                              |   1 +-
 mm/internal.h                            |   2 +-
 mm/mlock.c                               |  76 +------
 mm/mmap.c                                |  76 +++----
 mm/mremap.c                              |  54 +++--
 mm/pins_cgroup.c                         | 273 ++++++++++++++++++++++++-
 mm/secretmem.c                           |   6 +-
 mm/util.c                                | 196 +++++++++++++++--
 net/core/skbuff.c                        |  47 +---
 net/rds/message.c                        |   9 +-
 net/xdp/xdp_umem.c                       |  38 +--
 tools/testing/selftests/vm/Makefile      |   1 +-
 tools/testing/selftests/vm/pins-cgroup.c | 271 ++++++++++++++++++++++++-
 virt/kvm/kvm_main.c                      |   3 +-
 48 files changed, 1114 insertions(+), 401 deletions(-)
 create mode 100644 mm/pins_cgroup.c
 create mode 100644 tools/testing/selftests/vm/pins-cgroup.c

base-commit: 2241ab53cbb5cdb08a6b2d4688feb13971058f65
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2023-02-06 13:14 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-24  5:42 [RFC PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 01/19] mm: Introduce vm_account Alistair Popple
2023-01-24  6:29   ` Christoph Hellwig
2023-01-24 14:32   ` Jason Gunthorpe
2023-01-30 11:36     ` Alistair Popple
2023-01-31 14:00   ` David Hildenbrand
2023-01-24  5:42 ` [RFC PATCH 02/19] drivers/vhost: Convert to use vm_account Alistair Popple
2023-01-24  5:55   ` Michael S. Tsirkin
2023-01-30 10:43     ` Alistair Popple
2023-01-24 14:34   ` Jason Gunthorpe
2023-01-24  5:42 ` [RFC PATCH 03/19] drivers/vdpa: Convert vdpa to use the new vm_structure Alistair Popple
2023-01-24 14:35   ` Jason Gunthorpe
2023-01-24  5:42 ` [RFC PATCH 04/19] infiniband/umem: Convert to use vm_account Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 05/19] RMDA/siw: " Alistair Popple
2023-01-24 14:37   ` Jason Gunthorpe
2023-01-24 15:22     ` Bernard Metzler
2023-01-24 15:56     ` Bernard Metzler
2023-01-30 11:34       ` Alistair Popple
2023-01-30 13:27         ` Bernard Metzler
2023-01-24  5:42 ` [RFC PATCH 06/19] RDMA/usnic: convert " Alistair Popple
2023-01-24 14:41   ` Jason Gunthorpe
2023-01-30 11:10     ` Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 07/19] vfio/type1: Charge pinned pages to pinned_vm instead of locked_vm Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 08/19] vfio/spapr_tce: Convert accounting to pinned_vm Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 09/19] io_uring: convert to use vm_account Alistair Popple
2023-01-24 14:44   ` Jason Gunthorpe
2023-01-30 11:12     ` Alistair Popple
2023-01-30 13:21       ` Jason Gunthorpe
2023-01-24  5:42 ` [RFC PATCH 10/19] net: skb: Switch to using vm_account Alistair Popple
2023-01-24 14:51   ` Jason Gunthorpe
2023-01-30 11:17     ` Alistair Popple
2023-02-06  4:36       ` Alistair Popple
2023-02-06 13:14         ` Jason Gunthorpe
2023-01-24  5:42 ` [RFC PATCH 11/19] xdp: convert to use vm_account Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 12/19] kvm/book3s_64_vio: Convert account_locked_vm() to vm_account_pinned() Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 13/19] fpga: dfl: afu: convert to use vm_account Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 14/19] mm: Introduce a cgroup for pinned memory Alistair Popple
2023-01-27 21:44   ` Tejun Heo
2023-01-30 13:20     ` Jason Gunthorpe
2023-01-24  5:42 ` [RFC PATCH 15/19] mm/util: Extend vm_account to charge pages against the pin cgroup Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 16/19] mm/util: Refactor account_locked_vm Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 17/19] mm: Convert mmap and mlock to use account_locked_vm Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 18/19] mm/mmap: Charge locked memory to pins cgroup Alistair Popple
2023-01-24  5:42 ` [RFC PATCH 19/19] selftests/vm: Add pins-cgroup selftest for mlock/mmap Alistair Popple
2023-01-24 18:26 ` [RFC PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory Yosry Ahmed
2023-01-31  0:54   ` Alistair Popple
2023-01-31  5:14     ` Yosry Ahmed
2023-01-31 11:22       ` Alistair Popple
2023-01-31 19:49         ` Yosry Ahmed
2023-01-24 20:12 ` Jason Gunthorpe
2023-01-31 13:57   ` David Hildenbrand
2023-01-31 14:03     ` Jason Gunthorpe
2023-01-31 14:06       ` David Hildenbrand
2023-01-31 14:10         ` Jason Gunthorpe
2023-01-31 14:15           ` David Hildenbrand
2023-01-31 14:21             ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).