* [PATCH v4 00/36] Memory allocation profiling
@ 2024-02-21 19:40 Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 01/36] fix missing vmalloc.h includes Suren Baghdasaryan
                   ` (36 more replies)
  0 siblings, 37 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Overview:
Low-overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels: the overhead is low enough to be deployed in production.

Example output (allocated bytes, number of calls, callsite):
  root@moria-kvm:~# sort -rn /proc/allocinfo
   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
    14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
    13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
    11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
     9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
     4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
     4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
     3940352      962 mm/memory.c:4214 func:alloc_anon_folio
     2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
     ...

Since v3:
 - Dropped patch changing string_get_size() [2] as not needed
 - Dropped patch modifying xfs allocators [3] as not needed,
   per Dave Chinner
 - Added Reviewed-by, per Kees Cook
 - Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() to where
   they are used, per Vlastimil Babka
 - Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
 - Refactored patch [4] into other patches, per Vlastimil Babka
 - Replaced snprintf() with seq_buf_printf(), per Kees Cook
 - Changed output to report bytes, per Andrew Morton and Pasha Tatashin
 - Changed output to report [module] only for loadable modules,
   per Vlastimil Babka
 - Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
 - Made the code that handles page splitting more understandable,
   per Vlastimil Babka
 - Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
   mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
   per Vlastimil Babka
 - Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
   per Vlastimil Babka
 - Refactored the code in show_mem() to avoid memory allocations,
   per Michal Hocko
 - Changed to trylock in show_mem() to avoid blocking in atomic context,
   per Tetsuo Handa
 - Added mm mailing list into MAINTAINERS, per Kees Cook
 - Added base commit SHA, per Andy Shevchenko
 - Added a patch with documentation, per Jani Nikula
 - Fixed 0day bugs
 - Added benchmark results [5], per Steven Rostedt
 - Rebased over Linux 6.8-rc5

Items not yet addressed:
 - An early boot option to avoid the page_ext overhead. We are looking into
   ways to reuse the same sysctl instead of adding an additional early boot
   parameter.

Usage:
kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation
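
For example, a .config fragment for a production-style setup, building
the profiler in but leaving it off until enabled at runtime, might be:

  CONFIG_MEM_ALLOC_PROFILING=y
  # CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
  # CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set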

sysctl:
  /proc/sys/vm/mem_profiling

Runtime info:
  /proc/allocinfo
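
Putting these together (illustration; the output lines are reproduced
from the example above):

  # echo 1 > /proc/sys/vm/mem_profiling
  # sort -rn /proc/allocinfo | head -3
   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded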

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n &&
    /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y

Performance overhead:
To evaluate performance we implemented an in-kernel test that executes
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes, with the CPU frequency set to max and
CPU affinity pinned to a single CPU to minimize noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with a 6.8.0-rc1 kernel on a
56-core Intel Xeon:

                        kmalloc                 pgalloc
(1 baseline)            6.764s                  16.902s
(2 default disabled)    6.793s  (+0.43%)        17.007s (+0.62%)
(3 default enabled)     7.197s  (+6.40%)        23.666s (+40.02%)
(4 runtime enabled)     7.405s  (+9.48%)        23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
(6 def disabled+memcg)  13.332s (+97.10%)       48.105s (+184.61%)
(7 def enabled+memcg)   13.446s (+98.78%)       54.963s (+225.18%)
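
For reference, the timing loop described above would have roughly this
shape (hypothetical sketch, not the actual test module; NR_ITERS is an
assumed constant):

  #include <linux/slab.h>
  #include <linux/ktime.h>
  #include <linux/printk.h>

  #define NR_ITERS	100000	/* assumed iteration count */

  static void measure_kmalloc(void)
  {
  	u64 start = ktime_get_ns();
  	size_t size;
  	int i;

  	/* Allocation sizes growing from 8 to 240 bytes, as described. */
  	for (size = 8; size <= 240; size += 8)
  		for (i = 0; i < NR_ITERS; i++)
  			kfree(kmalloc(size, GFP_KERNEL)); /* kfree(NULL) is a no-op */

  	pr_info("kmalloc/kfree: %llu ns\n", ktime_get_ns() - start);
  }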

Memory overhead:
Kernel size:

            text        data         bss         dec        diff
(1)     26515311    18890222    17018880    62424413           -
(2)     26524728    19423818    16740352    62688898      264485
(3)     26524724    19423818    16740352    62688894      264481
(4)     26524728    19423818    16740352    62688898      264485
(5)     26541782    18964374    16957440    62463596       39183

Memory consumption on a 56-core Intel CPU with 125 GB of memory:
Code tags:           192 kB
PageExts:         262144 kB (256MB)
SlabExts:           9876 kB (9.6MB)
PcpuExts:            512 kB (0.5MB)

Total overhead is 0.2% of total memory.
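(Arithmetic: 192 + 262144 + 9876 + 512 = 272724 kB, roughly 266 MB;
272724 kB out of 125 GB (131072000 kB) is about 0.21%, dominated by the
PageExts entry.)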

[2] https://lore.kernel.org/all/20240212213922.783301-2-surenb@google.com/
[3] https://lore.kernel.org/all/20240212213922.783301-26-surenb@google.com/
[4] https://lore.kernel.org/all/20240212213922.783301-9-surenb@google.com/
[5] Benchmarks:

Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
      baseline       disabled profiling           enabled profiling
avg   0.3543         0.3559 (+0.0016)             0.3566 (+0.0023)
stdev 0.0137         0.0188                       0.0077


hackbench -l 10000
      baseline       disabled profiling           enabled profiling
avg   6.4218         6.4306 (+0.0088)             6.5077 (+0.0859)
stdev 0.0933         0.0286                       0.0489

stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/

Kent Overstreet (13):
  fix missing vmalloc.h includes
  asm-generic/io.h: Kill vmalloc.h dependency
  mm/slub: Mark slab_free_freelist_hook() __always_inline
  scripts/kallsyms: Always include __start and __stop symbols
  fs: Convert alloc_inode_sb() to a macro
  rust: Add a rust helper for krealloc()
  mempool: Hook up to memory allocation profiling
  mm: percpu: Introduce pcpuobj_ext
  mm: percpu: Add codetag reference into pcpuobj_ext
  mm: vmalloc: Enable memory allocation profiling
  rhashtable: Plumb through alloc tag
  MAINTAINERS: Add entries for code tagging and memory allocation
    profiling
  memprofiling: Documentation

Suren Baghdasaryan (23):
  mm: enumerate all gfp flags
  mm: introduce slabobj_ext to support slab object extensions
  mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
    creation
  mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  slab: objext: introduce objext_flags as extension to
    page_memcg_data_flags
  lib: code tagging framework
  lib: code tagging module support
  lib: prevent module unloading if memory is not freed
  lib: add allocation tagging support for memory allocation profiling
  lib: introduce support for page allocation tagging
  mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
    tags
  change alloc_pages name in dma_map_ops to avoid name conflicts
  mm: enable page allocation tagging
  mm: create new codetag references during page splitting
  mm/page_ext: enable early_page_ext when
    CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  lib: add codetag reference into slabobj_ext
  mm/slab: add allocation accounting into slab allocation and free paths
  mm/slab: enable slab allocation tagging for kmalloc and friends
  mm: percpu: enable per-cpu allocation tagging
  lib: add memory allocations report in show_mem()
  codetag: debug: skip objext checking when it's for objext itself
  codetag: debug: mark codetags for reserved pages as empty
  codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
    allocations

 Documentation/admin-guide/sysctl/vm.rst       |  16 +
 Documentation/filesystems/proc.rst            |  29 ++
 Documentation/mm/allocation-profiling.rst     |  86 ++++++
 MAINTAINERS                                   |  17 ++
 arch/alpha/kernel/pci_iommu.c                 |   2 +-
 arch/alpha/lib/checksum.c                     |   1 +
 arch/alpha/lib/fpreg.c                        |   1 +
 arch/alpha/lib/memcpy.c                       |   1 +
 arch/arm/kernel/irq.c                         |   1 +
 arch/arm/kernel/traps.c                       |   1 +
 arch/arm64/kernel/efi.c                       |   1 +
 arch/loongarch/include/asm/kfence.h           |   1 +
 arch/mips/jazz/jazzdma.c                      |   2 +-
 arch/powerpc/kernel/dma-iommu.c               |   2 +-
 arch/powerpc/kernel/iommu.c                   |   1 +
 arch/powerpc/mm/mem.c                         |   1 +
 arch/powerpc/platforms/ps3/system-bus.c       |   4 +-
 arch/powerpc/platforms/pseries/vio.c          |   2 +-
 arch/riscv/kernel/elf_kexec.c                 |   1 +
 arch/riscv/kernel/probes/kprobes.c            |   1 +
 arch/s390/kernel/cert_store.c                 |   1 +
 arch/s390/kernel/ipl.c                        |   1 +
 arch/x86/include/asm/io.h                     |   1 +
 arch/x86/kernel/amd_gart_64.c                 |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                |   1 +
 arch/x86/kernel/irq_64.c                      |   1 +
 arch/x86/mm/fault.c                           |   1 +
 drivers/accel/ivpu/ivpu_mmu_context.c         |   1 +
 drivers/gpu/drm/gma500/mmu.c                  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   1 +
 .../gpu/drm/i915/gem/selftests/mock_dmabuf.c  |   1 +
 drivers/gpu/drm/i915/gt/shmem_utils.c         |   1 +
 drivers/gpu/drm/i915/gvt/firmware.c           |   1 +
 drivers/gpu/drm/i915/gvt/gtt.c                |   1 +
 drivers/gpu/drm/i915/gvt/handlers.c           |   1 +
 drivers/gpu/drm/i915/gvt/mmio.c               |   1 +
 drivers/gpu/drm/i915/gvt/vgpu.c               |   1 +
 drivers/gpu/drm/i915/intel_gvt.c              |   1 +
 drivers/gpu/drm/imagination/pvr_vm_mips.c     |   1 +
 drivers/gpu/drm/mediatek/mtk_drm_gem.c        |   1 +
 drivers/gpu/drm/omapdrm/omap_gem.c            |   1 +
 drivers/gpu/drm/v3d/v3d_bo.c                  |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_binding.c       |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c           |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c       |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c       |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c         |   1 +
 drivers/gpu/drm/xen/xen_drm_front_gem.c       |   1 +
 drivers/hwtracing/coresight/coresight-trbe.c  |   1 +
 drivers/iommu/dma-iommu.c                     |   2 +-
 .../marvell/octeon_ep/octep_pfvf_mbox.c       |   1 +
 .../net/ethernet/microsoft/mana/hw_channel.c  |   1 +
 drivers/parisc/ccio-dma.c                     |   2 +-
 drivers/parisc/sba_iommu.c                    |   2 +-
 drivers/platform/x86/uv_sysfs.c               |   1 +
 drivers/scsi/mpi3mr/mpi3mr_transport.c        |   2 +
 drivers/staging/media/atomisp/pci/hmm/hmm.c   |   2 +-
 drivers/vfio/pci/pds/dirty.c                  |   1 +
 drivers/virt/acrn/mm.c                        |   1 +
 drivers/virtio/virtio_mem.c                   |   1 +
 drivers/xen/grant-dma-ops.c                   |   2 +-
 drivers/xen/swiotlb-xen.c                     |   2 +-
 include/asm-generic/codetag.lds.h             |  14 +
 include/asm-generic/io.h                      |   1 -
 include/asm-generic/vmlinux.lds.h             |   3 +
 include/linux/alloc_tag.h                     | 195 ++++++++++++
 include/linux/codetag.h                       |  81 +++++
 include/linux/dma-map-ops.h                   |   2 +-
 include/linux/fortify-string.h                |   5 +-
 include/linux/fs.h                            |   6 +-
 include/linux/gfp.h                           | 126 +++++---
 include/linux/gfp_types.h                     | 101 +++++--
 include/linux/memcontrol.h                    |  56 +++-
 include/linux/mempool.h                       |  73 +++--
 include/linux/mm.h                            |   9 +
 include/linux/mm_types.h                      |   4 +-
 include/linux/page_ext.h                      |   1 -
 include/linux/pagemap.h                       |   9 +-
 include/linux/pds/pds_common.h                |   2 +
 include/linux/percpu.h                        |  27 +-
 include/linux/pgalloc_tag.h                   | 110 +++++++
 include/linux/rhashtable-types.h              |  11 +-
 include/linux/sched.h                         |  24 ++
 include/linux/slab.h                          | 175 +++++------
 include/linux/string.h                        |   4 +-
 include/linux/vmalloc.h                       |  60 +++-
 include/rdma/rdmavt_qp.h                      |   1 +
 init/Kconfig                                  |   4 +
 kernel/dma/mapping.c                          |   4 +-
 kernel/kallsyms_selftest.c                    |   2 +-
 kernel/module/main.c                          |  25 +-
 lib/Kconfig.debug                             |  31 ++
 lib/Makefile                                  |   3 +
 lib/alloc_tag.c                               | 204 +++++++++++++
 lib/codetag.c                                 | 283 ++++++++++++++++++
 lib/rhashtable.c                              |  28 +-
 mm/compaction.c                               |   7 +-
 mm/debug_vm_pgtable.c                         |   1 +
 mm/filemap.c                                  |   6 +-
 mm/huge_memory.c                              |   2 +
 mm/kfence/core.c                              |  14 +-
 mm/kfence/kfence.h                            |   4 +-
 mm/memcontrol.c                               |  56 +---
 mm/mempolicy.c                                |  52 ++--
 mm/mempool.c                                  |  36 +--
 mm/mm_init.c                                  |  13 +-
 mm/nommu.c                                    |  64 ++--
 mm/page_alloc.c                               |  66 ++--
 mm/page_ext.c                                 |  13 +
 mm/page_owner.c                               |   2 +-
 mm/percpu-internal.h                          |  26 +-
 mm/percpu.c                                   | 120 +++-----
 mm/show_mem.c                                 |  26 ++
 mm/slab.h                                     | 126 ++++++--
 mm/slab_common.c                              |   6 +-
 mm/slub.c                                     | 244 +++++++++++----
 mm/util.c                                     |  44 +--
 mm/vmalloc.c                                  |  88 +++---
 rust/helpers.c                                |   8 +
 scripts/kallsyms.c                            |  13 +
 scripts/module.lds.S                          |   7 +
 sound/pci/hda/cs35l41_hda.c                   |   1 +
 123 files changed, 2269 insertions(+), 682 deletions(-)
 create mode 100644 Documentation/mm/allocation-profiling.rst
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 include/linux/codetag.h
 create mode 100644 include/linux/pgalloc_tag.h
 create mode 100644 lib/alloc_tag.c
 create mode 100644 lib/codetag.c


base-commit: 39133352cbed6626956d38ed72012f49b0421e7b
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 01/36] fix missing vmalloc.h includes
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:09   ` Pasha Tatashin
  2024-02-21 19:40 ` [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency Suren Baghdasaryan
                   ` (35 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

The next patch drops vmalloc.h from a system header in order to fix
a circular dependency; this adds it to all the files that were pulling
it in implicitly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 arch/alpha/lib/checksum.c                                | 1 +
 arch/alpha/lib/fpreg.c                                   | 1 +
 arch/alpha/lib/memcpy.c                                  | 1 +
 arch/arm/kernel/irq.c                                    | 1 +
 arch/arm/kernel/traps.c                                  | 1 +
 arch/arm64/kernel/efi.c                                  | 1 +
 arch/loongarch/include/asm/kfence.h                      | 1 +
 arch/powerpc/kernel/iommu.c                              | 1 +
 arch/powerpc/mm/mem.c                                    | 1 +
 arch/riscv/kernel/elf_kexec.c                            | 1 +
 arch/riscv/kernel/probes/kprobes.c                       | 1 +
 arch/s390/kernel/cert_store.c                            | 1 +
 arch/s390/kernel/ipl.c                                   | 1 +
 arch/x86/include/asm/io.h                                | 1 +
 arch/x86/kernel/cpu/sgx/main.c                           | 1 +
 arch/x86/kernel/irq_64.c                                 | 1 +
 arch/x86/mm/fault.c                                      | 1 +
 drivers/accel/ivpu/ivpu_mmu_context.c                    | 1 +
 drivers/gpu/drm/gma500/mmu.c                             | 1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c                | 1 +
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c         | 1 +
 drivers/gpu/drm/i915/gt/shmem_utils.c                    | 1 +
 drivers/gpu/drm/i915/gvt/firmware.c                      | 1 +
 drivers/gpu/drm/i915/gvt/gtt.c                           | 1 +
 drivers/gpu/drm/i915/gvt/handlers.c                      | 1 +
 drivers/gpu/drm/i915/gvt/mmio.c                          | 1 +
 drivers/gpu/drm/i915/gvt/vgpu.c                          | 1 +
 drivers/gpu/drm/i915/intel_gvt.c                         | 1 +
 drivers/gpu/drm/imagination/pvr_vm_mips.c                | 1 +
 drivers/gpu/drm/mediatek/mtk_drm_gem.c                   | 1 +
 drivers/gpu/drm/omapdrm/omap_gem.c                       | 1 +
 drivers/gpu/drm/v3d/v3d_bo.c                             | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_binding.c                  | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c                      | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c                  | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c                      | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c                  | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c                    | 1 +
 drivers/gpu/drm/xen/xen_drm_front_gem.c                  | 1 +
 drivers/hwtracing/coresight/coresight-trbe.c             | 1 +
 drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c | 1 +
 drivers/net/ethernet/microsoft/mana/hw_channel.c         | 1 +
 drivers/platform/x86/uv_sysfs.c                          | 1 +
 drivers/scsi/mpi3mr/mpi3mr_transport.c                   | 2 ++
 drivers/vfio/pci/pds/dirty.c                             | 1 +
 drivers/virt/acrn/mm.c                                   | 1 +
 drivers/virtio/virtio_mem.c                              | 1 +
 include/linux/pds/pds_common.h                           | 2 ++
 include/rdma/rdmavt_qp.h                                 | 1 +
 mm/debug_vm_pgtable.c                                    | 1 +
 sound/pci/hda/cs35l41_hda.c                              | 1 +
 51 files changed, 53 insertions(+)

diff --git a/arch/alpha/lib/checksum.c b/arch/alpha/lib/checksum.c
index 3f35c3ed6948..c29b98ef9c82 100644
--- a/arch/alpha/lib/checksum.c
+++ b/arch/alpha/lib/checksum.c
@@ -14,6 +14,7 @@
 #include <linux/string.h>
 
 #include <asm/byteorder.h>
+#include <asm/checksum.h>
 
 static inline unsigned short from64to16(unsigned long x)
 {
diff --git a/arch/alpha/lib/fpreg.c b/arch/alpha/lib/fpreg.c
index 7c08b225261c..3d32165043f8 100644
--- a/arch/alpha/lib/fpreg.c
+++ b/arch/alpha/lib/fpreg.c
@@ -8,6 +8,7 @@
 #include <linux/compiler.h>
 #include <linux/export.h>
 #include <linux/preempt.h>
+#include <asm/fpu.h>
 #include <asm/thread_info.h>
 
 #if defined(CONFIG_ALPHA_EV6) || defined(CONFIG_ALPHA_EV67)
diff --git a/arch/alpha/lib/memcpy.c b/arch/alpha/lib/memcpy.c
index cbac3dc6d963..0e536a1a39ff 100644
--- a/arch/alpha/lib/memcpy.c
+++ b/arch/alpha/lib/memcpy.c
@@ -18,6 +18,7 @@
 
 #include <linux/types.h>
 #include <linux/export.h>
+#include <linux/string.h>
 
 /*
  * This should be done in one go with ldq_u*2/mask/stq_u. Do it
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index fe28fc1f759d..dab42d066d06 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -32,6 +32,7 @@
 #include <linux/kallsyms.h>
 #include <linux/proc_fs.h>
 #include <linux/export.h>
+#include <linux/vmalloc.h>
 
 #include <asm/hardware/cache-l2x0.h>
 #include <asm/hardware/cache-uniphier.h>
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 3bad79db5d6e..27addbf0f98c 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -26,6 +26,7 @@
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
 #include <linux/irq.h>
+#include <linux/vmalloc.h>
 
 #include <linux/atomic.h>
 #include <asm/cacheflush.h>
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 0228001347be..a0dc6b88b11b 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -10,6 +10,7 @@
 #include <linux/efi.h>
 #include <linux/init.h>
 #include <linux/screen_info.h>
+#include <linux/vmalloc.h>
 
 #include <asm/efi.h>
 #include <asm/stacktrace.h>
diff --git a/arch/loongarch/include/asm/kfence.h b/arch/loongarch/include/asm/kfence.h
index 6c82aea1c993..54062656dc7b 100644
--- a/arch/loongarch/include/asm/kfence.h
+++ b/arch/loongarch/include/asm/kfence.h
@@ -10,6 +10,7 @@
 #define _ASM_LOONGARCH_KFENCE_H
 
 #include <linux/kfence.h>
+#include <linux/vmalloc.h>
 #include <asm/pgtable.h>
 #include <asm/tlb.h>
 
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a9bebfd56b3b..25782d361884 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -26,6 +26,7 @@
 #include <linux/iommu.h>
 #include <linux/sched.h>
 #include <linux/debugfs.h>
+#include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/iommu.h>
 #include <asm/pci-bridge.h>
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3a440004b97d..a197d4c2244b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -16,6 +16,7 @@
 #include <linux/highmem.h>
 #include <linux/suspend.h>
 #include <linux/dma-direct.h>
+#include <linux/vmalloc.h>
 
 #include <asm/swiotlb.h>
 #include <asm/machdep.h>
diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index 5bd1ec3341fe..92b1e16f99c4 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -19,6 +19,7 @@
 #include <linux/libfdt.h>
 #include <linux/types.h>
 #include <linux/memblock.h>
+#include <linux/vmalloc.h>
 #include <asm/setup.h>
 
 int arch_kimage_file_post_load_cleanup(struct kimage *image)
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..71a8b8945b26 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -6,6 +6,7 @@
 #include <linux/extable.h>
 #include <linux/slab.h>
 #include <linux/stop_machine.h>
+#include <linux/vmalloc.h>
 #include <asm/ptrace.h>
 #include <linux/uaccess.h>
 #include <asm/sections.h>
diff --git a/arch/s390/kernel/cert_store.c b/arch/s390/kernel/cert_store.c
index 554447768bdd..bf983513dd33 100644
--- a/arch/s390/kernel/cert_store.c
+++ b/arch/s390/kernel/cert_store.c
@@ -21,6 +21,7 @@
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
+#include <linux/vmalloc.h>
 #include <crypto/sha2.h>
 #include <keys/user-type.h>
 #include <asm/debug.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index ba75f6bee774..0854a8450a6e 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -20,6 +20,7 @@
 #include <linux/gfp.h>
 #include <linux/crash_dump.h>
 #include <linux/debug_locks.h>
+#include <linux/vmalloc.h>
 #include <asm/asm-extable.h>
 #include <asm/diag.h>
 #include <asm/ipl.h>
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 3814a9263d64..c6b799d28126 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -42,6 +42,7 @@
 #include <asm/early_ioremap.h>
 #include <asm/pgtable_types.h>
 #include <asm/shared/io.h>
+#include <asm/special_insns.h>
 
 #define build_mmio_read(name, size, type, reg, barrier) \
 static inline type name(const volatile void __iomem *addr) \
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..27892e57c4ef 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -13,6 +13,7 @@
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
+#include <linux/vmalloc.h>
 #include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index fe0c859873d1..ade0043ce56e 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/smp.h>
 #include <linux/sched/task_stack.h>
+#include <linux/vmalloc.h>
 
 #include <asm/cpu_entry_area.h>
 #include <asm/softirq_stack.h>
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..af223e57aa63 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,7 @@
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
 #include <linux/mm.h>			/* find_and_lock_vma() */
+#include <linux/vmalloc.h>
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
diff --git a/drivers/accel/ivpu/ivpu_mmu_context.c b/drivers/accel/ivpu/ivpu_mmu_context.c
index fe6161299236..128aef8e5a19 100644
--- a/drivers/accel/ivpu/ivpu_mmu_context.c
+++ b/drivers/accel/ivpu/ivpu_mmu_context.c
@@ -6,6 +6,7 @@
 #include <linux/bitfield.h>
 #include <linux/highmem.h>
 #include <linux/set_memory.h>
+#include <linux/vmalloc.h>
 
 #include <drm/drm_cache.h>
 
diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c
index a70b01ccdf70..4d78b33eaa82 100644
--- a/drivers/gpu/drm/gma500/mmu.c
+++ b/drivers/gpu/drm/gma500/mmu.c
@@ -5,6 +5,7 @@
  **************************************************************************/
 
 #include <linux/highmem.h>
+#include <linux/vmalloc.h>
 
 #include "mmu.h"
 #include "psb_drv.h"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 0ba955611dfb..8780aa243105 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -5,6 +5,7 @@
  */
 
 #include <drm/drm_cache.h>
+#include <linux/vmalloc.h>
 
 #include "gt/intel_gt.h"
 #include "gt/intel_tlb.h"
diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index b2a5882b8f81..075657018739 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -4,6 +4,7 @@
  * Copyright © 2016 Intel Corporation
  */
 
+#include <linux/vmalloc.h>
 #include "mock_dmabuf.h"
 
 static struct sg_table *mock_map_dma_buf(struct dma_buf_attachment *attachment,
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index bccc3a1200bc..1fb6ff77fd89 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/shmem_fs.h>
+#include <linux/vmalloc.h>
 
 #include "i915_drv.h"
 #include "gem/i915_gem_object.h"
diff --git a/drivers/gpu/drm/i915/gvt/firmware.c b/drivers/gpu/drm/i915/gvt/firmware.c
index 4dd52ac2043e..d800d267f0e9 100644
--- a/drivers/gpu/drm/i915/gvt/firmware.c
+++ b/drivers/gpu/drm/i915/gvt/firmware.c
@@ -30,6 +30,7 @@
 
 #include <linux/firmware.h>
 #include <linux/crc32.h>
+#include <linux/vmalloc.h>
 
 #include "i915_drv.h"
 #include "gvt.h"
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 094fca9b0e73..58cca4906f41 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 
 #include "gt/intel_gt_regs.h"
+#include <linux/vmalloc.h>
 
 #if defined(VERBOSE_DEBUG)
 #define gvt_vdbg_mm(fmt, args...) gvt_dbg_mm(fmt, ##args)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index efcb00472be2..ea9c30092767 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -52,6 +52,7 @@
 #include "display/skl_watermark_regs.h"
 #include "display/vlv_dsi_pll_regs.h"
 #include "gt/intel_gt_regs.h"
+#include <linux/vmalloc.h>
 
 /* XXX FIXME i915 has changed PP_XXX definition */
 #define PCH_PP_STATUS  _MMIO(0xc7200)
diff --git a/drivers/gpu/drm/i915/gvt/mmio.c b/drivers/gpu/drm/i915/gvt/mmio.c
index 5b5def6ddef7..780762f28aa4 100644
--- a/drivers/gpu/drm/i915/gvt/mmio.c
+++ b/drivers/gpu/drm/i915/gvt/mmio.c
@@ -33,6 +33,7 @@
  *
  */
 
+#include <linux/vmalloc.h>
 #include "i915_drv.h"
 #include "i915_reg.h"
 #include "gvt.h"
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 08ad1bd651f1..63c751ca4119 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -34,6 +34,7 @@
 #include "i915_drv.h"
 #include "gvt.h"
 #include "i915_pvinfo.h"
+#include <linux/vmalloc.h>
 
 void populate_pvinfo_page(struct intel_vgpu *vgpu)
 {
diff --git a/drivers/gpu/drm/i915/intel_gvt.c b/drivers/gpu/drm/i915/intel_gvt.c
index 9b6d87c8b583..5a01d60e5186 100644
--- a/drivers/gpu/drm/i915/intel_gvt.c
+++ b/drivers/gpu/drm/i915/intel_gvt.c
@@ -28,6 +28,7 @@
 #include "gt/intel_context.h"
 #include "gt/intel_ring.h"
 #include "gt/shmem_utils.h"
+#include <linux/vmalloc.h>
 
 /**
  * DOC: Intel GVT-g host support
diff --git a/drivers/gpu/drm/imagination/pvr_vm_mips.c b/drivers/gpu/drm/imagination/pvr_vm_mips.c
index b7fef3c797e6..6563dcde109c 100644
--- a/drivers/gpu/drm/imagination/pvr_vm_mips.c
+++ b/drivers/gpu/drm/imagination/pvr_vm_mips.c
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/types.h>
+#include <linux/vmalloc.h>
 
 /**
  * pvr_vm_mips_init() - Initialise MIPS FW pagetable
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 4f2e3feabc0f..3e519869b632 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/dma-buf.h>
+#include <linux/vmalloc.h>
 
 #include <drm/drm.h>
 #include <drm/drm_device.h>
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 3421e8389222..9ea0c64c26b5 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -9,6 +9,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/spinlock.h>
 #include <linux/pfn_t.h>
+#include <linux/vmalloc.h>
 
 #include <drm/drm_prime.h>
 #include <drm/drm_vma_manager.h>
diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 1bdfac8beafd..bd078852cd60 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -21,6 +21,7 @@
 
 #include <linux/dma-buf.h>
 #include <linux/pfn_t.h>
+#include <linux/vmalloc.h>
 
 #include "v3d_drv.h"
 #include "uapi/drm/v3d_drm.h"
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c b/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
index ae2de914eb89..2731f6ded1c2 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_binding.c
@@ -54,6 +54,7 @@
 #include "vmwgfx_drv.h"
 #include "vmwgfx_binding.h"
 #include "device_include/svga3d_reg.h"
+#include <linux/vmalloc.h>
 
 #define VMW_BINDING_RT_BIT     0
 #define VMW_BINDING_PS_BIT     1
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
index 195ff8792e5a..dd4ca6a9c690 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
@@ -31,6 +31,7 @@
 #include <drm/ttm/ttm_placement.h>
 
 #include <linux/sched/signal.h>
+#include <linux/vmalloc.h>
 
 bool vmw_supports_3d(struct vmw_private *dev_priv)
 {
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c b/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
index 829df395c2ed..6e6beff9e262 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_devcaps.c
@@ -25,6 +25,7 @@
  *
  **************************************************************************/
 
+#include <linux/vmalloc.h>
 #include "vmwgfx_devcaps.h"
 
 #include "vmwgfx_drv.h"
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index d3e308fdfd5b..7a451410ad77 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -53,6 +53,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/version.h>
+#include <linux/vmalloc.h>
 
 #define VMWGFX_DRIVER_DESC "Linux drm driver for VMware graphics devices"
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 36987ef3fc30..4ce22843015e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -35,6 +35,7 @@
 
 #include <linux/sync_file.h>
 #include <linux/hashtable.h>
+#include <linux/vmalloc.h>
 
 /*
  * Helper macro to get dx_ctx_node if available otherwise print an error
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
index a1da5678c731..835d1eed8dd9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -31,6 +31,7 @@
 
 #include <drm/vmwgfx_drm.h>
 #include <linux/pci.h>
+#include <linux/vmalloc.h>
 
 int vmw_getparam_ioctl(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv)
diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 3ad2b4cfd1f0..63112ed975c4 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -11,6 +11,7 @@
 #include <linux/dma-buf.h>
 #include <linux/scatterlist.h>
 #include <linux/shmem_fs.h>
+#include <linux/vmalloc.h>
 
 #include <drm/drm_gem.h>
 #include <drm/drm_prime.h>
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 6136776482e6..96a32b213669 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -17,6 +17,7 @@
 
 #include <asm/barrier.h>
 #include <asm/cpufeature.h>
+#include <linux/vmalloc.h>
 
 #include "coresight-self-hosted-trace.h"
 #include "coresight-trbe.h"
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
index 2e2c3be8a0b4..e6eb98d70f3c 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_pfvf_mbox.c
@@ -15,6 +15,7 @@
 #include <linux/io.h>
 #include <linux/pci.h>
 #include <linux/etherdevice.h>
+#include <linux/vmalloc.h>
 
 #include "octep_config.h"
 #include "octep_main.h"
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 2729a2c5acf9..11021c34e47e 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -3,6 +3,7 @@
 
 #include <net/mana/gdma.h>
 #include <net/mana/hw_channel.h>
+#include <linux/vmalloc.h>
 
 static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
 {
diff --git a/drivers/platform/x86/uv_sysfs.c b/drivers/platform/x86/uv_sysfs.c
index 38d1b692d3c0..40e010877189 100644
--- a/drivers/platform/x86/uv_sysfs.c
+++ b/drivers/platform/x86/uv_sysfs.c
@@ -11,6 +11,7 @@
 #include <linux/device.h>
 #include <linux/slab.h>
 #include <linux/kobject.h>
+#include <linux/vmalloc.h>
 #include <asm/uv/bios.h>
 #include <asm/uv/uv.h>
 #include <asm/uv/uv_hub.h>
diff --git a/drivers/scsi/mpi3mr/mpi3mr_transport.c b/drivers/scsi/mpi3mr/mpi3mr_transport.c
index c0c8ab586957..408a4023406b 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_transport.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_transport.c
@@ -7,6 +7,8 @@
  *
  */
 
+#include <linux/vmalloc.h>
+
 #include "mpi3mr.h"
 
 /**
diff --git a/drivers/vfio/pci/pds/dirty.c b/drivers/vfio/pci/pds/dirty.c
index 8ddf4346fcd5..0a161becd646 100644
--- a/drivers/vfio/pci/pds/dirty.c
+++ b/drivers/vfio/pci/pds/dirty.c
@@ -3,6 +3,7 @@
 
 #include <linux/interval_tree.h>
 #include <linux/vfio.h>
+#include <linux/vmalloc.h>
 
 #include <linux/pds/pds_common.h>
 #include <linux/pds/pds_core_if.h>
diff --git a/drivers/virt/acrn/mm.c b/drivers/virt/acrn/mm.c
index fa5d9ca6be57..c088ee1f1180 100644
--- a/drivers/virt/acrn/mm.c
+++ b/drivers/virt/acrn/mm.c
@@ -12,6 +12,7 @@
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 
 #include "acrn_drv.h"
 
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 8e3223294442..e8355f55a8f7 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -21,6 +21,7 @@
 #include <linux/bitmap.h>
 #include <linux/lockdep.h>
 #include <linux/log2.h>
+#include <linux/vmalloc.h>
 
 #include <acpi/acpi_numa.h>
 
diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
index 30581e2e04cc..5802e1deef24 100644
--- a/include/linux/pds/pds_common.h
+++ b/include/linux/pds/pds_common.h
@@ -4,6 +4,8 @@
 #ifndef _PDS_COMMON_H_
 #define _PDS_COMMON_H_
 
+#include <linux/notifier.h>
+
 #define PDS_CORE_DRV_NAME			"pds_core"
 
 /* the device's internal addressing uses up to 52 bits */
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index 2e58d5e6ac0e..d67892944193 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -11,6 +11,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/rdmavt_cq.h>
 #include <rdma/rvt-abi.h>
+#include <linux/vmalloc.h>
 /*
  * Atomic bit definitions for r_aflags.
  */
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5662e29fe253..d711246929aa 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -30,6 +30,7 @@
 #include <linux/start_kernel.h>
 #include <linux/sched/mm.h>
 #include <linux/io.h>
+#include <linux/vmalloc.h>
 
 #include <asm/cacheflush.h>
 #include <asm/pgalloc.h>
diff --git a/sound/pci/hda/cs35l41_hda.c b/sound/pci/hda/cs35l41_hda.c
index d3fa6e136744..990b5bd717a1 100644
--- a/sound/pci/hda/cs35l41_hda.c
+++ b/sound/pci/hda/cs35l41_hda.c
@@ -13,6 +13,7 @@
 #include <sound/soc.h>
 #include <linux/pm_runtime.h>
 #include <linux/spi/spi.h>
+#include <linux/vmalloc.h>
 #include "hda_local.h"
 #include "hda_auto_parser.h"
 #include "hda_jack.h"
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 01/36] fix missing vmalloc.h includes Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:11   ` Pasha Tatashin
  2024-02-21 19:40 ` [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
                   ` (34 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

Needed to avoid a new circular dependency with the memory allocation
profiling series.

Naturally, a whole bunch of files needed to include vmalloc.h that were
previously getting it implicitly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/asm-generic/io.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index bac63e874c7b..c27313414a82 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -991,7 +991,6 @@ static inline void iowrite64_rep(volatile void __iomem *addr,
 
 #ifdef __KERNEL__
 
-#include <linux/vmalloc.h>
 #define __io_virt(x) ((void __force *)(x))
 
 /*
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 01/36] fix missing vmalloc.h includes Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:15   ` Pasha Tatashin
  2024-02-21 19:40 ` [PATCH v4 04/36] scripts/kallsyms: Always include __start and __stop symbols Suren Baghdasaryan
                   ` (33 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

It seems we need to be more forceful with the compiler on this one.
This is done for performance reasons only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..d31b03a8d9d5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
 	return !kasan_slab_free(s, x, init);
 }
 
-static inline bool slab_free_freelist_hook(struct kmem_cache *s,
+static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
 					   void **head, void **tail,
 					   int *cnt)
 {
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 04/36] scripts/kallsyms: Always include __start and __stop symbols
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (2 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:20   ` Pasha Tatashin
  2024-02-21 19:40 ` [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
                   ` (32 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

These symbols are used to denote section boundaries: by always including
them we can unify loading sections from modules with loading built-in
sections, which leads to some significant cleanup.
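
For context, this is the idiom those symbols enable (generic sketch; the
section and struct names here are assumptions, not code from this patch):

	/* The linker provides __start_<sec>[] and __stop_<sec>[] around
	 * every named section, so a section of structs can be walked as
	 * an array: */
	extern struct codetag __start_alloc_tags[];
	extern struct codetag __stop_alloc_tags[];

	static void for_each_alloc_tag(void (*fn)(struct codetag *))
	{
		struct codetag *ct;

		for (ct = __start_alloc_tags; ct < __stop_alloc_tags; ct++)
			fn(ct);
	}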

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/kallsyms.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 653b92f6d4c8..47978efe4797 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -204,6 +204,11 @@ static int symbol_in_range(const struct sym_entry *s,
 	return 0;
 }
 
+static bool string_starts_with(const char *s, const char *prefix)
+{
+	return strncmp(s, prefix, strlen(prefix)) == 0;
+}
+
 static int symbol_valid(const struct sym_entry *s)
 {
 	const char *name = sym_name(s);
@@ -211,6 +216,14 @@ static int symbol_valid(const struct sym_entry *s)
 	/* if --all-symbols is not specified, then symbols outside the text
 	 * and inittext sections are discarded */
 	if (!all_symbols) {
+		/*
+		 * Symbols starting with __start and __stop are used to denote
+		 * section boundaries, and should always be included:
+		 */
+		if (string_starts_with(name, "__start_") ||
+		    string_starts_with(name, "__stop_"))
+			return 1;
+
 		if (symbol_in_range(s, text_ranges,
 				    ARRAY_SIZE(text_ranges)) == 0)
 			return 0;
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (3 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 04/36] scripts/kallsyms: Always include __start and __stop symbols Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:23   ` Pasha Tatashin
  2024-02-26 15:44   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 06/36] mm: enumerate all gfp flags Suren Baghdasaryan
                   ` (31 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Alexander Viro

From: Kent Overstreet <kent.overstreet@linux.dev>

We're introducing alloc tagging, which tracks memory allocations by
callsite. Converting alloc_inode_sb() to a macro means allocations will
be tracked by its caller, which is a bit more useful.
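
To illustrate the effect (a sketch; "foofs" is a hypothetical filesystem):

	/* fs/foofs/inode.c (hypothetical) */
	inode = alloc_inode_sb(sb, foofs_inode_cachep, GFP_KERNEL);

now shows up in /proc/allocinfo attributed to fs/foofs/inode.c, instead
of every filesystem's inode allocations collapsing into the single
alloc_inode_sb() line in include/linux/fs.h.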

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/fs.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 023f37c60709..08d8246399c3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3010,11 +3010,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
  * This must be used for allocating filesystems specific inodes to set
  * up the inode reclaim context correctly.
  */
-static inline void *
-alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
-{
-	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
-}
+#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 static inline void insert_inode_hash(struct inode *inode)
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (4 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:25   ` Pasha Tatashin
  2024-02-22 12:12   ` Michal Hocko
  2024-02-21 19:40 ` [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
                   ` (30 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Petr Tesařík

Introduce GFP bits enumeration to let the compiler track the number of used
bits (which depends on the config options) instead of hardcoding them. That
simplifies the __GFP_BITS_SHIFT calculation.
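
In miniature, the pattern is (generic sketch, names hypothetical):

	enum {
		FLAG_A_BIT,
		FLAG_B_BIT,
	#ifdef CONFIG_EXTRA_FLAG
		FLAG_EXTRA_BIT,
	#endif
		FLAG_LAST_BIT	/* == number of bits in use */
	};

	#define FLAG_A		BIT(FLAG_A_BIT)
	#define FLAG_B		BIT(FLAG_B_BIT)
	#define FLAGS_SHIFT	FLAG_LAST_BIT
	#define FLAGS_MASK	((1u << FLAGS_SHIFT) - 1)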

Suggested-by: Petr Tesařík <petr@tesarici.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
 1 file changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 1b6053da8754..868c8fb1bbc1 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
  * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
  */
 
+enum {
+	___GFP_DMA_BIT,
+	___GFP_HIGHMEM_BIT,
+	___GFP_DMA32_BIT,
+	___GFP_MOVABLE_BIT,
+	___GFP_RECLAIMABLE_BIT,
+	___GFP_HIGH_BIT,
+	___GFP_IO_BIT,
+	___GFP_FS_BIT,
+	___GFP_ZERO_BIT,
+	___GFP_UNUSED_BIT,	/* 0x200u unused */
+	___GFP_DIRECT_RECLAIM_BIT,
+	___GFP_KSWAPD_RECLAIM_BIT,
+	___GFP_WRITE_BIT,
+	___GFP_NOWARN_BIT,
+	___GFP_RETRY_MAYFAIL_BIT,
+	___GFP_NOFAIL_BIT,
+	___GFP_NORETRY_BIT,
+	___GFP_MEMALLOC_BIT,
+	___GFP_COMP_BIT,
+	___GFP_NOMEMALLOC_BIT,
+	___GFP_HARDWALL_BIT,
+	___GFP_THISNODE_BIT,
+	___GFP_ACCOUNT_BIT,
+	___GFP_ZEROTAGS_BIT,
+#ifdef CONFIG_KASAN_HW_TAGS
+	___GFP_SKIP_ZERO_BIT,
+	___GFP_SKIP_KASAN_BIT,
+#endif
+#ifdef CONFIG_LOCKDEP
+	___GFP_NOLOCKDEP_BIT,
+#endif
+	___GFP_LAST_BIT
+};
+
 /* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA		0x01u
-#define ___GFP_HIGHMEM		0x02u
-#define ___GFP_DMA32		0x04u
-#define ___GFP_MOVABLE		0x08u
-#define ___GFP_RECLAIMABLE	0x10u
-#define ___GFP_HIGH		0x20u
-#define ___GFP_IO		0x40u
-#define ___GFP_FS		0x80u
-#define ___GFP_ZERO		0x100u
+#define ___GFP_DMA		BIT(___GFP_DMA_BIT)
+#define ___GFP_HIGHMEM		BIT(___GFP_HIGHMEM_BIT)
+#define ___GFP_DMA32		BIT(___GFP_DMA32_BIT)
+#define ___GFP_MOVABLE		BIT(___GFP_MOVABLE_BIT)
+#define ___GFP_RECLAIMABLE	BIT(___GFP_RECLAIMABLE_BIT)
+#define ___GFP_HIGH		BIT(___GFP_HIGH_BIT)
+#define ___GFP_IO		BIT(___GFP_IO_BIT)
+#define ___GFP_FS		BIT(___GFP_FS_BIT)
+#define ___GFP_ZERO		BIT(___GFP_ZERO_BIT)
 /* 0x200u unused */
-#define ___GFP_DIRECT_RECLAIM	0x400u
-#define ___GFP_KSWAPD_RECLAIM	0x800u
-#define ___GFP_WRITE		0x1000u
-#define ___GFP_NOWARN		0x2000u
-#define ___GFP_RETRY_MAYFAIL	0x4000u
-#define ___GFP_NOFAIL		0x8000u
-#define ___GFP_NORETRY		0x10000u
-#define ___GFP_MEMALLOC		0x20000u
-#define ___GFP_COMP		0x40000u
-#define ___GFP_NOMEMALLOC	0x80000u
-#define ___GFP_HARDWALL		0x100000u
-#define ___GFP_THISNODE		0x200000u
-#define ___GFP_ACCOUNT		0x400000u
-#define ___GFP_ZEROTAGS		0x800000u
+#define ___GFP_DIRECT_RECLAIM	BIT(___GFP_DIRECT_RECLAIM_BIT)
+#define ___GFP_KSWAPD_RECLAIM	BIT(___GFP_KSWAPD_RECLAIM_BIT)
+#define ___GFP_WRITE		BIT(___GFP_WRITE_BIT)
+#define ___GFP_NOWARN		BIT(___GFP_NOWARN_BIT)
+#define ___GFP_RETRY_MAYFAIL	BIT(___GFP_RETRY_MAYFAIL_BIT)
+#define ___GFP_NOFAIL		BIT(___GFP_NOFAIL_BIT)
+#define ___GFP_NORETRY		BIT(___GFP_NORETRY_BIT)
+#define ___GFP_MEMALLOC		BIT(___GFP_MEMALLOC_BIT)
+#define ___GFP_COMP		BIT(___GFP_COMP_BIT)
+#define ___GFP_NOMEMALLOC	BIT(___GFP_NOMEMALLOC_BIT)
+#define ___GFP_HARDWALL		BIT(___GFP_HARDWALL_BIT)
+#define ___GFP_THISNODE		BIT(___GFP_THISNODE_BIT)
+#define ___GFP_ACCOUNT		BIT(___GFP_ACCOUNT_BIT)
+#define ___GFP_ZEROTAGS		BIT(___GFP_ZEROTAGS_BIT)
 #ifdef CONFIG_KASAN_HW_TAGS
-#define ___GFP_SKIP_ZERO	0x1000000u
-#define ___GFP_SKIP_KASAN	0x2000000u
+#define ___GFP_SKIP_ZERO	BIT(___GFP_SKIP_ZERO_BIT)
+#define ___GFP_SKIP_KASAN	BIT(___GFP_SKIP_KASAN_BIT)
 #else
 #define ___GFP_SKIP_ZERO	0
 #define ___GFP_SKIP_KASAN	0
 #endif
 #ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP	0x4000000u
+#define ___GFP_NOLOCKDEP	BIT(___GFP_NOLOCKDEP_BIT)
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
-/* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -249,7 +283,7 @@ typedef unsigned int __bitwise gfp_t;
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (5 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 06/36] mm: enumerate all gfp flags Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 21:30   ` Pasha Tatashin
  2024-02-26 16:26   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
                   ` (29 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Currently slab pages can store only vectors of obj_cgroup pointers in
page->memcg_data. Introduce a slabobj_ext structure to allow more data
to be stored for each slab object. Wrap obj_cgroup into slabobj_ext
to support the current functionality while allowing slabobj_ext to be
extended in the future.
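
As a rough illustration of the data-structure change (a user-space
sketch with stand-in types, not the kernel code), each slab object gets
a struct that can later grow new per-object fields instead of a bare
obj_cgroup pointer:

  #include <stdio.h>
  #include <stdlib.h>

  struct obj_cgroup { int id; };          /* stand-in for the real type */

  /* One extension record per slab object; new per-object fields can be
   * added later without changing how the vector is allocated/indexed. */
  struct slabobj_ext {
          struct obj_cgroup *objcg;
  };

  int main(void)
  {
          unsigned int objects = 8;       /* objs_per_slab() stand-in */
          struct slabobj_ext *vec = calloc(objects, sizeof(*vec));
          struct obj_cgroup cg = { .id = 1 };

          if (!vec)
                  return 1;
          vec[3].objcg = &cg;             /* off = obj_to_index() stand-in */
          printf("objcg id at slot 3: %d\n", vec[3].objcg->id);
          free(vec);
          return 0;
  }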

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/memcontrol.h |  20 +++++--
 include/linux/mm_types.h   |   4 +-
 init/Kconfig               |   4 ++
 mm/kfence/core.c           |  14 ++---
 mm/kfence/kfence.h         |   4 +-
 mm/memcontrol.c            |  56 +++----------------
 mm/page_owner.c            |   2 +-
 mm/slab.h                  |  62 +++++++++++++--------
 mm/slub.c                  | 109 +++++++++++++++++++++++++++----------
 9 files changed, 156 insertions(+), 119 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20ff87f8e001..eb1dc181e412 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -348,8 +348,8 @@ struct mem_cgroup {
 extern struct mem_cgroup *root_mem_cgroup;
 
 enum page_memcg_data_flags {
-	/* page->memcg_data is a pointer to an objcgs vector */
-	MEMCG_DATA_OBJCGS = (1UL << 0),
+	/* page->memcg_data is a pointer to a slabobj_ext vector */
+	MEMCG_DATA_OBJEXTS = (1UL << 0),
 	/* page has been accounted as a non-slab kernel page */
 	MEMCG_DATA_KMEM = (1UL << 1),
 	/* the next bit after the last actual flag */
@@ -387,7 +387,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
 	unsigned long memcg_data = folio->memcg_data;
 
 	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
 
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -408,7 +408,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
 	unsigned long memcg_data = folio->memcg_data;
 
 	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
 
 	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -505,7 +505,7 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
 	 */
 	unsigned long memcg_data = READ_ONCE(folio->memcg_data);
 
-	if (memcg_data & MEMCG_DATA_OBJCGS)
+	if (memcg_data & MEMCG_DATA_OBJEXTS)
 		return NULL;
 
 	if (memcg_data & MEMCG_DATA_KMEM) {
@@ -551,7 +551,7 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
 static inline bool folio_memcg_kmem(struct folio *folio)
 {
 	VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
-	VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJCGS, folio);
+	VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	return folio->memcg_data & MEMCG_DATA_KMEM;
 }
 
@@ -1633,6 +1633,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 }
 #endif /* CONFIG_MEMCG */
 
+/*
+ * Extended information for slab objects stored as an array in page->memcg_data
+ * if MEMCG_DATA_OBJEXTS is set.
+ */
+struct slabobj_ext {
+	struct obj_cgroup *objcg;
+} __aligned(8);
+
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
 	__mod_lruvec_kmem_state(p, idx, 1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b611e13153e..9ff97f4e74c5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -169,7 +169,7 @@ struct page {
 	/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 	atomic_t _refcount;
 
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
 	unsigned long memcg_data;
 #endif
 
@@ -306,7 +306,7 @@ struct folio {
 			};
 			atomic_t _mapcount;
 			atomic_t _refcount;
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
 			unsigned long memcg_data;
 #endif
 #if defined(WANT_PAGE_VIRTUAL)
diff --git a/init/Kconfig b/init/Kconfig
index 8426d59cc634..fe5f5e75bd3f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -929,6 +929,9 @@ config NUMA_BALANCING_DEFAULT_ENABLED
 	  If set, automatic NUMA balancing will be enabled if running on a NUMA
 	  machine.
 
+config SLAB_OBJ_EXT
+	bool
+
 menuconfig CGROUPS
 	bool "Control Group support"
 	select KERNFS
@@ -962,6 +965,7 @@ config MEMCG
 	bool "Memory controller"
 	select PAGE_COUNTER
 	select EVENTFD
+	select SLAB_OBJ_EXT
 	help
 	  Provides control over the memory footprint of tasks in a cgroup.
 
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 8350f5c06f2e..964b8482275b 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -595,9 +595,9 @@ static unsigned long kfence_init_pool(void)
 			continue;
 
 		__folio_set_slab(slab_folio(slab));
-#ifdef CONFIG_MEMCG
-		slab->memcg_data = (unsigned long)&kfence_metadata_init[i / 2 - 1].objcg |
-				   MEMCG_DATA_OBJCGS;
+#ifdef CONFIG_MEMCG_KMEM
+		slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
+				 MEMCG_DATA_OBJEXTS;
 #endif
 	}
 
@@ -645,8 +645,8 @@ static unsigned long kfence_init_pool(void)
 
 		if (!i || (i % 2))
 			continue;
-#ifdef CONFIG_MEMCG
-		slab->memcg_data = 0;
+#ifdef CONFIG_MEMCG_KMEM
+		slab->obj_exts = 0;
 #endif
 		__folio_clear_slab(slab_folio(slab));
 	}
@@ -1139,8 +1139,8 @@ void __kfence_free(void *addr)
 {
 	struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
 
-#ifdef CONFIG_MEMCG
-	KFENCE_WARN_ON(meta->objcg);
+#ifdef CONFIG_MEMCG_KMEM
+	KFENCE_WARN_ON(meta->obj_exts.objcg);
 #endif
 	/*
 	 * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index f46fbb03062b..084f5f36e8e7 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -97,8 +97,8 @@ struct kfence_metadata {
 	struct kfence_track free_track;
 	/* For updating alloc_covered on frees. */
 	u32 alloc_stack_hash;
-#ifdef CONFIG_MEMCG
-	struct obj_cgroup *objcg;
+#ifdef CONFIG_MEMCG_KMEM
+	struct slabobj_ext obj_exts;
 #endif
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1ed40f9d3a27..7021639d2a6f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2977,13 +2977,6 @@ void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-/*
- * The allocated objcg pointers array is not accounted directly.
- * Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
- */
-#define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | \
-				 __GFP_ACCOUNT | __GFP_NOFAIL)
 
 /*
  * mod_objcg_mlstate() may be called with irq enabled, so
@@ -3003,62 +2996,27 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
 	rcu_read_unlock();
 }
 
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
-				 gfp_t gfp, bool new_slab)
-{
-	unsigned int objects = objs_per_slab(s, slab);
-	unsigned long memcg_data;
-	void *vec;
-
-	gfp &= ~OBJCGS_CLEAR_MASK;
-	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
-			   slab_nid(slab));
-	if (!vec)
-		return -ENOMEM;
-
-	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
-	if (new_slab) {
-		/*
-		 * If the slab is brand new and nobody can yet access its
-		 * memcg_data, no synchronization is required and memcg_data can
-		 * be simply assigned.
-		 */
-		slab->memcg_data = memcg_data;
-	} else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) {
-		/*
-		 * If the slab is already in use, somebody can allocate and
-		 * assign obj_cgroups in parallel. In this case the existing
-		 * objcg vector should be reused.
-		 */
-		kfree(vec);
-		return 0;
-	}
-
-	kmemleak_not_leak(vec);
-	return 0;
-}
-
 static __always_inline
 struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 {
 	/*
 	 * Slab objects are accounted individually, not per-page.
 	 * Memcg membership data for each individual object is saved in
-	 * slab->memcg_data.
+	 * slab->obj_exts.
 	 */
 	if (folio_test_slab(folio)) {
-		struct obj_cgroup **objcgs;
+		struct slabobj_ext *obj_exts;
 		struct slab *slab;
 		unsigned int off;
 
 		slab = folio_slab(folio);
-		objcgs = slab_objcgs(slab);
-		if (!objcgs)
+		obj_exts = slab_obj_exts(slab);
+		if (!obj_exts)
 			return NULL;
 
 		off = obj_to_index(slab->slab_cache, slab, p);
-		if (objcgs[off])
-			return obj_cgroup_memcg(objcgs[off]);
+		if (obj_exts[off].objcg)
+			return obj_cgroup_memcg(obj_exts[off].objcg);
 
 		return NULL;
 	}
@@ -3066,7 +3024,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
 	/*
 	 * folio_memcg_check() is used here, because in theory we can encounter
 	 * a folio where the slab flag has been cleared already, but
-	 * slab->memcg_data has not been freed yet
+	 * slab->obj_exts has not been freed yet
 	 * folio_memcg_check() will guarantee that a proper memory
 	 * cgroup pointer or NULL will be returned.
 	 */
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5634e5d890f8..262aa7d25f40 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -377,7 +377,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
 	if (!memcg_data)
 		goto out_unlock;
 
-	if (memcg_data & MEMCG_DATA_OBJCGS)
+	if (memcg_data & MEMCG_DATA_OBJEXTS)
 		ret += scnprintf(kbuf + ret, count - ret,
 				"Slab cache page\n");
 
diff --git a/mm/slab.h b/mm/slab.h
index 54deeb0428c6..7f19b0a2acd8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -87,8 +87,8 @@ struct slab {
 	unsigned int __unused;
 
 	atomic_t __page_refcount;
-#ifdef CONFIG_MEMCG
-	unsigned long memcg_data;
+#ifdef CONFIG_SLAB_OBJ_EXT
+	unsigned long obj_exts;
 #endif
 };
 
@@ -97,8 +97,8 @@ struct slab {
 SLAB_MATCH(flags, __page_flags);
 SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
 SLAB_MATCH(_refcount, __page_refcount);
-#ifdef CONFIG_MEMCG
-SLAB_MATCH(memcg_data, memcg_data);
+#ifdef CONFIG_SLAB_OBJ_EXT
+SLAB_MATCH(memcg_data, obj_exts);
 #endif
 #undef SLAB_MATCH
 static_assert(sizeof(struct slab) <= sizeof(struct page));
@@ -541,42 +541,60 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 	return false;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef CONFIG_SLAB_OBJ_EXT
+
 /*
- * slab_objcgs - get the object cgroups vector associated with a slab
+ * slab_obj_exts - get the pointer to the slab object extension vector
+ * associated with a slab.
  * @slab: a pointer to the slab struct
  *
- * Returns a pointer to the object cgroups vector associated with the slab,
+ * Returns a pointer to the object extension vector associated with the slab,
  * or NULL if no such vector has been associated yet.
  */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 {
-	unsigned long memcg_data = READ_ONCE(slab->memcg_data);
+	unsigned long obj_exts = READ_ONCE(slab->obj_exts);
 
-	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS),
+#ifdef CONFIG_MEMCG
+	VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS),
 							slab_page(slab));
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
+	VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
 
-	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
+#else
+	return (struct slabobj_ext *)obj_exts;
+#endif
 }
 
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
-				 gfp_t gfp, bool new_slab);
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-		     enum node_stat_item idx, int nr);
-#else /* CONFIG_MEMCG_KMEM */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+			gfp_t gfp, bool new_slab);
+
+#else /* CONFIG_SLAB_OBJ_EXT */
+
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 {
 	return NULL;
 }
 
-static inline int memcg_alloc_slab_cgroups(struct slab *slab,
-					       struct kmem_cache *s, gfp_t gfp,
-					       bool new_slab)
+static inline int alloc_slab_obj_exts(struct slab *slab,
+				      struct kmem_cache *s, gfp_t gfp,
+				      bool new_slab)
 {
 	return 0;
 }
-#endif /* CONFIG_MEMCG_KMEM */
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+	return NULL;
+}
+
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
+#ifdef CONFIG_MEMCG_KMEM
+void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
+		     enum node_stat_item idx, int nr);
+#endif
 
 size_t __ksize(const void *objp);
 
diff --git a/mm/slub.c b/mm/slub.c
index d31b03a8d9d5..76fb600fbc80 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -683,10 +683,10 @@ static inline bool __slab_update_freelist(struct kmem_cache *s, struct slab *sla
 
 	if (s->flags & __CMPXCHG_DOUBLE) {
 		ret = __update_freelist_fast(slab, freelist_old, counters_old,
-				            freelist_new, counters_new);
+					    freelist_new, counters_new);
 	} else {
 		ret = __update_freelist_slow(slab, freelist_old, counters_old,
-				            freelist_new, counters_new);
+					    freelist_new, counters_new);
 	}
 	if (likely(ret))
 		return true;
@@ -710,13 +710,13 @@ static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
 
 	if (s->flags & __CMPXCHG_DOUBLE) {
 		ret = __update_freelist_fast(slab, freelist_old, counters_old,
-				            freelist_new, counters_new);
+					    freelist_new, counters_new);
 	} else {
 		unsigned long flags;
 
 		local_irq_save(flags);
 		ret = __update_freelist_slow(slab, freelist_old, counters_old,
-				            freelist_new, counters_new);
+					     freelist_new, counters_new);
 		local_irq_restore(flags);
 	}
 	if (likely(ret))
@@ -1881,13 +1881,72 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
 		NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
-static inline void memcg_free_slab_cgroups(struct slab *slab)
+#ifdef CONFIG_SLAB_OBJ_EXT
+
+/*
+ * The allocated objcg pointers array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+#define OBJCGS_CLEAR_MASK	(__GFP_DMA | __GFP_RECLAIMABLE | \
+				__GFP_ACCOUNT | __GFP_NOFAIL)
+
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+			gfp_t gfp, bool new_slab)
 {
-	kfree(slab_objcgs(slab));
-	slab->memcg_data = 0;
+	unsigned int objects = objs_per_slab(s, slab);
+	unsigned long obj_exts;
+	void *vec;
+
+	gfp &= ~OBJCGS_CLEAR_MASK;
+	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+			   slab_nid(slab));
+	if (!vec)
+		return -ENOMEM;
+
+	obj_exts = (unsigned long)vec;
+#ifdef CONFIG_MEMCG
+	obj_exts |= MEMCG_DATA_OBJEXTS;
+#endif
+	if (new_slab) {
+		/*
+		 * If the slab is brand new and nobody can yet access its
+		 * obj_exts, no synchronization is required and obj_exts can
+		 * be simply assigned.
+		 */
+		slab->obj_exts = obj_exts;
+	} else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+		/*
+		 * If the slab is already in use, somebody can allocate and
+		 * assign slabobj_exts in parallel. In this case the existing
+		 * objcg vector should be reused.
+		 */
+		kfree(vec);
+		return 0;
+	}
+
+	kmemleak_not_leak(vec);
+	return 0;
 }
 
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+	struct slabobj_ext *obj_exts;
+
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts)
+		return;
+
+	kfree(obj_exts);
+	slab->obj_exts = 0;
+}
+#else /* CONFIG_SLAB_OBJ_EXT */
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+}
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
+#ifdef CONFIG_MEMCG_KMEM
 static inline size_t obj_full_size(struct kmem_cache *s)
 {
 	/*
@@ -1966,15 +2025,15 @@ static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
 		if (likely(p[i])) {
 			slab = virt_to_slab(p[i]);
 
-			if (!slab_objcgs(slab) &&
-			    memcg_alloc_slab_cgroups(slab, s, flags, false)) {
+			if (!slab_obj_exts(slab) &&
+			    alloc_slab_obj_exts(slab, s, flags, false)) {
 				obj_cgroup_uncharge(objcg, obj_full_size(s));
 				continue;
 			}
 
 			off = obj_to_index(s, slab, p[i]);
 			obj_cgroup_get(objcg);
-			slab_objcgs(slab)[off] = objcg;
+			slab_obj_exts(slab)[off].objcg = objcg;
 			mod_objcg_state(objcg, slab_pgdat(slab),
 					cache_vmstat_idx(s), obj_full_size(s));
 		} else {
@@ -1995,18 +2054,18 @@ void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
 
 static void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 				   void **p, int objects,
-				   struct obj_cgroup **objcgs)
+				   struct slabobj_ext *obj_exts)
 {
 	for (int i = 0; i < objects; i++) {
 		struct obj_cgroup *objcg;
 		unsigned int off;
 
 		off = obj_to_index(s, slab, p[i]);
-		objcg = objcgs[off];
+		objcg = obj_exts[off].objcg;
 		if (!objcg)
 			continue;
 
-		objcgs[off] = NULL;
+		obj_exts[off].objcg = NULL;
 		obj_cgroup_uncharge(objcg, obj_full_size(s));
 		mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
 				-obj_full_size(s));
@@ -2018,16 +2077,16 @@ static __fastpath_inline
 void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
 			  int objects)
 {
-	struct obj_cgroup **objcgs;
+	struct slabobj_ext *obj_exts;
 
 	if (!memcg_kmem_online())
 		return;
 
-	objcgs = slab_objcgs(slab);
-	if (likely(!objcgs))
+	obj_exts = slab_obj_exts(slab);
+	if (likely(!obj_exts))
 		return;
 
-	__memcg_slab_free_hook(s, slab, p, objects, objcgs);
+	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
 }
 
 static inline
@@ -2038,15 +2097,6 @@ void memcg_slab_alloc_error_hook(struct kmem_cache *s, int objects,
 		obj_cgroup_uncharge(objcg, objects * obj_full_size(s));
 }
 #else /* CONFIG_MEMCG_KMEM */
-static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
-{
-	return NULL;
-}
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
-}
-
 static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 					     struct list_lru *lru,
 					     struct obj_cgroup **objcgp,
@@ -2314,7 +2364,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
 					 struct kmem_cache *s, gfp_t gfp)
 {
 	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
-		memcg_alloc_slab_cgroups(slab, s, gfp, true);
+		alloc_slab_obj_exts(slab, s, gfp, true);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    PAGE_SIZE << order);
@@ -2323,8 +2373,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
 static __always_inline void unaccount_slab(struct slab *slab, int order,
 					   struct kmem_cache *s)
 {
-	if (memcg_kmem_online())
-		memcg_free_slab_cgroups(slab);
+	free_slab_obj_exts(slab);
 
 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    -(PAGE_SIZE << order));
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (6 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-22  0:08   ` Pasha Tatashin
  2024-02-26 16:51   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
                   ` (28 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Introduce the __GFP_NO_OBJ_EXT flag to prevent recursive allocations
when allocating slabobj_ext metadata for a slab.
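
A minimal user-space sketch of the recursion-guard idea (stand-in names,
not the kernel implementation): the metadata allocation carries a flag
telling the allocation hook not to allocate metadata for it in turn.

  #include <stdio.h>
  #include <stdlib.h>

  #define GFP_NO_OBJ_EXT  (1u << 0)       /* stand-in for __GFP_NO_OBJ_EXT */

  static void *tagged_alloc(size_t size, unsigned int gfp)
  {
          /* Without the guard bit, allocating the extension vector here
           * would re-enter tagged_alloc() and recurse without bound. */
          if (!(gfp & GFP_NO_OBJ_EXT))
                  free(tagged_alloc(64, gfp | GFP_NO_OBJ_EXT));
          return malloc(size);
  }

  int main(void)
  {
          void *p = tagged_alloc(128, 0);

          printf("allocated %p\n", p);
          free(p);
          return 0;
  }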

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/gfp_types.h | 11 +++++++++++
 mm/slub.c                 |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 868c8fb1bbc1..e36e168d8cfd 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -52,6 +52,9 @@ enum {
 #endif
 #ifdef CONFIG_LOCKDEP
 	___GFP_NOLOCKDEP_BIT,
+#endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+	___GFP_NO_OBJ_EXT_BIT,
 #endif
 	___GFP_LAST_BIT
 };
@@ -93,6 +96,11 @@ enum {
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+#define ___GFP_NO_OBJ_EXT       BIT(___GFP_NO_OBJ_EXT_BIT)
+#else
+#define ___GFP_NO_OBJ_EXT       0
+#endif
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -133,12 +141,15 @@ enum {
  * node with no fallbacks or placement policy enforcements.
  *
  * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
+ *
+ * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
  */
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
 #define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)
 #define __GFP_HARDWALL   ((__force gfp_t)___GFP_HARDWALL)
 #define __GFP_THISNODE	((__force gfp_t)___GFP_THISNODE)
 #define __GFP_ACCOUNT	((__force gfp_t)___GFP_ACCOUNT)
+#define __GFP_NO_OBJ_EXT   ((__force gfp_t)___GFP_NO_OBJ_EXT)
 
 /**
  * DOC: Watermark modifiers
diff --git a/mm/slub.c b/mm/slub.c
index 76fb600fbc80..ca803b2949fc 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1899,6 +1899,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	void *vec;
 
 	gfp &= ~OBJCGS_CLEAR_MASK;
+	/* Prevent recursive extension vector allocation */
+	gfp |= __GFP_NO_OBJ_EXT;
 	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
 			   slab_nid(slab));
 	if (!vec)
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (7 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-22  0:09   ` Pasha Tatashin
  2024-02-26 16:52   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
                   ` (27 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Slab extension objects can't be allocated before the slab infrastructure
is initialized, but some caches, like kmem_cache and kmem_cache_node,
are created before that point, so objects from these caches can't have
extension objects. Introduce the SLAB_NO_OBJ_EXT slab flag to mark such
caches and avoid creating extensions for objects allocated from them.
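
A hedged sketch of the check this flag enables (simplified stand-in
types and values; the real test lives in the slab allocator):

  #include <stdio.h>

  #define SLAB_NO_OBJ_EXT (1u << 9)       /* stand-in bit value */

  struct cache_sketch {
          const char *name;
          unsigned int flags;
  };

  static int wants_obj_exts(const struct cache_sketch *s)
  {
          /* Boot caches never get extension vectors. */
          return !(s->flags & SLAB_NO_OBJ_EXT);
  }

  int main(void)
  {
          struct cache_sketch boot  = { "kmem_cache", SLAB_NO_OBJ_EXT };
          struct cache_sketch later = { "some_cache", 0 };

          printf("%s: %d, %s: %d\n", boot.name, wants_obj_exts(&boot),
                 later.name, wants_obj_exts(&later));
          return 0;
  }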

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h | 6 ++++++
 mm/slub.c            | 5 +++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index b5f5ee8308d0..58794043ab5b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -28,6 +28,12 @@
  */
 /* DEBUG: Perform (expensive) checks on alloc/free */
 #define SLAB_CONSISTENCY_CHECKS	((slab_flags_t __force)0x00000100U)
+/* Slab created using create_boot_cache */
+#ifdef CONFIG_SLAB_OBJ_EXT
+#define SLAB_NO_OBJ_EXT		((slab_flags_t __force)0x00000200U)
+#else
+#define SLAB_NO_OBJ_EXT		0
+#endif
 /* DEBUG: Red zone objs in a cache */
 #define SLAB_RED_ZONE		((slab_flags_t __force)0x00000400U)
 /* DEBUG: Poison objects */
diff --git a/mm/slub.c b/mm/slub.c
index ca803b2949fc..5dc7beda6c0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5697,7 +5697,8 @@ void __init kmem_cache_init(void)
 		node_set(node, slab_nodes);
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
+			sizeof(struct kmem_cache_node),
+			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
 	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
@@ -5707,7 +5708,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-		       SLAB_HWCACHE_ALIGN, 0, 0);
+			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 	kmem_cache_node = bootstrap(&boot_kmem_cache_node);
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (8 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-22  0:12   ` Pasha Tatashin
  2024-02-26 16:53   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 11/36] lib: code tagging framework Suren Baghdasaryan
                   ` (26 subsequent siblings)
  36 siblings, 2 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Introduce objext_flags to store additional object extension flags that
are unrelated to memcg.
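
A user-space sketch of the underlying trick (flag values assumed for
illustration; GCC-style alignment attribute): flags live in the low bits
of an aligned pointer, memcg flags first and objext flags continuing
after them, and one mask strips them all.

  #include <stdio.h>
  #include <stdint.h>

  enum {
          DATA_OBJEXTS      = 1 << 0,     /* memcg-owned flag bits ... */
          DATA_KMEM         = 1 << 1,
          FIRST_OBJEXT_FLAG = 1 << 2,     /* ... objext flags start here */
  };
  #define OBJEXTS_FLAGS_MASK      (FIRST_OBJEXT_FLAG - 1)

  int main(void)
  {
          /* 8-byte alignment keeps the low three bits free for flags. */
          static long vec[4] __attribute__((aligned(8)));
          uintptr_t tagged = (uintptr_t)vec | DATA_OBJEXTS;

          printf("flags=%#lx, pointer recovered: %d\n",
                 (unsigned long)(tagged & OBJEXTS_FLAGS_MASK),
                 (void *)(tagged & ~(uintptr_t)OBJEXTS_FLAGS_MASK) == (void *)vec);
          return 0;
  }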

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/memcontrol.h | 29 ++++++++++++++++++++++-------
 mm/slab.h                  |  4 +---
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index eb1dc181e412..f3584e98b640 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -356,7 +356,22 @@ enum page_memcg_data_flags {
 	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
 };
 
-#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+#define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
+
+#else /* CONFIG_MEMCG */
+
+#define __FIRST_OBJEXT_FLAG	(1UL << 0)
+
+#endif /* CONFIG_MEMCG */
+
+enum objext_flags {
+	/* the next bit after the last actual flag */
+	__NR_OBJEXTS_FLAGS  = __FIRST_OBJEXT_FLAG,
+};
+
+#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
+
+#ifdef CONFIG_MEMCG
 
 static inline bool folio_memcg_kmem(struct folio *folio);
 
@@ -390,7 +405,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
 	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
 
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
 
 /*
@@ -411,7 +426,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
 	VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
 	VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
 
-	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
 
 /*
@@ -468,11 +483,11 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
 	if (memcg_data & MEMCG_DATA_KMEM) {
 		struct obj_cgroup *objcg;
 
-		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 		return obj_cgroup_memcg(objcg);
 	}
 
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
 
 /*
@@ -511,11 +526,11 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
 	if (memcg_data & MEMCG_DATA_KMEM) {
 		struct obj_cgroup *objcg;
 
-		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 		return obj_cgroup_memcg(objcg);
 	}
 
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
 
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
diff --git a/mm/slab.h b/mm/slab.h
index 7f19b0a2acd8..13b6ba2abd74 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -560,10 +560,8 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 							slab_page(slab));
 	VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
 
-	return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
-#else
-	return (struct slabobj_ext *)obj_exts;
 #endif
+	return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
 }
 
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 11/36] lib: code tagging framework
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (9 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 12/36] lib: code tagging module support Suren Baghdasaryan
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Add basic infrastructure to support code tagging, which stores common
tag information consisting of the module name, function, file name and
line number. Provide functions to register a new code tag type and to
navigate between code tags.
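
For readers unfamiliar with the section trick this relies on, here is a
self-contained user-space sketch (GCC/clang on ELF; simplified struct,
invented section name): each tagged location drops a static struct into
a named section, and the linker-provided __start_/__stop_ symbols bound
the resulting array.

  #include <stdio.h>

  struct tag_sketch {
          const char *function;
          const char *filename;
          unsigned int lineno;
  };

  #define DEFINE_TAG()                                            \
          static struct tag_sketch __tag                          \
          __attribute__((used, section("my_tags"), aligned(8))) = \
                  { __func__, __FILE__, __LINE__ }

  /* Generated by the linker for any non-empty "my_tags" section. */
  extern struct tag_sketch __start_my_tags[];
  extern struct tag_sketch __stop_my_tags[];

  static void f(void) { DEFINE_TAG(); }
  static void g(void) { DEFINE_TAG(); }

  int main(void)
  {
          struct tag_sketch *ct;

          f(); g();       /* the tags exist at link time regardless */
          for (ct = __start_my_tags; ct < __stop_my_tags; ct++)
                  printf("%s:%u func:%s\n", ct->filename, ct->lineno,
                         ct->function);
          return 0;
  }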

Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/codetag.h |  68 +++++++++++++
 lib/Kconfig.debug       |   4 +
 lib/Makefile            |   1 +
 lib/codetag.c           | 219 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 292 insertions(+)
 create mode 100644 include/linux/codetag.h
 create mode 100644 lib/codetag.c

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
new file mode 100644
index 000000000000..7734269cdb63
--- /dev/null
+++ b/include/linux/codetag.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * code tagging framework
+ */
+#ifndef _LINUX_CODETAG_H
+#define _LINUX_CODETAG_H
+
+#include <linux/types.h>
+
+struct codetag_iterator;
+struct codetag_type;
+struct codetag_module;
+struct seq_buf;
+struct module;
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * code location being tagged.  At runtime, the special section is treated as
+ * an array of these.
+ */
+struct codetag {
+	unsigned int flags; /* used in later patches */
+	unsigned int lineno;
+	const char *modname;
+	const char *function;
+	const char *filename;
+} __aligned(8);
+
+union codetag_ref {
+	struct codetag *ct;
+};
+
+struct codetag_type_desc {
+	const char *section;
+	size_t tag_size;
+};
+
+struct codetag_iterator {
+	struct codetag_type *cttype;
+	struct codetag_module *cmod;
+	unsigned long mod_id;
+	struct codetag *ct;
+};
+
+#ifdef MODULE
+#define CT_MODULE_NAME KBUILD_MODNAME
+#else
+#define CT_MODULE_NAME NULL
+#endif
+
+#define CODE_TAG_INIT {					\
+	.modname	= CT_MODULE_NAME,		\
+	.function	= __func__,			\
+	.filename	= __FILE__,			\
+	.lineno		= __LINE__,			\
+	.flags		= 0,				\
+}
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
+struct codetag *codetag_next_ct(struct codetag_iterator *iter);
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct);
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc);
+
+#endif /* _LINUX_CODETAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 975a07f9f1cc..0be2d00c3696 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -968,6 +968,10 @@ config DEBUG_STACKOVERFLOW
 
 	  If in doubt, say "N".
 
+config CODE_TAGGING
+	bool
+	select KALLSYMS
+
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
 source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..6b48b22fdfac 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -235,6 +235,7 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
 	of-reconfig-notifier-error-inject.o
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 
+obj-$(CONFIG_CODE_TAGGING) += codetag.o
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/codetag.c b/lib/codetag.c
new file mode 100644
index 000000000000..8b5b89ad508d
--- /dev/null
+++ b/lib/codetag.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/codetag.h>
+#include <linux/idr.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/seq_buf.h>
+#include <linux/slab.h>
+
+struct codetag_type {
+	struct list_head link;
+	unsigned int count;
+	struct idr mod_idr;
+	struct rw_semaphore mod_lock; /* protects mod_idr */
+	struct codetag_type_desc desc;
+};
+
+struct codetag_range {
+	struct codetag *start;
+	struct codetag *stop;
+};
+
+struct codetag_module {
+	struct module *mod;
+	struct codetag_range range;
+};
+
+static DEFINE_MUTEX(codetag_lock);
+static LIST_HEAD(codetag_types);
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
+{
+	if (lock)
+		down_read(&cttype->mod_lock);
+	else
+		up_read(&cttype->mod_lock);
+}
+
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
+{
+	struct codetag_iterator iter = {
+		.cttype = cttype,
+		.cmod = NULL,
+		.mod_id = 0,
+		.ct = NULL,
+	};
+
+	return iter;
+}
+
+static inline struct codetag *get_first_module_ct(struct codetag_module *cmod)
+{
+	return cmod->range.start < cmod->range.stop ? cmod->range.start : NULL;
+}
+
+static inline
+struct codetag *get_next_module_ct(struct codetag_iterator *iter)
+{
+	struct codetag *res = (struct codetag *)
+			((char *)iter->ct + iter->cttype->desc.tag_size);
+
+	return res < iter->cmod->range.stop ? res : NULL;
+}
+
+struct codetag *codetag_next_ct(struct codetag_iterator *iter)
+{
+	struct codetag_type *cttype = iter->cttype;
+	struct codetag_module *cmod;
+	struct codetag *ct;
+
+	lockdep_assert_held(&cttype->mod_lock);
+
+	if (unlikely(idr_is_empty(&cttype->mod_idr)))
+		return NULL;
+
+	ct = NULL;
+	while (true) {
+		cmod = idr_find(&cttype->mod_idr, iter->mod_id);
+
+		/* If module was removed move to the next one */
+		if (!cmod)
+			cmod = idr_get_next_ul(&cttype->mod_idr,
+					       &iter->mod_id);
+
+		/* Exit if no more modules */
+		if (!cmod)
+			break;
+
+		if (cmod != iter->cmod) {
+			iter->cmod = cmod;
+			ct = get_first_module_ct(cmod);
+		} else
+			ct = get_next_module_ct(iter);
+
+		if (ct)
+			break;
+
+		iter->mod_id++;
+	}
+
+	iter->ct = ct;
+	return ct;
+}
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+	if (ct->modname)
+		seq_buf_printf(out, "%s:%u [%s] func:%s",
+			       ct->filename, ct->lineno,
+			       ct->modname, ct->function);
+	else
+		seq_buf_printf(out, "%s:%u func:%s",
+			       ct->filename, ct->lineno, ct->function);
+}
+
+static inline size_t range_size(const struct codetag_type *cttype,
+				const struct codetag_range *range)
+{
+	return ((char *)range->stop - (char *)range->start) /
+			cttype->desc.tag_size;
+}
+
+#ifdef CONFIG_MODULES
+static void *get_symbol(struct module *mod, const char *prefix, const char *name)
+{
+	DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
+	const char *buf;
+
+	seq_buf_printf(&sb, "%s%s", prefix, name);
+	if (seq_buf_has_overflowed(&sb))
+		return NULL;
+
+	buf = seq_buf_str(&sb);
+	return mod ?
+		(void *)find_kallsyms_symbol_value(mod, buf) :
+		(void *)kallsyms_lookup_name(buf);
+}
+
+static struct codetag_range get_section_range(struct module *mod,
+					      const char *section)
+{
+	return (struct codetag_range) {
+		get_symbol(mod, "__start_", section),
+		get_symbol(mod, "__stop_", section),
+	};
+}
+
+static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
+{
+	struct codetag_range range;
+	struct codetag_module *cmod;
+	int err;
+
+	range = get_section_range(mod, cttype->desc.section);
+	if (!range.start || !range.stop) {
+		pr_warn("Failed to load code tags of type %s from the module %s\n",
+			cttype->desc.section,
+			mod ? mod->name : "(built-in)");
+		return -EINVAL;
+	}
+
+	/* Ignore empty ranges */
+	if (range.start == range.stop)
+		return 0;
+
+	BUG_ON(range.start > range.stop);
+
+	cmod = kmalloc(sizeof(*cmod), GFP_KERNEL);
+	if (unlikely(!cmod))
+		return -ENOMEM;
+
+	cmod->mod = mod;
+	cmod->range = range;
+
+	down_write(&cttype->mod_lock);
+	err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
+	if (err >= 0)
+		cttype->count += range_size(cttype, &range);
+	up_write(&cttype->mod_lock);
+
+	if (err < 0) {
+		kfree(cmod);
+		return err;
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_MODULES */
+static int codetag_module_init(struct codetag_type *cttype, struct module *mod) { return 0; }
+#endif /* CONFIG_MODULES */
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc)
+{
+	struct codetag_type *cttype;
+	int err;
+
+	BUG_ON(desc->tag_size <= 0);
+
+	cttype = kzalloc(sizeof(*cttype), GFP_KERNEL);
+	if (unlikely(!cttype))
+		return ERR_PTR(-ENOMEM);
+
+	cttype->desc = *desc;
+	idr_init(&cttype->mod_idr);
+	init_rwsem(&cttype->mod_lock);
+
+	err = codetag_module_init(cttype, NULL);
+	if (unlikely(err)) {
+		kfree(cttype);
+		return ERR_PTR(err);
+	}
+
+	mutex_lock(&codetag_lock);
+	list_add_tail(&cttype->link, &codetag_types);
+	mutex_unlock(&codetag_lock);
+
+	return cttype;
+}
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 12/36] lib: code tagging module support
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (10 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 11/36] lib: code tagging framework Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 13/36] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Add support for code tagging from dynamically loaded modules.
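
A hedged sketch of how a tag type might use the new callbacks (the
callback bodies and the "my_tags" section name are hypothetical; the
codetag_type_desc fields and codetag_register_type() are from this
series):

  #include <linux/codetag.h>

  static void mytag_module_load(struct codetag_type *cttype,
                                struct codetag_module *cmod)
  {
          /* e.g. set up per-module state for the tags in cmod->range */
  }

  static void mytag_module_unload(struct codetag_type *cttype,
                                  struct codetag_module *cmod)
  {
          /* e.g. tear down anything tied to cmod->mod before it goes */
  }

  static const struct codetag_type_desc mytag_desc = {
          .section        = "my_tags",
          .tag_size       = sizeof(struct codetag),
          .module_load    = mytag_module_load,
          .module_unload  = mytag_module_unload,
  };

  /* during initialization:
   *      struct codetag_type *cttype = codetag_register_type(&mytag_desc);
   */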

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/codetag.h | 12 +++++++++
 kernel/module/main.c    |  4 +++
 lib/codetag.c           | 58 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 7734269cdb63..c44f5b83f24d 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -33,6 +33,10 @@ union codetag_ref {
 struct codetag_type_desc {
 	const char *section;
 	size_t tag_size;
+	void (*module_load)(struct codetag_type *cttype,
+			    struct codetag_module *cmod);
+	void (*module_unload)(struct codetag_type *cttype,
+			      struct codetag_module *cmod);
 };
 
 struct codetag_iterator {
@@ -65,4 +69,12 @@ void codetag_to_text(struct seq_buf *out, struct codetag *ct);
 struct codetag_type *
 codetag_register_type(const struct codetag_type_desc *desc);
 
+#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+void codetag_load_module(struct module *mod);
+void codetag_unload_module(struct module *mod);
+#else
+static inline void codetag_load_module(struct module *mod) {}
+static inline void codetag_unload_module(struct module *mod) {}
+#endif
+
 #endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 36681911c05a..f400ba076cc7 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -56,6 +56,7 @@
 #include <linux/dynamic_debug.h>
 #include <linux/audit.h>
 #include <linux/cfi.h>
+#include <linux/codetag.h>
 #include <linux/debugfs.h>
 #include <uapi/linux/module.h>
 #include "internal.h"
@@ -1242,6 +1243,7 @@ static void free_module(struct module *mod)
 {
 	trace_module_free(mod);
 
+	codetag_unload_module(mod);
 	mod_sysfs_teardown(mod);
 
 	/*
@@ -2978,6 +2980,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	/* Get rid of temporary copy. */
 	free_copy(info, flags);
 
+	codetag_load_module(mod);
+
 	/* Done! */
 	trace_module_load(mod);
 
diff --git a/lib/codetag.c b/lib/codetag.c
index 8b5b89ad508d..9af22648dbfa 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -124,15 +124,20 @@ static void *get_symbol(struct module *mod, const char *prefix, const char *name
 {
 	DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
 	const char *buf;
+	void *ret;
 
 	seq_buf_printf(&sb, "%s%s", prefix, name);
 	if (seq_buf_has_overflowed(&sb))
 		return NULL;
 
 	buf = seq_buf_str(&sb);
-	return mod ?
+	preempt_disable();
+	ret = mod ?
 		(void *)find_kallsyms_symbol_value(mod, buf) :
 		(void *)kallsyms_lookup_name(buf);
+	preempt_enable();
+
+	return ret;
 }
 
 static struct codetag_range get_section_range(struct module *mod,
@@ -173,8 +178,11 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 
 	down_write(&cttype->mod_lock);
 	err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
-	if (err >= 0)
+	if (err >= 0) {
 		cttype->count += range_size(cttype, &range);
+		if (cttype->desc.module_load)
+			cttype->desc.module_load(cttype, cmod);
+	}
 	up_write(&cttype->mod_lock);
 
 	if (err < 0) {
@@ -217,3 +225,49 @@ codetag_register_type(const struct codetag_type_desc *desc)
 
 	return cttype;
 }
+
+void codetag_load_module(struct module *mod)
+{
+	struct codetag_type *cttype;
+
+	if (!mod)
+		return;
+
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link)
+		codetag_module_init(cttype, mod);
+	mutex_unlock(&codetag_lock);
+}
+
+void codetag_unload_module(struct module *mod)
+{
+	struct codetag_type *cttype;
+
+	if (!mod)
+		return;
+
+	mutex_lock(&codetag_lock);
+	list_for_each_entry(cttype, &codetag_types, link) {
+		struct codetag_module *found = NULL;
+		struct codetag_module *cmod;
+		unsigned long mod_id, tmp;
+
+		down_write(&cttype->mod_lock);
+		idr_for_each_entry_ul(&cttype->mod_idr, cmod, tmp, mod_id) {
+			if (cmod->mod && cmod->mod == mod) {
+				found = cmod;
+				break;
+			}
+		}
+		if (found) {
+			if (cttype->desc.module_unload)
+				cttype->desc.module_unload(cttype, cmod);
+
+			cttype->count -= range_size(cttype, &cmod->range);
+			idr_remove(&cttype->mod_idr, mod_id);
+			kfree(cmod);
+		}
+		up_write(&cttype->mod_lock);
+	}
+	mutex_unlock(&codetag_lock);
+}
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (11 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 12/36] lib: code tagging module support Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-26 16:58   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
                   ` (23 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Skip freeing a module's data section if it still holds non-zero
allocation tags, because otherwise, once those allocations are freed,
accessing their code tags would cause a use-after-free.
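
With this change a tag type can veto the teardown. A hedged sketch (the
liveness helper is hypothetical; the bool return type is what this patch
introduces):

  #include <linux/codetag.h>

  /* Hypothetical helper: returns true while any tagged allocation from
   * the tags in cmod->range is still live. */
  static bool mytag_has_live_allocations(struct codetag_module *cmod);

  static bool mytag_module_unload(struct codetag_type *cttype,
                                  struct codetag_module *cmod)
  {
          /* Returning false makes codetag_unload_module() report that
           * the module's data section must be kept around. */
          return !mytag_has_live_allocations(cmod);
  }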

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/codetag.h |  6 +++---
 kernel/module/main.c    | 23 +++++++++++++++--------
 lib/codetag.c           | 11 ++++++++---
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c44f5b83f24d..bfd0ba5c4185 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,7 +35,7 @@ struct codetag_type_desc {
 	size_t tag_size;
 	void (*module_load)(struct codetag_type *cttype,
 			    struct codetag_module *cmod);
-	void (*module_unload)(struct codetag_type *cttype,
+	bool (*module_unload)(struct codetag_type *cttype,
 			      struct codetag_module *cmod);
 };
 
@@ -71,10 +71,10 @@ codetag_register_type(const struct codetag_type_desc *desc);
 
 #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
 void codetag_load_module(struct module *mod);
-void codetag_unload_module(struct module *mod);
+bool codetag_unload_module(struct module *mod);
 #else
 static inline void codetag_load_module(struct module *mod) {}
-static inline void codetag_unload_module(struct module *mod) {}
+static inline bool codetag_unload_module(struct module *mod) { return true; }
 #endif
 
 #endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index f400ba076cc7..658b631e76ad 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1211,15 +1211,19 @@ static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
 	return module_alloc(size);
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(void *ptr, enum mod_mem_type type,
+			       bool unload_codetags)
 {
+	if (!unload_codetags && mod_mem_type_is_core_data(type))
+		return;
+
 	if (mod_mem_use_vmalloc(type))
 		vfree(ptr);
 	else
 		module_memfree(ptr);
 }
 
-static void free_mod_mem(struct module *mod)
+static void free_mod_mem(struct module *mod, bool unload_codetags)
 {
 	for_each_mod_mem_type(type) {
 		struct module_memory *mod_mem = &mod->mem[type];
@@ -1230,20 +1234,23 @@ static void free_mod_mem(struct module *mod)
 		/* Free lock-classes; relies on the preceding sync_rcu(). */
 		lockdep_free_key_range(mod_mem->base, mod_mem->size);
 		if (mod_mem->size)
-			module_memory_free(mod_mem->base, type);
+			module_memory_free(mod_mem->base, type,
+					   unload_codetags);
 	}
 
 	/* MOD_DATA hosts mod, so free it at last */
 	lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
-	module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+	module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags);
 }
 
 /* Free a module, remove from lists, etc. */
 static void free_module(struct module *mod)
 {
+	bool unload_codetags;
+
 	trace_module_free(mod);
 
-	codetag_unload_module(mod);
+	unload_codetags = codetag_unload_module(mod);
 	mod_sysfs_teardown(mod);
 
 	/*
@@ -1285,7 +1292,7 @@ static void free_module(struct module *mod)
 	kfree(mod->args);
 	percpu_modfree(mod);
 
-	free_mod_mem(mod);
+	free_mod_mem(mod, unload_codetags);
 }
 
 void *__symbol_get(const char *symbol)
@@ -2298,7 +2305,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	return 0;
 out_enomem:
 	for (t--; t >= 0; t--)
-		module_memory_free(mod->mem[t].base, t);
+		module_memory_free(mod->mem[t].base, t, true);
 	return ret;
 }
 
@@ -2428,7 +2435,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
 
-	free_mod_mem(mod);
+	free_mod_mem(mod, true);
 }
 
 int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/codetag.c b/lib/codetag.c
index 9af22648dbfa..b13412ca57cc 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -5,6 +5,7 @@
 #include <linux/module.h>
 #include <linux/seq_buf.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 
 struct codetag_type {
 	struct list_head link;
@@ -239,12 +240,13 @@ void codetag_load_module(struct module *mod)
 	mutex_unlock(&codetag_lock);
 }
 
-void codetag_unload_module(struct module *mod)
+bool codetag_unload_module(struct module *mod)
 {
 	struct codetag_type *cttype;
+	bool unload_ok = true;
 
 	if (!mod)
-		return;
+		return true;
 
 	mutex_lock(&codetag_lock);
 	list_for_each_entry(cttype, &codetag_types, link) {
@@ -261,7 +263,8 @@ void codetag_unload_module(struct module *mod)
 		}
 		if (found) {
 			if (cttype->desc.module_unload)
-				cttype->desc.module_unload(cttype, cmod);
+				if (!cttype->desc.module_unload(cttype, cmod))
+					unload_ok = false;
 
 			cttype->count -= range_size(cttype, &cmod->range);
 			idr_remove(&cttype->mod_idr, mod_id);
@@ -270,4 +273,6 @@ void codetag_unload_module(struct module *mod)
 		up_write(&cttype->mod_lock);
 	}
 	mutex_unlock(&codetag_lock);
+
+	return unload_ok;
 }
-- 
2.44.0.rc0.258.g7320e95886-goog



* [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (12 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 13/36] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 23:05   ` Kees Cook
                     ` (2 more replies)
  2024-02-21 19:40 ` [PATCH v4 15/36] lib: introduce support for page allocation tagging Suren Baghdasaryan
                   ` (22 subsequent siblings)
  36 siblings, 3 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily
instrument memory allocators. It registers an "alloc_tags" codetag type
with a /proc/allocinfo interface to output allocation tag information when
the feature is enabled.
CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory
allocation profiling instrumentation.
Memory allocation profiling can be enabled or disabled at runtime using
the /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation
profiling by default.
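
A minimal usage sketch (illustrative only; my_alloc_noprof(),
my_free_noprof() and obj_ref() are hypothetical names, with obj_ref()
standing for wherever the allocator keeps a union codetag_ref next to
the object so the free path can find it):

    #define my_alloc(size)                                              \
    ({                                                                  \
            DEFINE_ALLOC_TAG(_alloc_tag);   /* one tag per callsite */  \
            void *_p = my_alloc_noprof(size);                           \
            if (_p)                                                     \
                    alloc_tag_add(obj_ref(_p), &_alloc_tag, size);      \
            _p;                                                         \
    })

    static inline void my_free(void *p, size_t size)
    {
            alloc_tag_sub(obj_ref(p), size);  /* also clears ref->ct */
            my_free_noprof(p);
    }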

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 Documentation/admin-guide/sysctl/vm.rst |  16 +++
 Documentation/filesystems/proc.rst      |  29 +++++
 include/asm-generic/codetag.lds.h       |  14 +++
 include/asm-generic/vmlinux.lds.h       |   3 +
 include/linux/alloc_tag.h               | 144 +++++++++++++++++++++++
 include/linux/sched.h                   |  24 ++++
 lib/Kconfig.debug                       |  25 ++++
 lib/Makefile                            |   2 +
 lib/alloc_tag.c                         | 149 ++++++++++++++++++++++++
 scripts/module.lds.S                    |   7 ++
 10 files changed, 413 insertions(+)
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 lib/alloc_tag.c

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index c59889de122b..e86c968a7a0e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -43,6 +43,7 @@ Currently, these files are in /proc/sys/vm:
 - legacy_va_layout
 - lowmem_reserve_ratio
 - max_map_count
+- mem_profiling         (only if CONFIG_MEM_ALLOC_PROFILING=y)
 - memory_failure_early_kill
 - memory_failure_recovery
 - min_free_kbytes
@@ -425,6 +426,21 @@ e.g., up to one or two maps per allocation.
 The default value is 65530.
 
 
+mem_profiling
+==============
+
+Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y)
+
+1: Enable memory profiling.
+
+0: Disable memory profiling.
+
+Enabling memory profiling introduces a small performance overhead for all
+memory allocations.
+
+The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
+
+
 memory_failure_early_kill:
 ==========================
 
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 104c6d047d9b..8150dc3d689c 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -688,6 +688,7 @@ files are there, and which are missing.
  ============ ===============================================================
  File         Content
  ============ ===============================================================
+ allocinfo    Memory allocations profiling information
  apm          Advanced power management info
  bootconfig   Kernel command line obtained from boot config,
  	      and, if there were kernel parameters from the
@@ -953,6 +954,34 @@ also be allocatable although a lot of filesystem metadata may have to be
 reclaimed to achieve this.
 
 
+allocinfo
+~~~~~~~~~
+
+Provides information about memory allocations at all locations in the code
+base. Each allocation in the code is identified by its source file, line
+number, module (if it originates from a loadable module) and the function
+calling the allocation. The number of bytes allocated and the number of
+calls at each location are reported.
+
+Example output.
+
+::
+
+    > sort -rn /proc/allocinfo
+   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
+    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
+    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
+    14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
+    13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
+    11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
+     9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
+     4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
+     4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
+     3940352      962 mm/memory.c:4214 func:alloc_anon_folio
+     2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
+     ...
+
+
 meminfo
 ~~~~~~~
 
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
new file mode 100644
index 000000000000..64f536b80380
--- /dev/null
+++ b/include/asm-generic/codetag.lds.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_CODETAG_LDS_H
+#define __ASM_GENERIC_CODETAG_LDS_H
+
+#define SECTION_WITH_BOUNDARIES(_name)	\
+	. = ALIGN(8);			\
+	__start_##_name = .;		\
+	KEEP(*(_name))			\
+	__stop_##_name = .;
+
+#define CODETAG_SECTIONS()		\
+	SECTION_WITH_BOUNDARIES(alloc_tags)
+
+#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 5dd3a61d673d..c9997dc50c50 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -50,6 +50,8 @@
  *               [__nosave_begin, __nosave_end] for the nosave data
  */
 
+#include <asm-generic/codetag.lds.h>
+
 #ifndef LOAD_OFFSET
 #define LOAD_OFFSET 0
 #endif
@@ -366,6 +368,7 @@
 	. = ALIGN(8);							\
 	BOUNDED_SECTION_BY(__dyndbg_classes, ___dyndbg_classes)		\
 	BOUNDED_SECTION_BY(__dyndbg, ___dyndbg)				\
+	CODETAG_SECTIONS()						\
 	LIKELY_PROFILE()		       				\
 	BRANCH_PROFILE()						\
 	TRACE_PRINTKS()							\
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
new file mode 100644
index 000000000000..be3ba955846c
--- /dev/null
+++ b/include/linux/alloc_tag.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * allocation tagging
+ */
+#ifndef _LINUX_ALLOC_TAG_H
+#define _LINUX_ALLOC_TAG_H
+
+#include <linux/bug.h>
+#include <linux/codetag.h>
+#include <linux/container_of.h>
+#include <linux/preempt.h>
+#include <asm/percpu.h>
+#include <linux/cpumask.h>
+#include <linux/static_key.h>
+
+struct alloc_tag_counters {
+	u64 bytes;
+	u64 calls;
+};
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * allocation callsite. At runtime, the special section is treated as
+ * an array of these. The embedded codetag hooks into the codetag framework.
+ */
+struct alloc_tag {
+	struct codetag			ct;
+	struct alloc_tag_counters __percpu	*counters;
+} __aligned(8);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
+{
+	return container_of(ct, struct alloc_tag, ct);
+}
+
+#ifdef ARCH_NEEDS_WEAK_PER_CPU
+/*
+ * When percpu variables are required to be defined as weak, static percpu
+ * variables can't be used inside a function (see comments for DECLARE_PER_CPU_SECTION).
+ */
+#error "Memory allocation profiling is incompatible with ARCH_NEEDS_WEAK_PER_CPU"
+#endif
+
+#define DEFINE_ALLOC_TAG(_alloc_tag)						\
+	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
+	static struct alloc_tag _alloc_tag __used __aligned(8)			\
+	__section("alloc_tags") = {						\
+		.ct = CODE_TAG_INIT,						\
+		.counters = &_alloc_tag_cntr };
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+			mem_alloc_profiling_key);
+
+static inline bool mem_alloc_profiling_enabled(void)
+{
+	return static_branch_maybe(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+				   &mem_alloc_profiling_key);
+}
+
+static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
+{
+	struct alloc_tag_counters v = { 0, 0 };
+	struct alloc_tag_counters *counter;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		counter = per_cpu_ptr(tag->counters, cpu);
+		v.bytes += counter->bytes;
+		v.calls += counter->calls;
+	}
+
+	return v;
+}
+
+static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+{
+	struct alloc_tag *tag;
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+	WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
+#endif
+	if (!ref || !ref->ct)
+		return;
+
+	tag = ct_to_alloc_tag(ref->ct);
+
+	this_cpu_sub(tag->counters->bytes, bytes);
+	this_cpu_dec(tag->counters->calls);
+
+	ref->ct = NULL;
+}
+
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+{
+	__alloc_tag_sub(ref, bytes);
+}
+
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
+{
+	__alloc_tag_sub(ref, bytes);
+}
+
+static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+	WARN_ONCE(ref && ref->ct,
+		  "alloc_tag was not cleared (got tag for %s:%u)\n",
+		  ref->ct->filename, ref->ct->lineno);
+
+	WARN_ONCE(!tag, "current->alloc_tag not set");
+#endif
+	if (!ref || !tag)
+		return;
+
+	ref->ct = &tag->ct;
+	/*
+	 * We need to increment the call counter every time we have a new
+	 * allocation or when we split a large allocation into smaller ones.
+	 * Each new reference for every sub-allocation needs to increment call
+	 * counter because when we free each part the counter will be decremented.
+	 */
+	this_cpu_inc(tag->counters->calls);
+}
+
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
+{
+	alloc_tag_ref_set(ref, tag);
+	this_cpu_add(tag->counters->bytes, bytes);
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING */
+
+#define DEFINE_ALLOC_TAG(_alloc_tag)
+static inline bool mem_alloc_profiling_enabled(void) { return false; }
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
+				 size_t bytes) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
+#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffe8f618ab86..eede1f92bcc6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -770,6 +770,10 @@ struct task_struct {
 	unsigned int			flags;
 	unsigned int			ptrace;
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	struct alloc_tag		*alloc_tag;
+#endif
+
 #ifdef CONFIG_SMP
 	int				on_cpu;
 	struct __call_single_node	wake_entry;
@@ -810,6 +814,7 @@ struct task_struct {
 	struct task_group		*sched_task_group;
 #endif
 
+
 #ifdef CONFIG_UCLAMP_TASK
 	/*
 	 * Clamp values requested for a scheduling entity.
@@ -2183,4 +2188,23 @@ static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
 
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
+{
+	swap(current->alloc_tag, tag);
+	return tag;
+}
+
+static inline void alloc_tag_restore(struct alloc_tag *tag, struct alloc_tag *old)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+	WARN(current->alloc_tag != tag, "current->alloc_tag was changed:\n");
+#endif
+	current->alloc_tag = old;
+}
+#else
+#define alloc_tag_save(_tag)			NULL
+#define alloc_tag_restore(_tag, _old)		do {} while (0)
+#endif
+
 #endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0be2d00c3696..78d258ca508f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -972,6 +972,31 @@ config CODE_TAGGING
 	bool
 	select KALLSYMS
 
+config MEM_ALLOC_PROFILING
+	bool "Enable memory allocation profiling"
+	default n
+	depends on PROC_FS
+	depends on !DEBUG_FORCE_WEAK_PER_CPU
+	select CODE_TAGGING
+	help
+	  Track allocation source code and record total allocation size
+	  initiated at that code location. The mechanism can be used to track
+	  memory leaks with low performance and memory overhead.
+
+config MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+	bool "Enable memory allocation profiling by default"
+	default y
+	depends on MEM_ALLOC_PROFILING
+
+config MEM_ALLOC_PROFILING_DEBUG
+	bool "Memory allocation profiler debugging"
+	default n
+	depends on MEM_ALLOC_PROFILING
+	select MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+	help
+	  Adds warnings with helpful error messages for memory allocation
+	  profiling.
+
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
 source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index 6b48b22fdfac..859112f09bf5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -236,6 +236,8 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 
 obj-$(CONFIG_CODE_TAGGING) += codetag.o
+obj-$(CONFIG_MEM_ALLOC_PROFILING) += alloc_tag.o
+
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
 obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
new file mode 100644
index 000000000000..f09c8a422bc2
--- /dev/null
+++ b/lib/alloc_tag.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/alloc_tag.h>
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_buf.h>
+#include <linux/seq_file.h>
+
+static struct codetag_type *alloc_tag_cttype;
+
+DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+			mem_alloc_profiling_key);
+
+static void *allocinfo_start(struct seq_file *m, loff_t *pos)
+{
+	struct codetag_iterator *iter;
+	struct codetag *ct;
+	loff_t node = *pos;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	m->private = iter;
+	if (!iter)
+		return NULL;
+
+	codetag_lock_module_list(alloc_tag_cttype, true);
+	*iter = codetag_get_ct_iter(alloc_tag_cttype);
+	while ((ct = codetag_next_ct(iter)) != NULL && node)
+		node--;
+
+	return ct ? iter : NULL;
+}
+
+static void *allocinfo_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+	struct codetag_iterator *iter = (struct codetag_iterator *)arg;
+	struct codetag *ct = codetag_next_ct(iter);
+
+	(*pos)++;
+	if (!ct)
+		return NULL;
+
+	return iter;
+}
+
+static void allocinfo_stop(struct seq_file *m, void *arg)
+{
+	struct codetag_iterator *iter = (struct codetag_iterator *)m->private;
+
+	if (iter) {
+		codetag_lock_module_list(alloc_tag_cttype, false);
+		kfree(iter);
+	}
+}
+
+static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+	struct alloc_tag *tag = ct_to_alloc_tag(ct);
+	struct alloc_tag_counters counter = alloc_tag_read(tag);
+	s64 bytes = counter.bytes;
+
+	seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
+	codetag_to_text(out, ct);
+	seq_buf_putc(out, ' ');
+	seq_buf_putc(out, '\n');
+}
+
+static int allocinfo_show(struct seq_file *m, void *arg)
+{
+	struct codetag_iterator *iter = (struct codetag_iterator *)arg;
+	char *bufp;
+	size_t n = seq_get_buf(m, &bufp);
+	struct seq_buf buf;
+
+	seq_buf_init(&buf, bufp, n);
+	alloc_tag_to_text(&buf, iter->ct);
+	seq_commit(m, seq_buf_used(&buf));
+	return 0;
+}
+
+static const struct seq_operations allocinfo_seq_op = {
+	.start	= allocinfo_start,
+	.next	= allocinfo_next,
+	.stop	= allocinfo_stop,
+	.show	= allocinfo_show,
+};
+
+static void __init procfs_init(void)
+{
+	proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
+}
+
+static bool alloc_tag_module_unload(struct codetag_type *cttype,
+				    struct codetag_module *cmod)
+{
+	struct codetag_iterator iter = codetag_get_ct_iter(cttype);
+	struct alloc_tag_counters counter;
+	bool module_unused = true;
+	struct alloc_tag *tag;
+	struct codetag *ct;
+
+	for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
+		if (iter.cmod != cmod)
+			continue;
+
+		tag = ct_to_alloc_tag(ct);
+		counter = alloc_tag_read(tag);
+
+		if (WARN(counter.bytes,
+			 "%s:%u module %s func:%s has %llu bytes allocated at module unload",
+			 ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
+			module_unused = false;
+	}
+
+	return module_unused;
+}
+
+static struct ctl_table memory_allocation_profiling_sysctls[] = {
+	{
+		.procname	= "mem_profiling",
+		.data		= &mem_alloc_profiling_key,
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+		.mode		= 0444,
+#else
+		.mode		= 0644,
+#endif
+		.proc_handler	= proc_do_static_key,
+	},
+	{ }
+};
+
+static int __init alloc_tag_init(void)
+{
+	const struct codetag_type_desc desc = {
+		.section	= "alloc_tags",
+		.tag_size	= sizeof(struct alloc_tag),
+		.module_unload	= alloc_tag_module_unload,
+	};
+
+	alloc_tag_cttype = codetag_register_type(&desc);
+	if (IS_ERR_OR_NULL(alloc_tag_cttype))
+		return PTR_ERR(alloc_tag_cttype);
+
+	register_sysctl_init("vm", memory_allocation_profiling_sysctls);
+	procfs_init();
+
+	return 0;
+}
+module_init(alloc_tag_init);
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index bf5bcf2836d8..45c67a0994f3 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -9,6 +9,8 @@
 #define DISCARD_EH_FRAME	*(.eh_frame)
 #endif
 
+#include <asm-generic/codetag.lds.h>
+
 SECTIONS {
 	/DISCARD/ : {
 		*(.discard)
@@ -47,12 +49,17 @@ SECTIONS {
 	.data : {
 		*(.data .data.[0-9a-zA-Z_]*)
 		*(.data..L*)
+		CODETAG_SECTIONS()
 	}
 
 	.rodata : {
 		*(.rodata .rodata.[0-9a-zA-Z_]*)
 		*(.rodata..L*)
 	}
+#else
+	.data : {
+		CODETAG_SECTIONS()
+	}
 #endif
 }
 
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (13 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-26 17:07   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 16/36] mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags Suren Baghdasaryan
                   ` (21 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Introduce helper functions to easily instrument page allocators by
storing, in a page_ext field, a pointer to the allocation tag associated
with the code that allocated the page.
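
The resulting call pattern in the page allocator hot paths (the exact
hunks are in mm/page_alloc.c below):

    /* allocation side, in post_alloc_hook(): */
    pgalloc_tag_add(page, current, order);  /* charges current->alloc_tag */

    /* free side, in free_pages_prepare(): */
    pgalloc_tag_sub(page, order);           /* finds the ref via page_ext */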

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/page_ext.h    |  1 -
 include/linux/pgalloc_tag.h | 78 +++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug           |  1 +
 lib/alloc_tag.c             | 17 ++++++++
 mm/mm_init.c                |  1 +
 mm/page_alloc.c             |  4 ++
 mm/page_ext.c               |  4 ++
 7 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pgalloc_tag.h

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index be98564191e6..07e0656898f9 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -4,7 +4,6 @@
 
 #include <linux/types.h>
 #include <linux/stacktrace.h>
-#include <linux/stackdepot.h>
 
 struct pglist_data;
 
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
new file mode 100644
index 000000000000..b49ab955300f
--- /dev/null
+++ b/include/linux/pgalloc_tag.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * page allocation tagging
+ */
+#ifndef _LINUX_PGALLOC_TAG_H
+#define _LINUX_PGALLOC_TAG_H
+
+#include <linux/alloc_tag.h>
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+#include <linux/page_ext.h>
+
+extern struct page_ext_operations page_alloc_tagging_ops;
+extern struct page_ext *page_ext_get(struct page *page);
+extern void page_ext_put(struct page_ext *page_ext);
+
+static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+{
+	return (void *)page_ext + page_alloc_tagging_ops.offset;
+}
+
+static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+{
+	return (void *)ref - page_alloc_tagging_ops.offset;
+}
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline union codetag_ref *get_page_tag_ref(struct page *page)
+{
+	if (page) {
+		struct page_ext *page_ext = page_ext_get(page);
+
+		if (page_ext)
+			return codetag_ref_from_page_ext(page_ext);
+	}
+	return NULL;
+}
+
+static inline void put_page_tag_ref(union codetag_ref *ref)
+{
+	page_ext_put(page_ext_from_codetag_ref(ref));
+}
+
+static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
+				   unsigned int order)
+{
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		if (ref) {
+			alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE << order);
+			put_page_tag_ref(ref);
+		}
+	}
+}
+
+static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
+{
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		if (ref) {
+			alloc_tag_sub(ref, PAGE_SIZE << order);
+			put_page_tag_ref(ref);
+		}
+	}
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING */
+
+static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
+				   unsigned int order) {}
+static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
+#endif /* _LINUX_PGALLOC_TAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 78d258ca508f..7bbdb0ddb011 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -978,6 +978,7 @@ config MEM_ALLOC_PROFILING
 	depends on PROC_FS
 	depends on !DEBUG_FORCE_WEAK_PER_CPU
 	select CODE_TAGGING
+	select PAGE_EXTENSION
 	help
 	  Track allocation source code and record total allocation size
 	  initiated at that code location. The mechanism can be used to track
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index f09c8a422bc2..cb5adec4b2e2 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
 #include <linux/fs.h>
 #include <linux/gfp.h>
 #include <linux/module.h>
+#include <linux/page_ext.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -115,6 +116,22 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
 	return module_unused;
 }
 
+static __init bool need_page_alloc_tagging(void)
+{
+	return true;
+}
+
+static __init void init_page_alloc_tagging(void)
+{
+}
+
+struct page_ext_operations page_alloc_tagging_ops = {
+	.size = sizeof(union codetag_ref),
+	.need = need_page_alloc_tagging,
+	.init = init_page_alloc_tagging,
+};
+EXPORT_SYMBOL(page_alloc_tagging_ops);
+
 static struct ctl_table memory_allocation_profiling_sysctls[] = {
 	{
 		.procname	= "mem_profiling",
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2c19f5515e36..e9ea2919d02d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -24,6 +24,7 @@
 #include <linux/page_ext.h>
 #include <linux/pti.h>
 #include <linux/pgtable.h>
+#include <linux/stackdepot.h>
 #include <linux/swap.h>
 #include <linux/cma.h>
 #include <linux/crash_dump.h>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..edb79a55a252 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -53,6 +53,7 @@
 #include <linux/khugepaged.h>
 #include <linux/delayacct.h>
 #include <linux/cacheinfo.h>
+#include <linux/pgalloc_tag.h>
 #include <asm/div64.h>
 #include "internal.h"
 #include "shuffle.h"
@@ -1100,6 +1101,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
 		page_table_check_free(page, order);
+		pgalloc_tag_sub(page, order);
 		return false;
 	}
 
@@ -1139,6 +1141,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	reset_page_owner(page, order);
 	page_table_check_free(page, order);
+	pgalloc_tag_sub(page, order);
 
 	if (!PageHighMem(page)) {
 		debug_check_no_locks_freed(page_address(page),
@@ -1532,6 +1535,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 
 	set_page_owner(page, order, gfp_flags);
 	page_table_check_alloc(page, order);
+	pgalloc_tag_add(page, current, order);
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 4548fcc66d74..3c58fe8a24df 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -10,6 +10,7 @@
 #include <linux/page_idle.h>
 #include <linux/page_table_check.h>
 #include <linux/rcupdate.h>
+#include <linux/pgalloc_tag.h>
 
 /*
  * struct page extension
@@ -82,6 +83,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
 #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	&page_alloc_tagging_ops,
+#endif
 #ifdef CONFIG_PAGE_TABLE_CHECK
 	&page_table_check_ops,
 #endif
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 16/36] mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (14 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 15/36] lib: introduce support for page allocation tagging Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 17/36] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

As each allocation tag generates a per-cpu variable, more space is required
to store them. Increase PERCPU_MODULE_RESERVE to provide enough space. A
better long-term solution would be to allocate this memory dynamically.
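
For scale: with the struct alloc_tag_counters layout introduced earlier
in the series (two u64 counters, i.e. 16 bytes of per-cpu data per
callsite), the old reserve of 8 KiB (8 << 10) per CPU chunk is easily
exhausted by instrumented modules, while the new 32 KiB (8 << 12) leaves
room for roughly two thousand tagged callsites. This is a rough estimate,
not a measured bound.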

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/percpu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 8c677f185901..62b5eb45bd89 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -14,7 +14,11 @@
 
 /* enough to cover all DEFINE_PER_CPUs in modules */
 #ifdef CONFIG_MODULES
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+#define PERCPU_MODULE_RESERVE		(8 << 12)
+#else
 #define PERCPU_MODULE_RESERVE		(8 << 10)
+#endif
 #else
 #define PERCPU_MODULE_RESERVE		0
 #endif
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 17/36] change alloc_pages name in dma_map_ops to avoid name conflicts
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (15 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 16/36] mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 18/36] mm: enable page allocation tagging Suren Baghdasaryan
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

After alloc_pages is redefined as a function-like macro, all uses of that
name are replaced by the preprocessor. Rename the conflicting identifiers
to prevent the preprocessor from expanding them where that is not intended.
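
The conflict, sketched: once alloc_pages becomes a function-like macro
(in a later patch in this series), any occurrence of the name followed
by parentheses is expanded, including calls through the struct member:

    /* with: #define alloc_pages(...) alloc_hooks(alloc_pages_noprof(__VA_ARGS__)) */
    return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
    /* the member access would be rewritten by the preprocessor and no
     * longer compile, hence the rename to ->alloc_pages_op */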

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 arch/alpha/kernel/pci_iommu.c           | 2 +-
 arch/mips/jazz/jazzdma.c                | 2 +-
 arch/powerpc/kernel/dma-iommu.c         | 2 +-
 arch/powerpc/platforms/ps3/system-bus.c | 4 ++--
 arch/powerpc/platforms/pseries/vio.c    | 2 +-
 arch/x86/kernel/amd_gart_64.c           | 2 +-
 drivers/iommu/dma-iommu.c               | 2 +-
 drivers/parisc/ccio-dma.c               | 2 +-
 drivers/parisc/sba_iommu.c              | 2 +-
 drivers/xen/grant-dma-ops.c             | 2 +-
 drivers/xen/swiotlb-xen.c               | 2 +-
 include/linux/dma-map-ops.h             | 2 +-
 kernel/dma/mapping.c                    | 4 ++--
 13 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index c81183935e97..7fcf3e9b7103 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -929,7 +929,7 @@ const struct dma_map_ops alpha_pci_ops = {
 	.dma_supported		= alpha_pci_supported,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
-	.alloc_pages		= dma_common_alloc_pages,
+	.alloc_pages_op		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index eabddb89d221..c97b089b9902 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -617,7 +617,7 @@ const struct dma_map_ops jazz_dma_ops = {
 	.sync_sg_for_device	= jazz_dma_sync_sg_for_device,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
-	.alloc_pages		= dma_common_alloc_pages,
+	.alloc_pages_op		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
 };
 EXPORT_SYMBOL(jazz_dma_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 8920862ffd79..f0ae39e77e37 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -216,6 +216,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.get_required_mask	= dma_iommu_get_required_mask,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
-	.alloc_pages		= dma_common_alloc_pages,
+	.alloc_pages_op		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
 };
diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index d6b5f5ecd515..56dc6b29a3e7 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -695,7 +695,7 @@ static const struct dma_map_ops ps3_sb_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
-	.alloc_pages = dma_common_alloc_pages,
+	.alloc_pages_op = dma_common_alloc_pages,
 	.free_pages = dma_common_free_pages,
 };
 
@@ -709,7 +709,7 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
 	.unmap_page = ps3_unmap_page,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
-	.alloc_pages = dma_common_alloc_pages,
+	.alloc_pages_op = dma_common_alloc_pages,
 	.free_pages = dma_common_free_pages,
 };
 
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 2dc9cbc4bcd8..0c90fc4c3796 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -611,7 +611,7 @@ static const struct dma_map_ops vio_dma_mapping_ops = {
 	.get_required_mask = dma_iommu_get_required_mask,
 	.mmap		   = dma_common_mmap,
 	.get_sgtable	   = dma_common_get_sgtable,
-	.alloc_pages	   = dma_common_alloc_pages,
+	.alloc_pages_op	   = dma_common_alloc_pages,
 	.free_pages	   = dma_common_free_pages,
 };
 
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 2ae98f754e59..c884deca839b 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
 	.get_sgtable			= dma_common_get_sgtable,
 	.dma_supported			= dma_direct_supported,
 	.get_required_mask		= dma_direct_get_required_mask,
-	.alloc_pages			= dma_direct_alloc_pages,
+	.alloc_pages_op			= dma_direct_alloc_pages,
 	.free_pages			= dma_direct_free_pages,
 };
 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 50ccc4f1ef81..8a1f7f5d1bca 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1710,7 +1710,7 @@ static const struct dma_map_ops iommu_dma_ops = {
 	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
 	.alloc			= iommu_dma_alloc,
 	.free			= iommu_dma_free,
-	.alloc_pages		= dma_common_alloc_pages,
+	.alloc_pages_op		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
 	.alloc_noncontiguous	= iommu_dma_alloc_noncontiguous,
 	.free_noncontiguous	= iommu_dma_free_noncontiguous,
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index 9ce0d20a6c58..feef537257d0 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1022,7 +1022,7 @@ static const struct dma_map_ops ccio_ops = {
 	.map_sg =		ccio_map_sg,
 	.unmap_sg =		ccio_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
-	.alloc_pages =		dma_common_alloc_pages,
+	.alloc_pages_op =	dma_common_alloc_pages,
 	.free_pages =		dma_common_free_pages,
 };
 
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index 784037837f65..fc3863c09f83 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1090,7 +1090,7 @@ static const struct dma_map_ops sba_ops = {
 	.map_sg =		sba_map_sg,
 	.unmap_sg =		sba_unmap_sg,
 	.get_sgtable =		dma_common_get_sgtable,
-	.alloc_pages =		dma_common_alloc_pages,
+	.alloc_pages_op =	dma_common_alloc_pages,
 	.free_pages =		dma_common_free_pages,
 };
 
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 76f6f26265a3..29257d2639db 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
 static const struct dma_map_ops xen_grant_dma_ops = {
 	.alloc = xen_grant_dma_alloc,
 	.free = xen_grant_dma_free,
-	.alloc_pages = xen_grant_dma_alloc_pages,
+	.alloc_pages_op = xen_grant_dma_alloc_pages,
 	.free_pages = xen_grant_dma_free_pages,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0e6c6c25d154..1c4ef5111651 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,7 +403,7 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.dma_supported = xen_swiotlb_dma_supported,
 	.mmap = dma_common_mmap,
 	.get_sgtable = dma_common_get_sgtable,
-	.alloc_pages = dma_common_alloc_pages,
+	.alloc_pages_op = dma_common_alloc_pages,
 	.free_pages = dma_common_free_pages,
 	.max_mapping_size = swiotlb_max_mapping_size,
 };
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 4abc60f04209..9ee319851b5f 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -29,7 +29,7 @@ struct dma_map_ops {
 			unsigned long attrs);
 	void (*free)(struct device *dev, size_t size, void *vaddr,
 			dma_addr_t dma_handle, unsigned long attrs);
-	struct page *(*alloc_pages)(struct device *dev, size_t size,
+	struct page *(*alloc_pages_op)(struct device *dev, size_t size,
 			dma_addr_t *dma_handle, enum dma_data_direction dir,
 			gfp_t gfp);
 	void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58db8fd70471..5e2d51e1cdf6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
 	size = PAGE_ALIGN(size);
 	if (dma_alloc_direct(dev, ops))
 		return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
-	if (!ops->alloc_pages)
+	if (!ops->alloc_pages_op)
 		return NULL;
-	return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+	return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
 }
 
 struct page *dma_alloc_pages(struct device *dev, size_t size,
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 18/36] mm: enable page allocation tagging
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (16 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 17/36] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 19/36] mm: create new codetag references during page splitting Suren Baghdasaryan
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Redefine the page allocators to record allocation tags upon their
invocation. Instrument post_alloc_hook() and free_pages_prepare() to
update the tag counters as pages are allocated and freed.
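
Roughly, a wrapped call now expands as follows (simplified from the
alloc_hooks()/alloc_hooks_tag() macros added below):

    struct page *page = alloc_pages(gfp, order);
    /* becomes, approximately: */
    struct page *page = ({
            DEFINE_ALLOC_TAG(_alloc_tag);   /* static tag at this callsite */
            struct alloc_tag *_old = alloc_tag_save(&_alloc_tag);
            struct page *_res = alloc_pages_noprof(gfp, order);
            alloc_tag_restore(&_alloc_tag, _old);
            _res;   /* post_alloc_hook() charged the tag via current */
    });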

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/alloc_tag.h |  14 +++++
 include/linux/gfp.h       | 126 ++++++++++++++++++++++++--------------
 include/linux/pagemap.h   |   9 ++-
 mm/compaction.c           |   7 ++-
 mm/filemap.c              |   6 +-
 mm/mempolicy.c            |  52 ++++++++--------
 mm/page_alloc.c           |  60 +++++++++---------
 7 files changed, 164 insertions(+), 110 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index be3ba955846c..86ed5d24a030 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -141,4 +141,18 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
+#define alloc_hooks_tag(_tag, _do_alloc)				\
+({									\
+	struct alloc_tag * __maybe_unused _old = alloc_tag_save(_tag);	\
+	typeof(_do_alloc) _res = _do_alloc;				\
+	alloc_tag_restore(_tag, _old);					\
+	_res;								\
+})
+
+#define alloc_hooks(_do_alloc)						\
+({									\
+	DEFINE_ALLOC_TAG(_alloc_tag);					\
+	alloc_hooks_tag(&_alloc_tag, _do_alloc);			\
+})
+
 #endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..bc0fd5259b0b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -6,6 +6,8 @@
 
 #include <linux/mmzone.h>
 #include <linux/topology.h>
+#include <linux/alloc_tag.h>
+#include <linux/sched.h>
 
 struct vm_area_struct;
 struct mempolicy;
@@ -175,42 +177,46 @@ static inline void arch_free_page(struct page *page, int order) { }
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
 
-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask);
-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+#define __alloc_pages(...)			alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
+
+struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask);
+#define __folio_alloc(...)			alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))
 
-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 				nodemask_t *nodemask, int nr_pages,
 				struct list_head *page_list,
 				struct page **page_array);
+#define __alloc_pages_bulk(...)			alloc_hooks(alloc_pages_bulk_noprof(__VA_ARGS__))
 
-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+unsigned long alloc_pages_bulk_array_mempolicy_noprof(gfp_t gfp,
 				unsigned long nr_pages,
 				struct page **page_array);
+#define  alloc_pages_bulk_array_mempolicy(...)				\
+	alloc_hooks(alloc_pages_bulk_array_mempolicy_noprof(__VA_ARGS__))
 
 /* Bulk allocate order-0 pages */
-static inline unsigned long
-alloc_pages_bulk_list(gfp_t gfp, unsigned long nr_pages, struct list_head *list)
-{
-	return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, list, NULL);
-}
+#define alloc_pages_bulk_list(_gfp, _nr_pages, _list)			\
+	__alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, _list, NULL)
 
-static inline unsigned long
-alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_array)
-{
-	return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
-}
+#define alloc_pages_bulk_array(_gfp, _nr_pages, _page_array)		\
+	__alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, NULL, _page_array)
 
 static inline unsigned long
-alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+alloc_pages_bulk_array_node_noprof(gfp_t gfp, int nid, unsigned long nr_pages,
+				   struct page **page_array)
 {
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 
-	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+	return alloc_pages_bulk_noprof(gfp, nid, NULL, nr_pages, NULL, page_array);
 }
 
+#define alloc_pages_bulk_array_node(...)				\
+	alloc_hooks(alloc_pages_bulk_array_node_noprof(__VA_ARGS__))
+
 static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
 {
 	gfp_t warn_gfp = gfp_mask & (__GFP_THISNODE|__GFP_NOWARN);
@@ -230,82 +236,104 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
  * online. For more general interface, see alloc_pages_node().
  */
 static inline struct page *
-__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
 	warn_if_node_offline(nid, gfp_mask);
 
-	return __alloc_pages(gfp_mask, order, nid, NULL);
+	return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
 }
 
+#define  __alloc_pages_node(...)		alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
+
 static inline
-struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
+struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
 	warn_if_node_offline(nid, gfp);
 
-	return __folio_alloc(gfp, order, nid, NULL);
+	return __folio_alloc_noprof(gfp, order, nid, NULL);
 }
 
+#define  __folio_alloc_node(...)		alloc_hooks(__folio_alloc_node_noprof(__VA_ARGS__))
+
 /*
  * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
  * prefer the current CPU's closest node. Otherwise node must be valid and
  * online.
  */
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
-						unsigned int order)
+static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
+						   unsigned int order)
 {
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 
-	return __alloc_pages_node(nid, gfp_mask, order);
+	return __alloc_pages_node_noprof(nid, gfp_mask, order);
 }
 
+#define  alloc_pages_node(...)			alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
+
 #ifdef CONFIG_NUMA
-struct page *alloc_pages(gfp_t gfp, unsigned int order);
-struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
+struct page *alloc_pages_noprof(gfp_t gfp, unsigned int order);
+struct page *alloc_pages_mpol_noprof(gfp_t gfp, unsigned int order,
 		struct mempolicy *mpol, pgoff_t ilx, int nid);
-struct folio *folio_alloc(gfp_t gfp, unsigned int order);
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order);
+struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr, bool hugepage);
 #else
-static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+static inline struct page *alloc_pages_noprof(gfp_t gfp_mask, unsigned int order)
 {
-	return alloc_pages_node(numa_node_id(), gfp_mask, order);
+	return alloc_pages_node_noprof(numa_node_id(), gfp_mask, order);
 }
-static inline struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
+static inline struct page *alloc_pages_mpol_noprof(gfp_t gfp, unsigned int order,
 		struct mempolicy *mpol, pgoff_t ilx, int nid)
 {
-	return alloc_pages(gfp, order);
+	return alloc_pages_noprof(gfp, order);
 }
-static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
+static inline struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
 {
 	return __folio_alloc_node(gfp, order, numa_node_id());
 }
-#define vma_alloc_folio(gfp, order, vma, addr, hugepage)		\
-	folio_alloc(gfp, order)
+#define vma_alloc_folio_noprof(gfp, order, vma, addr, hugepage)		\
+	folio_alloc_noprof(gfp, order)
 #endif
+
+#define alloc_pages(...)			alloc_hooks(alloc_pages_noprof(__VA_ARGS__))
+#define alloc_pages_mpol(...)			alloc_hooks(alloc_pages_mpol_noprof(__VA_ARGS__))
+#define folio_alloc(...)			alloc_hooks(folio_alloc_noprof(__VA_ARGS__))
+#define vma_alloc_folio(...)			alloc_hooks(vma_alloc_folio_noprof(__VA_ARGS__))
+
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
-static inline struct page *alloc_page_vma(gfp_t gfp,
+
+static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
 		struct vm_area_struct *vma, unsigned long addr)
 {
-	struct folio *folio = vma_alloc_folio(gfp, 0, vma, addr, false);
+	struct folio *folio = vma_alloc_folio_noprof(gfp, 0, vma, addr, false);
 
 	return &folio->page;
 }
+#define alloc_page_vma(...)			alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
+
+extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
+#define __get_free_pages(...)			alloc_hooks(get_free_pages_noprof(__VA_ARGS__))
 
-extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
-extern unsigned long get_zeroed_page(gfp_t gfp_mask);
+extern unsigned long get_zeroed_page_noprof(gfp_t gfp_mask);
+#define get_zeroed_page(...)			alloc_hooks(get_zeroed_page_noprof(__VA_ARGS__))
+
+void *alloc_pages_exact_noprof(size_t size, gfp_t gfp_mask) __alloc_size(1);
+#define alloc_pages_exact(...)			alloc_hooks(alloc_pages_exact_noprof(__VA_ARGS__))
 
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask) __alloc_size(1);
 void free_pages_exact(void *virt, size_t size);
-__meminit void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
 
-#define __get_free_page(gfp_mask) \
-		__get_free_pages((gfp_mask), 0)
+__meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
+#define alloc_pages_exact_nid(...)					\
+	alloc_hooks(alloc_pages_exact_nid_noprof(__VA_ARGS__))
+
+#define __get_free_page(gfp_mask)					\
+	__get_free_pages((gfp_mask), 0)
 
-#define __get_dma_pages(gfp_mask, order) \
-		__get_free_pages((gfp_mask) | GFP_DMA, (order))
+#define __get_dma_pages(gfp_mask, order)				\
+	__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
@@ -357,10 +385,14 @@ extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
+extern int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 			      unsigned migratetype, gfp_t gfp_mask);
-extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
-				       int nid, nodemask_t *nodemask);
+#define alloc_contig_range(...)			alloc_hooks(alloc_contig_range_noprof(__VA_ARGS__))
+
+extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+					      int nid, nodemask_t *nodemask);
+#define alloc_contig_pages(...)			alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
+
 #endif
 void free_contig_range(unsigned long pfn, unsigned long nr_pages);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..35636e67e2e1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -542,14 +542,17 @@ static inline void *detach_page_private(struct page *page)
 #endif
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
+struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
 #else
-static inline struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
-	return folio_alloc(gfp, order);
+	return folio_alloc_noprof(gfp, order);
 }
 #endif
 
+#define filemap_alloc_folio(...)				\
+	alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
+
 static inline struct page *__page_cache_alloc(gfp_t gfp)
 {
 	return &filemap_alloc_folio(gfp, 0)->page;
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d40e8d..f4c0e682c979 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1781,7 +1781,7 @@ static void isolate_freepages(struct compact_control *cc)
  * This is a migrate-callback that "allocates" freepages by taking pages
  * from the isolated freelists in the block we are migrating to.
  */
-static struct folio *compaction_alloc(struct folio *src, unsigned long data)
+static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long data)
 {
 	struct compact_control *cc = (struct compact_control *)data;
 	struct folio *dst;
@@ -1800,6 +1800,11 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 	return dst;
 }
 
+static struct folio *compaction_alloc(struct folio *src, unsigned long data)
+{
+	return alloc_hooks(compaction_alloc_noprof(src, data));
+}
+
 /*
  * This is a migrate-callback that "frees" freepages back to the isolated
  * freelist.  All pages on the freelist are from the same zone, so there is no
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..e51e474545ad 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -957,7 +957,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct folio *folio;
@@ -972,9 +972,9 @@ struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
 
 		return folio;
 	}
-	return folio_alloc(gfp, order);
+	return folio_alloc_noprof(gfp, order);
 }
-EXPORT_SYMBOL(filemap_alloc_folio);
+EXPORT_SYMBOL(filemap_alloc_folio_noprof);
 #endif
 
 /*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..c329d00b975f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2070,15 +2070,15 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
 	 */
 	preferred_gfp = gfp | __GFP_NOWARN;
 	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
-	page = __alloc_pages(preferred_gfp, order, nid, nodemask);
+	page = __alloc_pages_noprof(preferred_gfp, order, nid, nodemask);
 	if (!page)
-		page = __alloc_pages(gfp, order, nid, NULL);
+		page = __alloc_pages_noprof(gfp, order, nid, NULL);
 
 	return page;
 }
 
 /**
- * alloc_pages_mpol - Allocate pages according to NUMA mempolicy.
+ * alloc_pages_mpol_noprof - Allocate pages according to NUMA mempolicy.
  * @gfp: GFP flags.
  * @order: Order of the page allocation.
  * @pol: Pointer to the NUMA mempolicy.
@@ -2087,7 +2087,7 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
  *
  * Return: The page on success or NULL if allocation fails.
  */
-struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
+struct page *alloc_pages_mpol_noprof(gfp_t gfp, unsigned int order,
 		struct mempolicy *pol, pgoff_t ilx, int nid)
 {
 	nodemask_t *nodemask;
@@ -2117,7 +2117,7 @@ struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 			 * First, try to allocate THP only on local node, but
 			 * don't reclaim unnecessarily, just compact.
 			 */
-			page = __alloc_pages_node(nid,
+			page = __alloc_pages_node_noprof(nid,
 				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
 			if (page || !(gfp & __GFP_DIRECT_RECLAIM))
 				return page;
@@ -2130,7 +2130,7 @@ struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 		}
 	}
 
-	page = __alloc_pages(gfp, order, nid, nodemask);
+	page = __alloc_pages_noprof(gfp, order, nid, nodemask);
 
 	if (unlikely(pol->mode == MPOL_INTERLEAVE) && page) {
 		/* skip NUMA_INTERLEAVE_HIT update if numa stats is disabled */
@@ -2146,7 +2146,7 @@ struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 }
 
 /**
- * vma_alloc_folio - Allocate a folio for a VMA.
+ * vma_alloc_folio_noprof - Allocate a folio for a VMA.
  * @gfp: GFP flags.
  * @order: Order of the folio.
  * @vma: Pointer to VMA.
@@ -2161,7 +2161,7 @@ struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
  *
  * Return: The folio on success or NULL if allocation fails.
  */
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr, bool hugepage)
 {
 	struct mempolicy *pol;
@@ -2169,15 +2169,15 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
 	struct page *page;
 
 	pol = get_vma_policy(vma, addr, order, &ilx);
-	page = alloc_pages_mpol(gfp | __GFP_COMP, order,
-				pol, ilx, numa_node_id());
+	page = alloc_pages_mpol_noprof(gfp | __GFP_COMP, order,
+				       pol, ilx, numa_node_id());
 	mpol_cond_put(pol);
 	return page_rmappable_folio(page);
 }
-EXPORT_SYMBOL(vma_alloc_folio);
+EXPORT_SYMBOL(vma_alloc_folio_noprof);
 
 /**
- * alloc_pages - Allocate pages.
+ * alloc_pages_noprof - Allocate pages.
  * @gfp: GFP flags.
  * @order: Power of two of number of pages to allocate.
  *
@@ -2190,7 +2190,7 @@ EXPORT_SYMBOL(vma_alloc_folio);
  * flags are used.
  * Return: The page on success or NULL if allocation fails.
  */
-struct page *alloc_pages(gfp_t gfp, unsigned int order)
+struct page *alloc_pages_noprof(gfp_t gfp, unsigned int order)
 {
 	struct mempolicy *pol = &default_policy;
 
@@ -2201,16 +2201,16 @@ struct page *alloc_pages(gfp_t gfp, unsigned int order)
 	if (!in_interrupt() && !(gfp & __GFP_THISNODE))
 		pol = get_task_policy(current);
 
-	return alloc_pages_mpol(gfp, order,
-				pol, NO_INTERLEAVE_INDEX, numa_node_id());
+	return alloc_pages_mpol_noprof(gfp, order, pol, NO_INTERLEAVE_INDEX,
+				       numa_node_id());
 }
-EXPORT_SYMBOL(alloc_pages);
+EXPORT_SYMBOL(alloc_pages_noprof);
 
-struct folio *folio_alloc(gfp_t gfp, unsigned int order)
+struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
 {
-	return page_rmappable_folio(alloc_pages(gfp | __GFP_COMP, order));
+	return page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
 }
-EXPORT_SYMBOL(folio_alloc);
+EXPORT_SYMBOL(folio_alloc_noprof);
 
 static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
 		struct mempolicy *pol, unsigned long nr_pages,
@@ -2229,13 +2229,13 @@ static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
 
 	for (i = 0; i < nodes; i++) {
 		if (delta) {
-			nr_allocated = __alloc_pages_bulk(gfp,
+			nr_allocated = alloc_pages_bulk_noprof(gfp,
 					interleave_nodes(pol), NULL,
 					nr_pages_per_node + 1, NULL,
 					page_array);
 			delta--;
 		} else {
-			nr_allocated = __alloc_pages_bulk(gfp,
+			nr_allocated = alloc_pages_bulk_noprof(gfp,
 					interleave_nodes(pol), NULL,
 					nr_pages_per_node, NULL, page_array);
 		}
@@ -2257,11 +2257,11 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
 	preferred_gfp = gfp | __GFP_NOWARN;
 	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 
-	nr_allocated  = __alloc_pages_bulk(preferred_gfp, nid, &pol->nodes,
+	nr_allocated  = alloc_pages_bulk_noprof(preferred_gfp, nid, &pol->nodes,
 					   nr_pages, NULL, page_array);
 
 	if (nr_allocated < nr_pages)
-		nr_allocated += __alloc_pages_bulk(gfp, numa_node_id(), NULL,
+		nr_allocated += alloc_pages_bulk_noprof(gfp, numa_node_id(), NULL,
 				nr_pages - nr_allocated, NULL,
 				page_array + nr_allocated);
 	return nr_allocated;
@@ -2273,7 +2273,7 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
  * It can accelerate memory allocation especially interleaving
  * allocate memory.
  */
-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+unsigned long alloc_pages_bulk_array_mempolicy_noprof(gfp_t gfp,
 		unsigned long nr_pages, struct page **page_array)
 {
 	struct mempolicy *pol = &default_policy;
@@ -2293,8 +2293,8 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
 
 	nid = numa_node_id();
 	nodemask = policy_nodemask(gfp, pol, NO_INTERLEAVE_INDEX, &nid);
-	return __alloc_pages_bulk(gfp, nid, nodemask,
-				  nr_pages, NULL, page_array);
+	return alloc_pages_bulk_noprof(gfp, nid, nodemask,
+				       nr_pages, NULL, page_array);
 }
 
 int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edb79a55a252..58c0e8b948a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4380,7 +4380,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
  *
  * Returns the number of pages on the list or array.
  */
-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 			nodemask_t *nodemask, int nr_pages,
 			struct list_head *page_list,
 			struct page **page_array)
@@ -4516,7 +4516,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	pcp_trylock_finish(UP_flags);
 
 failed:
-	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
+	page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
 	if (page) {
 		if (page_list)
 			list_add(&page->lru, page_list);
@@ -4527,13 +4527,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	goto out;
 }
-EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
+EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
 
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
-							nodemask_t *nodemask)
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
+				      int preferred_nid, nodemask_t *nodemask)
 {
 	struct page *page;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
@@ -4595,38 +4595,38 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 
 	return page;
 }
-EXPORT_SYMBOL(__alloc_pages);
+EXPORT_SYMBOL(__alloc_pages_noprof);
 
-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask)
 {
-	struct page *page = __alloc_pages(gfp | __GFP_COMP, order,
+	struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
 					preferred_nid, nodemask);
 	return page_rmappable_folio(page);
 }
-EXPORT_SYMBOL(__folio_alloc);
+EXPORT_SYMBOL(__folio_alloc_noprof);
 
 /*
  * Common helper functions. Never use with __GFP_HIGHMEM because the returned
  * address cannot represent highmem pages. Use alloc_pages and then kmap if
  * you need to access high mem.
  */
-unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
+unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *page;
 
-	page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
+	page = alloc_pages_noprof(gfp_mask & ~__GFP_HIGHMEM, order);
 	if (!page)
 		return 0;
 	return (unsigned long) page_address(page);
 }
-EXPORT_SYMBOL(__get_free_pages);
+EXPORT_SYMBOL(get_free_pages_noprof);
 
-unsigned long get_zeroed_page(gfp_t gfp_mask)
+unsigned long get_zeroed_page_noprof(gfp_t gfp_mask)
 {
-	return __get_free_page(gfp_mask | __GFP_ZERO);
+	return get_free_pages_noprof(gfp_mask | __GFP_ZERO, 0);
 }
-EXPORT_SYMBOL(get_zeroed_page);
+EXPORT_SYMBOL(get_zeroed_page_noprof);
 
 /**
  * __free_pages - Free pages allocated with alloc_pages().
@@ -4818,7 +4818,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 }
 
 /**
- * alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * alloc_pages_exact_noprof - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
  * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
@@ -4832,7 +4832,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
  *
  * Return: pointer to the allocated area or %NULL in case of error.
  */
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+void *alloc_pages_exact_noprof(size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
 	unsigned long addr;
@@ -4840,13 +4840,13 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
 		gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);
 
-	addr = __get_free_pages(gfp_mask, order);
+	addr = get_free_pages_noprof(gfp_mask, order);
 	return make_alloc_exact(addr, order, size);
 }
-EXPORT_SYMBOL(alloc_pages_exact);
+EXPORT_SYMBOL(alloc_pages_exact_noprof);
 
 /**
- * alloc_pages_exact_nid - allocate an exact number of physically-contiguous
+ * alloc_pages_exact_nid_noprof - allocate an exact number of physically-contiguous
  *			   pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
@@ -4857,7 +4857,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *
  * Return: pointer to the allocated area or %NULL in case of error.
  */
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+void * __meminit alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
 	struct page *p;
@@ -4865,7 +4865,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 	if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
 		gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);
 
-	p = alloc_pages_node(nid, gfp_mask, order);
+	p = alloc_pages_node_noprof(nid, gfp_mask, order);
 	if (!p)
 		return NULL;
 	return make_alloc_exact((unsigned long)page_address(p), order, size);
@@ -6283,7 +6283,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
 }
 
 /**
- * alloc_contig_range() -- tries to allocate given range of pages
+ * alloc_contig_range_noprof() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
  * @migratetype:	migratetype of the underlying pageblocks (either
@@ -6303,7 +6303,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
  * pages which PFN is in [start, end) are allocated for the caller and
  * need to be freed with free_contig_range().
  */
-int alloc_contig_range(unsigned long start, unsigned long end,
+int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
@@ -6427,15 +6427,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	undo_isolate_page_range(start, end, migratetype);
 	return ret;
 }
-EXPORT_SYMBOL(alloc_contig_range);
+EXPORT_SYMBOL(alloc_contig_range_noprof);
 
 static int __alloc_contig_pages(unsigned long start_pfn,
 				unsigned long nr_pages, gfp_t gfp_mask)
 {
 	unsigned long end_pfn = start_pfn + nr_pages;
 
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
-				  gfp_mask);
+	return alloc_contig_range_noprof(start_pfn, end_pfn, MIGRATE_MOVABLE,
+				   gfp_mask);
 }
 
 static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
@@ -6470,7 +6470,7 @@ static bool zone_spans_last_pfn(const struct zone *zone,
 }
 
 /**
- * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
+ * alloc_contig_pages_noprof() -- tries to find and allocate contiguous range of pages
  * @nr_pages:	Number of contiguous pages to allocate
  * @gfp_mask:	GFP mask to limit search and used during compaction
  * @nid:	Target node
@@ -6490,8 +6490,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
  *
  * Return: pointer to contiguous pages on success, or NULL if not successful.
  */
-struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
-				int nid, nodemask_t *nodemask)
+struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+				 int nid, nodemask_t *nodemask)
 {
 	unsigned long ret, pfn, flags;
 	struct zonelist *zonelist;
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (17 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 18/36] mm: enable page allocation tagging Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 10:11   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
                   ` (17 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

When a high-order page is split into smaller ones, each newly split
page should get its own codetag reference. The original codetag is
reused for these pages, but each new reference is recorded as a 0-byte
allocation because the original codetag already accounts for the whole
high-order allocation.

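As a hedged illustration of why each tail page needs its own reference
(a hypothetical caller, not code from this series; alloc_pages(),
split_page() and __free_page() are the real kernel APIs):

	struct page *page = alloc_pages(GFP_KERNEL, 2);	/* 4 pages charged to this line */

	split_page(page, 2);		/* the 4 order-0 pages now reference the same tag */
	__free_page(page + 3);		/* an individual page free must find that tag */

Without pgalloc_tag_split(), freeing a single split-off page would find
no codetag reference in its page_ext and the free could not be
attributed back to the original allocation site.
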
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
 mm/huge_memory.c            |  2 ++
 mm/page_alloc.c             |  2 ++
 3 files changed, 34 insertions(+)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index b49ab955300f..9e6ad8e0e4aa 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
 	}
 }
 
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
+{
+	int i;
+	struct page_ext *page_ext;
+	union codetag_ref *ref;
+	struct alloc_tag *tag;
+
+	if (!mem_alloc_profiling_enabled())
+		return;
+
+	page_ext = page_ext_get(page);
+	if (unlikely(!page_ext))
+		return;
+
+	ref = codetag_ref_from_page_ext(page_ext);
+	if (!ref->ct)
+		goto out;
+
+	tag = ct_to_alloc_tag(ref->ct);
+	page_ext = page_ext_next(page_ext);
+	for (i = 1; i < nr; i++) {
+		/* Set new reference to point to the original tag */
+		alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
+		page_ext = page_ext_next(page_ext);
+	}
+out:
+	page_ext_put(page_ext);
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int order) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..86daae671319 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -38,6 +38,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/memory-tiers.h>
 #include <linux/compat.h>
+#include <linux/pgalloc_tag.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
+	pgalloc_tag_split(head, nr);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58c0e8b948a4..4bc5b4720fee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
 	split_page_owner(page, 1 << order);
+	pgalloc_tag_split(page, 1 << order);
 	split_page_memcg(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
@@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 		struct page *last = page + nr;
 
 		split_page_owner(page, 1 << order);
+		pgalloc_tag_split(page, 1 << order);
 		split_page_memcg(page, 1 << order);
 		while (page < --last)
 			set_page_refcounted(last);
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (18 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 19/36] mm: create new codetag references during page splitting Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 10:18   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 21/36] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
                   ` (16 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

For all page allocations to be tagged, page_ext has to be initialized
before the first page allocation. Unless early_page_ext is enabled,
early tasks allocate their stacks from the page allocator before
alloc_node_page_ext() initializes the page_ext area, so those
allocations generate a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is
enabled. Enable early_page_ext whenever CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
to ensure page_ext is initialized before any page allocation. This
carries all the downsides associated with early_page_ext, such as a
possibly longer boot time, which is why we enable it only when debugging
with CONFIG_MEM_ALLOC_PROFILING_DEBUG and not universally for
CONFIG_MEM_ALLOC_PROFILING.

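For reference, the configuration this applies to (a sketch; only option
names used elsewhere in this series):

	CONFIG_MEM_ALLOC_PROFILING=y
	CONFIG_MEM_ALLOC_PROFILING_DEBUG=y	# forces early_page_ext below

On kernels without the DEBUG option, the same behavior can presumably
still be requested manually via the early_page_ext boot parameter that
setup_early_page_ext() below handles.
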
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/page_ext.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 3c58fe8a24df..e7d8f1a5589e 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -95,7 +95,16 @@ unsigned long page_ext_size;
 
 static unsigned long total_usage;
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+/*
+ * To ensure correct allocation tagging for pages, page_ext should be available
+ * before the first page allocation. Otherwise early task stacks will be
+ * allocated before page_ext initialization and missing tags will be flagged.
+ */
+bool early_page_ext __meminitdata = true;
+#else
 bool early_page_ext __meminitdata;
+#endif
 static int __init setup_early_page_ext(char *str)
 {
 	early_page_ext = true;
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 21/36] lib: add codetag reference into slabobj_ext
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (19 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 10:19   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

To store a code tag for every slab object, a codetag reference is
embedded in slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.

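With both options enabled, each slab object carries two words of
metadata (a sketch of the resulting layout from the hunk below; the
16 bytes per object assumes a 64-bit build):

	struct slabobj_ext {
		struct obj_cgroup *objcg;	/* CONFIG_MEMCG_KMEM */
		union codetag_ref ref;		/* CONFIG_MEM_ALLOC_PROFILING */
	} __aligned(8);
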
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/memcontrol.h | 5 +++++
 lib/Kconfig.debug          | 1 +
 2 files changed, 6 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f3584e98b640..2b010316016c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1653,7 +1653,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
  * if MEMCG_DATA_OBJEXTS is set.
  */
 struct slabobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
 	struct obj_cgroup *objcg;
+#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	union codetag_ref ref;
+#endif
 } __aligned(8);
 
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7bbdb0ddb011..9ecfcdb54417 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -979,6 +979,7 @@ config MEM_ALLOC_PROFILING
 	depends on !DEBUG_FORCE_WEAK_PER_CPU
 	select CODE_TAGGING
 	select PAGE_EXTENSION
+	select SLAB_OBJ_EXT
 	help
 	  Track allocation source code and record total allocation size
 	  initiated at that code location. The mechanism can be used to track
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (20 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 21/36] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 13:07   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 23/36] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
                   ` (14 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Account slab allocations using the codetag reference embedded in
slabobj_ext.

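As a rough call-flow sketch of the hunks below (function names are
taken from this patch):

	kmem_cache_alloc(s, gfp)
	  -> slab_post_alloc_hook()
	       prepare_slab_obj_exts_hook(s, flags, p)	/* find or create slabobj_ext */
	       alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size)

	kmem_cache_free(s, p)
	  -> alloc_tagging_slab_free_hook(s, slab, &p, 1)
	       alloc_tag_sub(&obj_exts[off].ref, s->size)
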
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slub.c |  9 ++++++++
 2 files changed, 75 insertions(+)

diff --git a/mm/slab.h b/mm/slab.h
index 13b6ba2abd74..c4bd0d5348cb 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -567,6 +567,46 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
 int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 			gfp_t gfp, bool new_slab);
 
+static inline bool need_slab_obj_ext(void)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	if (mem_alloc_profiling_enabled())
+		return true;
+#endif
+	/*
+	 * CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
+	 * inside memcg_slab_post_alloc_hook. No other users for now.
+	 */
+	return false;
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+	struct slab *slab;
+
+	if (!p)
+		return NULL;
+
+	if (!need_slab_obj_ext())
+		return NULL;
+
+	if (s->flags & SLAB_NO_OBJ_EXT)
+		return NULL;
+
+	if (flags & __GFP_NO_OBJ_EXT)
+		return NULL;
+
+	slab = virt_to_slab(p);
+	if (!slab_obj_exts(slab) &&
+	    WARN(alloc_slab_obj_exts(slab, s, flags, false),
+		 "%s, %s: Failed to create slab extension vector!\n",
+		 __func__, s->name))
+		return NULL;
+
+	return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+}
+
 #else /* CONFIG_SLAB_OBJ_EXT */
 
 static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
@@ -589,6 +629,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
 
 #endif /* CONFIG_SLAB_OBJ_EXT */
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+					void **p, int objects)
+{
+	struct slabobj_ext *obj_exts;
+	int i;
+
+	obj_exts = slab_obj_exts(slab);
+	if (!obj_exts)
+		return;
+
+	for (i = 0; i < objects; i++) {
+		unsigned int off = obj_to_index(s, slab, p[i]);
+
+		alloc_tag_sub(&obj_exts[off].ref, s->size);
+	}
+}
+
+#else
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+					void **p, int objects) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
 #ifdef CONFIG_MEMCG_KMEM
 void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr);
diff --git a/mm/slub.c b/mm/slub.c
index 5dc7beda6c0d..a69b6b4c8df6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3826,6 +3826,7 @@ void slab_post_alloc_hook(struct kmem_cache *s,	struct obj_cgroup *objcg,
 			  unsigned int orig_size)
 {
 	unsigned int zero_size = s->object_size;
+	struct slabobj_ext *obj_exts;
 	bool kasan_init = init;
 	size_t i;
 	gfp_t init_flags = flags & gfp_allowed_mask;
@@ -3868,6 +3869,12 @@ void slab_post_alloc_hook(struct kmem_cache *s,	struct obj_cgroup *objcg,
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, init_flags);
 		kmsan_slab_alloc(s, p[i], init_flags);
+		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+		/* obj_exts can be allocated for other reasons */
+		if (likely(obj_exts) && mem_alloc_profiling_enabled())
+			alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+#endif
 	}
 
 	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
@@ -4346,6 +4353,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
 	       unsigned long addr)
 {
 	memcg_slab_free_hook(s, slab, &object, 1);
+	alloc_tagging_slab_free_hook(s, slab, &object, 1);
 
 	if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
 		do_slab_free(s, slab, object, object, 1, addr);
@@ -4356,6 +4364,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
 		    void *tail, void **p, int cnt, unsigned long addr)
 {
 	memcg_slab_free_hook(s, slab, p, cnt);
+	alloc_tagging_slab_free_hook(s, slab, p, cnt);
 	/*
 	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
 	 * to remove objects, whose reuse must be delayed.
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 23/36] mm/slab: enable slab allocation tagging for kmalloc and friends
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (21 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 24/36] rust: Add a rust helper for krealloc() Suren Baghdasaryan
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Redefine kmalloc, krealloc, kzalloc, kcalloc, etc. to record allocations
and deallocations done by these functions.

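The mechanical pattern, shown here with the kmalloc() case from the
hunks below:

	#define kmalloc(...)	alloc_hooks(kmalloc_noprof(__VA_ARGS__))

so an unmodified call site such as

	buf = kmalloc(sizeof(*buf), GFP_KERNEL);

gets charged to its own file:line through the alloc_hooks() wrapper,
with no per-caller changes required.
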
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/fortify-string.h |   5 +-
 include/linux/slab.h           | 169 +++++++++++++++++----------------
 include/linux/string.h         |   4 +-
 mm/slab_common.c               |   6 +-
 mm/slub.c                      |  52 +++++-----
 mm/util.c                      |  20 ++--
 6 files changed, 130 insertions(+), 126 deletions(-)

diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index 89a6888f2f9e..55f66bd8a366 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -697,9 +697,9 @@ __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size)
 	return __real_memchr_inv(p, c, size);
 }
 
-extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup)
+extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup_noprof)
 								    __realloc_size(2);
-__FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp)
+__FORTIFY_INLINE void *kmemdup_noprof(const void * const POS0 p, size_t size, gfp_t gfp)
 {
 	const size_t p_size = __struct_size(p);
 
@@ -709,6 +709,7 @@ __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp
 		fortify_panic(__func__);
 	return __real_kmemdup(p, size, gfp);
 }
+#define kmemdup(...)	alloc_hooks(kmemdup_noprof(__VA_ARGS__))
 
 /**
  * strcpy - Copy a string into another string buffer
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 58794043ab5b..61e2a486d529 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -229,7 +229,10 @@ int kmem_cache_shrink(struct kmem_cache *s);
 /*
  * Common kmalloc functions provided by all allocators
  */
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+void * __must_check krealloc_noprof(const void *objp, size_t new_size,
+				    gfp_t flags) __realloc_size(2);
+#define krealloc(...)				alloc_hooks(krealloc_noprof(__VA_ARGS__))
+
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
 size_t __ksize(const void *objp);
@@ -481,7 +484,10 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 static_assert(PAGE_SHIFT <= 20);
 #define kmalloc_index(s) __kmalloc_index(s, true)
 
-void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#include <linux/alloc_tag.h>
+
+void *__kmalloc_noprof(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#define __kmalloc(...)				alloc_hooks(__kmalloc_noprof(__VA_ARGS__))
 
 /**
  * kmem_cache_alloc - Allocate an object
@@ -493,9 +499,14 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_siz
  *
  * Return: pointer to the new object or %NULL in case of error
  */
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
-			   gfp_t gfpflags) __assume_slab_alignment __malloc;
+void *kmem_cache_alloc_noprof(struct kmem_cache *cachep,
+			      gfp_t flags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc(...)			alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
+
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
+			    gfp_t gfpflags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
+
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
 /*
@@ -506,29 +517,40 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
  * Note that interrupts must be enabled when calling these functions.
  */
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+#define kmem_cache_alloc_bulk(...)	alloc_hooks(kmem_cache_alloc_bulk_noprof(__VA_ARGS__))
 
 static __always_inline void kfree_bulk(size_t size, void **p)
 {
 	kmem_cache_free_bulk(NULL, size, p);
 }
 
-void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
 							 __alloc_size(1);
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
-									 __malloc;
+#define __kmalloc_node(...)			alloc_hooks(__kmalloc_node_noprof(__VA_ARGS__))
+
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
+				   int node) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
 
-void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
 		    __assume_kmalloc_alignment __alloc_size(3);
 
-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
-			 int node, size_t size) __assume_kmalloc_alignment
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
+		int node, size_t size) __assume_kmalloc_alignment
 						__alloc_size(4);
-void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+#define kmalloc_trace(...)			alloc_hooks(kmalloc_trace_noprof(__VA_ARGS__))
+
+#define kmalloc_node_trace(...)			alloc_hooks(kmalloc_node_trace_noprof(__VA_ARGS__))
+
+void *kmalloc_large_noprof(size_t size, gfp_t flags) __assume_page_alignment
 					      __alloc_size(1);
+#define kmalloc_large(...)			alloc_hooks(kmalloc_large_noprof(__VA_ARGS__))
 
-void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node) __assume_page_alignment
 							     __alloc_size(1);
+#define kmalloc_large_node(...)			alloc_hooks(kmalloc_large_node_noprof(__VA_ARGS__))
 
 /**
  * kmalloc - allocate kernel memory
@@ -584,37 +606,39 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_align
  *	Try really hard to succeed the allocation but fail
  *	eventually.
  */
-static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
+static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
 {
 	if (__builtin_constant_p(size) && size) {
 		unsigned int index;
 
 		if (size > KMALLOC_MAX_CACHE_SIZE)
-			return kmalloc_large(size, flags);
+			return kmalloc_large_noprof(size, flags);
 
 		index = kmalloc_index(size);
-		return kmalloc_trace(
+		return kmalloc_trace_noprof(
 				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, size);
 	}
-	return __kmalloc(size, flags);
+	return __kmalloc_noprof(size, flags);
 }
+#define kmalloc(...)				alloc_hooks(kmalloc_noprof(__VA_ARGS__))
 
-static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
 {
 	if (__builtin_constant_p(size) && size) {
 		unsigned int index;
 
 		if (size > KMALLOC_MAX_CACHE_SIZE)
-			return kmalloc_large_node(size, flags, node);
+			return kmalloc_large_node_noprof(size, flags, node);
 
 		index = kmalloc_index(size);
-		return kmalloc_node_trace(
+		return kmalloc_node_trace_noprof(
 				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, node, size);
 	}
-	return __kmalloc_node(size, flags, node);
+	return __kmalloc_node_noprof(size, flags, node);
 }
+#define kmalloc_node(...)			alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
 
 /**
  * kmalloc_array - allocate memory for an array.
@@ -622,16 +646,17 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
  * @size: element size.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
 	if (__builtin_constant_p(n) && __builtin_constant_p(size))
-		return kmalloc(bytes, flags);
-	return __kmalloc(bytes, flags);
+		return kmalloc_noprof(bytes, flags);
+	return kmalloc_noprof(bytes, flags);
 }
+#define kmalloc_array(...)			alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
 
 /**
  * krealloc_array - reallocate memory for an array.
@@ -640,18 +665,19 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
  * @new_size: new size of a single member of the array
  * @flags: the type of memory to allocate (see kmalloc)
  */
-static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
-								      size_t new_n,
-								      size_t new_size,
-								      gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(void *p,
+								       size_t new_n,
+								       size_t new_size,
+								       gfp_t flags)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
 		return NULL;
 
-	return krealloc(p, bytes, flags);
+	return krealloc_noprof(p, bytes, flags);
 }
+#define krealloc_array(...)			alloc_hooks(krealloc_array_noprof(__VA_ARGS__))
 
 /**
  * kcalloc - allocate memory for an array. The memory is set to zero.
@@ -659,16 +685,12 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
  * @size: element size.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flags)
-{
-	return kmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kcalloc(_n, _size, _flags)		kmalloc_array(_n, _size, (_flags) | __GFP_ZERO)
 
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
 				  unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
-	__kmalloc_node_track_caller(size, flags, node, \
-				    _RET_IP_)
+#define kmalloc_node_track_caller(...)		\
+	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
 
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
@@ -678,11 +700,9 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-#define kmalloc_track_caller(size, flags) \
-	__kmalloc_node_track_caller(size, flags, \
-				    NUMA_NO_NODE, _RET_IP_)
+#define kmalloc_track_caller(...)		kmalloc_node_track_caller(__VA_ARGS__, NUMA_NO_NODE)
 
-static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
+static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
 							  int node)
 {
 	size_t bytes;
@@ -690,75 +710,56 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size,
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
 	if (__builtin_constant_p(n) && __builtin_constant_p(size))
-		return kmalloc_node(bytes, flags, node);
-	return __kmalloc_node(bytes, flags, node);
+		return kmalloc_node_noprof(bytes, flags, node);
+	return __kmalloc_node_noprof(bytes, flags, node);
 }
+#define kmalloc_array_node(...)			alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))
 
-static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t flags, int node)
-{
-	return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
-}
+#define kcalloc_node(_n, _size, _flags, _node)	\
+	kmalloc_array_node(_n, _size, (_flags) | __GFP_ZERO, _node)
 
 /*
  * Shortcuts
  */
-static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
-{
-	return kmem_cache_alloc(k, flags | __GFP_ZERO);
-}
+#define kmem_cache_zalloc(_k, _flags)		kmem_cache_alloc(_k, (_flags)|__GFP_ZERO)
 
 /**
  * kzalloc - allocate memory. The memory is set to zero.
  * @size: how many bytes of memory are required.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
+static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
 {
-	return kmalloc(size, flags | __GFP_ZERO);
+	return kmalloc_noprof(size, flags | __GFP_ZERO);
 }
+#define kzalloc(...)				alloc_hooks(kzalloc_noprof(__VA_ARGS__))
+#define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
 
-/**
- * kzalloc_node - allocate zeroed memory from a particular memory node.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @node: memory node from which to allocate
- */
-static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node)
-{
-	return kmalloc_node(size, flags | __GFP_ZERO, node);
-}
+extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node(...)			alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
 
-extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
-static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags)
-{
-	return kvmalloc_node(size, flags, NUMA_NO_NODE);
-}
-static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node)
-{
-	return kvmalloc_node(size, flags | __GFP_ZERO, node);
-}
-static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags)
-{
-	return kvmalloc(size, flags | __GFP_ZERO);
-}
+#define kvmalloc(_size, _flags)			kvmalloc_node(_size, _flags, NUMA_NO_NODE)
+#define kvzalloc(_size, _flags)			kvmalloc(_size, _flags|__GFP_ZERO)
+
+#define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, _flags|__GFP_ZERO, _node)
 
-static inline __alloc_size(1, 2) void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *kvmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
 {
 	size_t bytes;
 
 	if (unlikely(check_mul_overflow(n, size, &bytes)))
 		return NULL;
 
-	return kvmalloc(bytes, flags);
+	return kvmalloc_node_noprof(bytes, flags, NUMA_NO_NODE);
 }
 
-static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags)
-{
-	return kvmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kvmalloc_array(...)			alloc_hooks(kvmalloc_array_noprof(__VA_ARGS__))
+#define kvcalloc(_n, _size, _flags)		kvmalloc_array(_n, _size, _flags|__GFP_ZERO)
 
-extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
 		      __realloc_size(3);
+#define kvrealloc(...)				alloc_hooks(kvrealloc_noprof(__VA_ARGS__))
+
 extern void kvfree(const void *addr);
 DEFINE_FREE(kvfree, void *, if (_T) kvfree(_T))
 
diff --git a/include/linux/string.h b/include/linux/string.h
index ab148d8dbfc1..14e4fb4340f4 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -214,7 +214,9 @@ extern void kfree_const(const void *x);
 extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
 extern const char *kstrdup_const(const char *s, gfp_t gfp);
 extern char *kstrndup(const char *s, size_t len, gfp_t gfp);
-extern void *kmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+extern void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+#define kmemdup(...)	alloc_hooks(kmemdup_noprof(__VA_ARGS__))
+
 extern void *kvmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
 extern char *kmemdup_nul(const char *s, size_t len, gfp_t gfp);
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 238293b1dbe1..5f9e25626dc7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1184,7 +1184,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
 		return (void *)p;
 	}
 
-	ret = kmalloc_track_caller(new_size, flags);
+	ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
 	if (ret && p) {
 		/* Disable KASAN checks as the object's redzone is accessed. */
 		kasan_disable_current();
@@ -1208,7 +1208,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
  *
  * Return: pointer to the allocated memory or %NULL in case of error
  */
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
 {
 	void *ret;
 
@@ -1223,7 +1223,7 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)
 
 	return ret;
 }
-EXPORT_SYMBOL(krealloc);
+EXPORT_SYMBOL(krealloc_noprof);
 
 /**
  * kfree_sensitive - Clear sensitive information in memory before freeing
diff --git a/mm/slub.c b/mm/slub.c
index a69b6b4c8df6..920b24b4140e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3920,7 +3920,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	return object;
 }
 
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE, _RET_IP_,
 				    s->object_size);
@@ -3929,9 +3929,9 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
 
 	return ret;
 }
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(kmem_cache_alloc_noprof);
 
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags)
 {
 	void *ret = slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, _RET_IP_,
@@ -3941,10 +3941,10 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 
 	return ret;
 }
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
 
 /**
- * kmem_cache_alloc_node - Allocate an object on the specified node
+ * kmem_cache_alloc_node_noprof - Allocate an object on the specified node
  * @s: The cache to allocate from.
  * @gfpflags: See kmalloc().
  * @node: node number of the target node.
@@ -3956,7 +3956,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_lru);
  *
  * Return: pointer to the new object or %NULL in case of error
  */
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
 
@@ -3964,7 +3964,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 
 	return ret;
 }
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);
 
 /*
  * To avoid unnecessary overhead, we pass through large allocation requests
@@ -3981,7 +3981,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 		flags = kmalloc_fix_flags(flags);
 
 	flags |= __GFP_COMP;
-	folio = (struct folio *)alloc_pages_node(node, flags, order);
+	folio = (struct folio *)alloc_pages_node_noprof(node, flags, order);
 	if (folio) {
 		ptr = folio_address(folio);
 		lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,
@@ -3996,7 +3996,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 	return ptr;
 }
 
-void *kmalloc_large(size_t size, gfp_t flags)
+void *kmalloc_large_noprof(size_t size, gfp_t flags)
 {
 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
@@ -4004,9 +4004,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
 		      flags, NUMA_NO_NODE);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_large);
+EXPORT_SYMBOL(kmalloc_large_noprof);
 
-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
 {
 	void *ret = __kmalloc_large_node(size, flags, node);
 
@@ -4014,7 +4014,7 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 		      flags, node);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_large_node);
+EXPORT_SYMBOL(kmalloc_large_node_noprof);
 
 static __always_inline
 void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
@@ -4041,26 +4041,26 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
 	return ret;
 }
 
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
 {
 	return __do_kmalloc_node(size, flags, node, _RET_IP_);
 }
-EXPORT_SYMBOL(__kmalloc_node);
+EXPORT_SYMBOL(__kmalloc_node_noprof);
 
-void *__kmalloc(size_t size, gfp_t flags)
+void *__kmalloc_noprof(size_t size, gfp_t flags)
 {
 	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
 }
-EXPORT_SYMBOL(__kmalloc);
+EXPORT_SYMBOL(__kmalloc_noprof);
 
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
-				  int node, unsigned long caller)
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags,
+				       int node, unsigned long caller)
 {
 	return __do_kmalloc_node(size, flags, node, caller);
 }
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
+EXPORT_SYMBOL(kmalloc_node_track_caller_noprof);
 
-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
 					    _RET_IP_, size);
@@ -4070,9 +4070,9 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_trace);
+EXPORT_SYMBOL(kmalloc_trace_noprof);
 
-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
 			 int node, size_t size)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
@@ -4082,7 +4082,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_node_trace);
+EXPORT_SYMBOL(kmalloc_node_trace_noprof);
 
 static noinline void free_to_partial_list(
 	struct kmem_cache *s, struct slab *slab,
@@ -4691,8 +4691,8 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
 #endif /* CONFIG_SLUB_TINY */
 
 /* Note that interrupts must be enabled when calling this function. */
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
-			  void **p)
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
+				 void **p)
 {
 	int i;
 	struct obj_cgroup *objcg = NULL;
@@ -4720,7 +4720,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 
 	return i;
 }
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);
 
 
 /*
diff --git a/mm/util.c b/mm/util.c
index 5a6a9802583b..291f7945190f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -115,7 +115,7 @@ char *kstrndup(const char *s, size_t max, gfp_t gfp)
 EXPORT_SYMBOL(kstrndup);
 
 /**
- * kmemdup - duplicate region of memory
+ * kmemdup_noprof - duplicate region of memory
  *
  * @src: memory region to duplicate
  * @len: memory region length
@@ -124,16 +124,16 @@ EXPORT_SYMBOL(kstrndup);
  * Return: newly allocated copy of @src or %NULL in case of error,
  * result is physically contiguous. Use kfree() to free.
  */
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp)
 {
 	void *p;
 
-	p = kmalloc_track_caller(len, gfp);
+	p = kmalloc_node_track_caller_noprof(len, gfp, NUMA_NO_NODE, _RET_IP_);
 	if (p)
 		memcpy(p, src, len);
 	return p;
 }
-EXPORT_SYMBOL(kmemdup);
+EXPORT_SYMBOL(kmemdup_noprof);
 
 /**
  * kvmemdup - duplicate region of memory
@@ -577,7 +577,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 EXPORT_SYMBOL(vm_mmap);
 
 /**
- * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * kvmalloc_node_noprof - attempt to allocate physically contiguous memory, but upon
  * failure, fall back to non-contiguous (vmalloc) allocation.
  * @size: size of the request.
  * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
@@ -592,7 +592,7 @@ EXPORT_SYMBOL(vm_mmap);
  *
  * Return: pointer to the allocated memory of %NULL in case of failure
  */
-void *kvmalloc_node(size_t size, gfp_t flags, int node)
+void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
 {
 	gfp_t kmalloc_flags = flags;
 	void *ret;
@@ -614,7 +614,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 		kmalloc_flags &= ~__GFP_NOFAIL;
 	}
 
-	ret = kmalloc_node(size, kmalloc_flags, node);
+	ret = kmalloc_node_noprof(size, kmalloc_flags, node);
 
 	/*
 	 * It doesn't really make sense to fallback to vmalloc for sub page
@@ -643,7 +643,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 			flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
 			node, __builtin_return_address(0));
 }
-EXPORT_SYMBOL(kvmalloc_node);
+EXPORT_SYMBOL(kvmalloc_node_noprof);
 
 /**
  * kvfree() - Free memory.
@@ -682,7 +682,7 @@ void kvfree_sensitive(const void *addr, size_t len)
 }
 EXPORT_SYMBOL(kvfree_sensitive);
 
-void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
 {
 	void *newp;
 
@@ -695,7 +695,7 @@ void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
 	kvfree(p);
 	return newp;
 }
-EXPORT_SYMBOL(kvrealloc);
+EXPORT_SYMBOL(kvrealloc_noprof);
 
 /**
  * __vmalloc_array - allocate memory for a virtually contiguous array.
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 24/36] rust: Add a rust helper for krealloc()
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (22 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 23/36] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-22  9:59   ` Alice Ryhl
  2024-02-21 19:40 ` [PATCH v4 25/36] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
                   ` (12 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, rust-for-linux

From: Kent Overstreet <kent.overstreet@linux.dev>

Memory allocation profiling turns krealloc() into a nontrivial macro,
so for now we need a C helper for it.

Until the Rust side gains proper support for memory allocation
profiling, this means that all Rust allocations will be accounted to
the helper itself rather than to their Rust call sites.

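The problem, sketched with lines from this series: bindgen can bind an
exported C symbol but not a function-like macro, and after the slab
tagging patch krealloc() is the latter:

	#define krealloc(...)	alloc_hooks(krealloc_noprof(__VA_ARGS__))

	/* so Rust bindings call a real, exported function instead: */
	void *rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags);
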
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: rust-for-linux@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 rust/helpers.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/rust/helpers.c b/rust/helpers.c
index 70e59efd92bc..ad62eaf604b3 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -28,6 +28,7 @@
 #include <linux/mutex.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
+#include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
@@ -157,6 +158,13 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
 }
 EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
 
+void * __must_check rust_helper_krealloc(const void *objp, size_t new_size,
+					 gfp_t flags) __realloc_size(2)
+{
+	return krealloc(objp, new_size, flags);
+}
+EXPORT_SYMBOL_GPL(rust_helper_krealloc);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 25/36] mempool: Hook up to memory allocation profiling
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (23 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 24/36] rust: Add a rust helper for krealloc() Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 26/36] mm: percpu: Introduce pcpuobj_ext Suren Baghdasaryan
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

This adds hooks to mempools so that mempool-backed allocations are
annotated at the correct source line and show up correctly in
/proc/allocinfo.

Various inline functions are converted to macro wrappers so that
alloc_hooks() expands at the user's callsite and we can invoke it in
fewer places.
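
As a hypothetical example of the resulting attribution (names below
are illustrative, not from this patch):

	#include <linux/mempool.h>

	static mempool_t *io_pool;

	static int init_io_pool(void)
	{
		/* element preallocation is charged to this line */
		io_pool = mempool_create_kmalloc_pool(16, 512);
		return io_pool ? 0 : -ENOMEM;
	}

	static void *get_io_buf(void)
	{
		/* refills that fall through to kmalloc are charged here */
		return mempool_alloc(io_pool, GFP_NOIO);
	}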

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mempool.h | 73 ++++++++++++++++++++---------------------
 mm/mempool.c            | 36 ++++++++------------
 2 files changed, 49 insertions(+), 60 deletions(-)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 7be1e32e6d42..69e65ca515ee 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -5,6 +5,8 @@
 #ifndef _LINUX_MEMPOOL_H
 #define _LINUX_MEMPOOL_H
 
+#include <linux/sched.h>
+#include <linux/alloc_tag.h>
 #include <linux/wait.h>
 #include <linux/compiler.h>
 
@@ -39,18 +41,32 @@ void mempool_exit(mempool_t *pool);
 int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
 		      mempool_free_t *free_fn, void *pool_data,
 		      gfp_t gfp_mask, int node_id);
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
 		 mempool_free_t *free_fn, void *pool_data);
+#define mempool_init(...)						\
+	alloc_hooks(mempool_init_noprof(__VA_ARGS__))
 
 extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
 			mempool_free_t *free_fn, void *pool_data);
-extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
+
+extern mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
 			mempool_free_t *free_fn, void *pool_data,
 			gfp_t gfp_mask, int nid);
+#define mempool_create_node(...)					\
+	alloc_hooks(mempool_create_node_noprof(__VA_ARGS__))
+
+#define mempool_create(_min_nr, _alloc_fn, _free_fn, _pool_data)	\
+	mempool_create_node(_min_nr, _alloc_fn, _free_fn, _pool_data,	\
+			    GFP_KERNEL, NUMA_NO_NODE)
 
 extern int mempool_resize(mempool_t *pool, int new_min_nr);
 extern void mempool_destroy(mempool_t *pool);
-extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+
+extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
+#define mempool_alloc(...)						\
+	alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+
 extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
 extern void mempool_free(void *element, mempool_t *pool);
 
@@ -62,19 +78,10 @@ extern void mempool_free(void *element, mempool_t *pool);
 void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data);
 void mempool_free_slab(void *element, void *pool_data);
 
-static inline int
-mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
-{
-	return mempool_init(pool, min_nr, mempool_alloc_slab,
-			    mempool_free_slab, (void *) kc);
-}
-
-static inline mempool_t *
-mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
-{
-	return mempool_create(min_nr, mempool_alloc_slab, mempool_free_slab,
-			      (void *) kc);
-}
+#define mempool_init_slab_pool(_pool, _min_nr, _kc)			\
+	mempool_init(_pool, (_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
+#define mempool_create_slab_pool(_min_nr, _kc)			\
+	mempool_create((_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
 
 /*
  * a mempool_alloc_t and a mempool_free_t to kmalloc and kfree the
@@ -83,17 +90,12 @@ mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
 void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data);
 void mempool_kfree(void *element, void *pool_data);
 
-static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size)
-{
-	return mempool_init(pool, min_nr, mempool_kmalloc,
-			    mempool_kfree, (void *) size);
-}
-
-static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
-{
-	return mempool_create(min_nr, mempool_kmalloc, mempool_kfree,
-			      (void *) size);
-}
+#define mempool_init_kmalloc_pool(_pool, _min_nr, _size)		\
+	mempool_init(_pool, (_min_nr), mempool_kmalloc, mempool_kfree,	\
+		     (void *)(unsigned long)(_size))
+#define mempool_create_kmalloc_pool(_min_nr, _size)			\
+	mempool_create((_min_nr), mempool_kmalloc, mempool_kfree,	\
+		       (void *)(unsigned long)(_size))
 
 /*
  * A mempool_alloc_t and mempool_free_t for a simple page allocator that
@@ -102,16 +104,11 @@ static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
 void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data);
 void mempool_free_pages(void *element, void *pool_data);
 
-static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order)
-{
-	return mempool_init(pool, min_nr, mempool_alloc_pages,
-			    mempool_free_pages, (void *)(long)order);
-}
-
-static inline mempool_t *mempool_create_page_pool(int min_nr, int order)
-{
-	return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages,
-			      (void *)(long)order);
-}
+#define mempool_init_page_pool(_pool, _min_nr, _order)			\
+	mempool_init(_pool, (_min_nr), mempool_alloc_pages,		\
+		     mempool_free_pages, (void *)(long)(_order))
+#define mempool_create_page_pool(_min_nr, _order)			\
+	mempool_create((_min_nr), mempool_alloc_pages,			\
+		       mempool_free_pages, (void *)(long)(_order))
 
 #endif /* _LINUX_MEMPOOL_H */
diff --git a/mm/mempool.c b/mm/mempool.c
index dbbf0e9fb424..c47ff883cf36 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -240,17 +240,17 @@ EXPORT_SYMBOL(mempool_init_node);
  *
  * Return: %0 on success, negative error code otherwise.
  */
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
-		 mempool_free_t *free_fn, void *pool_data)
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+			mempool_free_t *free_fn, void *pool_data)
 {
 	return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
 				 pool_data, GFP_KERNEL, NUMA_NO_NODE);
 
 }
-EXPORT_SYMBOL(mempool_init);
+EXPORT_SYMBOL(mempool_init_noprof);
 
 /**
- * mempool_create - create a memory pool
+ * mempool_create_node - create a memory pool
  * @min_nr:    the minimum number of elements guaranteed to be
  *             allocated for this pool.
  * @alloc_fn:  user-defined element-allocation function.
@@ -265,17 +265,9 @@ EXPORT_SYMBOL(mempool_init);
  *
  * Return: pointer to the created memory pool object or %NULL on error.
  */
-mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
-				mempool_free_t *free_fn, void *pool_data)
-{
-	return mempool_create_node(min_nr, alloc_fn, free_fn, pool_data,
-				   GFP_KERNEL, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(mempool_create);
-
-mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
-			       mempool_free_t *free_fn, void *pool_data,
-			       gfp_t gfp_mask, int node_id)
+mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
+				      mempool_free_t *free_fn, void *pool_data,
+				      gfp_t gfp_mask, int node_id)
 {
 	mempool_t *pool;
 
@@ -291,7 +283,7 @@ mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
 
 	return pool;
 }
-EXPORT_SYMBOL(mempool_create_node);
+EXPORT_SYMBOL(mempool_create_node_noprof);
 
 /**
  * mempool_resize - resize an existing memory pool
@@ -374,7 +366,7 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
 EXPORT_SYMBOL(mempool_resize);
 
 /**
- * mempool_alloc - allocate an element from a specific memory pool
+ * mempool_alloc_noprof - allocate an element from a specific memory pool
  * @pool:      pointer to the memory pool which was allocated via
  *             mempool_create().
  * @gfp_mask:  the usual allocation bitmask.
@@ -387,7 +379,7 @@ EXPORT_SYMBOL(mempool_resize);
  *
  * Return: pointer to the allocated element or %NULL on error.
  */
-void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
+void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 {
 	void *element;
 	unsigned long flags;
@@ -454,7 +446,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	finish_wait(&pool->wait, &wait);
 	goto repeat_alloc;
 }
-EXPORT_SYMBOL(mempool_alloc);
+EXPORT_SYMBOL(mempool_alloc_noprof);
 
 /**
  * mempool_alloc_preallocated - allocate an element from preallocated elements
@@ -562,7 +554,7 @@ void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data)
 {
 	struct kmem_cache *mem = pool_data;
 	VM_BUG_ON(mem->ctor);
-	return kmem_cache_alloc(mem, gfp_mask);
+	return kmem_cache_alloc_noprof(mem, gfp_mask);
 }
 EXPORT_SYMBOL(mempool_alloc_slab);
 
@@ -580,7 +572,7 @@ EXPORT_SYMBOL(mempool_free_slab);
 void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
 {
 	size_t size = (size_t)pool_data;
-	return kmalloc(size, gfp_mask);
+	return kmalloc_noprof(size, gfp_mask);
 }
 EXPORT_SYMBOL(mempool_kmalloc);
 
@@ -597,7 +589,7 @@ EXPORT_SYMBOL(mempool_kfree);
 void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data)
 {
 	int order = (int)(long)pool_data;
-	return alloc_pages(gfp_mask, order);
+	return alloc_pages_noprof(gfp_mask, order);
 }
 EXPORT_SYMBOL(mempool_alloc_pages);
 
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 26/36] mm: percpu: Introduce pcpuobj_ext
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (24 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 25/36] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 27/36] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

Upcoming alloc tagging patches require a place to stash per-allocation
metadata.

We already do this when memcg is enabled, so this patch generalizes the
obj_cgroup * vector in struct pcpu_chunk by creating a pcpuobj_ext
type, which we will be adding to in an upcoming patch, similarly to the
previous slabobj_ext patch.
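
The lookup pattern stays the same as with the old obj_cgroups vector:
one ext slot per PCPU_MIN_ALLOC_SIZE unit of the chunk. A hypothetical
helper to make the indexing explicit (not part of this patch):

	static struct pcpuobj_ext *pcpu_obj_ext(struct pcpu_chunk *chunk,
						int off)
	{
		/* same indexing the hunks below use directly */
		return &chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT];
	}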

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org
---
 mm/percpu-internal.h | 19 +++++++++++++++++--
 mm/percpu.c          | 30 +++++++++++++++---------------
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index cdd0aa597a81..e62d582f4bf3 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -32,6 +32,16 @@ struct pcpu_block_md {
 	int			nr_bits;	/* total bits responsible for */
 };
 
+struct pcpuobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup	*cgroup;
+#endif
+};
+
+#ifdef CONFIG_MEMCG_KMEM
+#define NEED_PCPUOBJ_EXT
+#endif
+
 struct pcpu_chunk {
 #ifdef CONFIG_PERCPU_STATS
 	int			nr_alloc;	/* # of allocations */
@@ -64,8 +74,8 @@ struct pcpu_chunk {
 	int			end_offset;	/* additional area required to
 						   have the region end page
 						   aligned */
-#ifdef CONFIG_MEMCG_KMEM
-	struct obj_cgroup	**obj_cgroups;	/* vector of object cgroups */
+#ifdef NEED_PCPUOBJ_EXT
+	struct pcpuobj_ext	*obj_exts;	/* vector of object cgroups */
 #endif
 
 	int			nr_pages;	/* # of pages served by this chunk */
@@ -74,6 +84,11 @@ struct pcpu_chunk {
 	unsigned long		populated[];	/* populated bitmap */
 };
 
+static inline bool need_pcpuobj_ext(void)
+{
+	return !mem_cgroup_kmem_disabled();
+}
+
 extern spinlock_t pcpu_lock;
 
 extern struct list_head *pcpu_chunk_lists;
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..2e5edaad9cc3 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1392,9 +1392,9 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
 		panic("%s: Failed to allocate %zu bytes\n", __func__,
 		      alloc_size);
 
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
 	/* first chunk is free to use */
-	chunk->obj_cgroups = NULL;
+	chunk->obj_exts = NULL;
 #endif
 	pcpu_init_md_blocks(chunk);
 
@@ -1463,12 +1463,12 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
 	if (!chunk->md_blocks)
 		goto md_blocks_fail;
 
-#ifdef CONFIG_MEMCG_KMEM
-	if (!mem_cgroup_kmem_disabled()) {
-		chunk->obj_cgroups =
+#ifdef NEED_PCPUOBJ_EXT
+	if (need_pcpuobj_ext()) {
+		chunk->obj_exts =
 			pcpu_mem_zalloc(pcpu_chunk_map_bits(chunk) *
-					sizeof(struct obj_cgroup *), gfp);
-		if (!chunk->obj_cgroups)
+					sizeof(struct pcpuobj_ext), gfp);
+		if (!chunk->obj_exts)
 			goto objcg_fail;
 	}
 #endif
@@ -1480,7 +1480,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
 
 	return chunk;
 
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
 objcg_fail:
 	pcpu_mem_free(chunk->md_blocks);
 #endif
@@ -1498,8 +1498,8 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk)
 {
 	if (!chunk)
 		return;
-#ifdef CONFIG_MEMCG_KMEM
-	pcpu_mem_free(chunk->obj_cgroups);
+#ifdef NEED_PCPUOBJ_EXT
+	pcpu_mem_free(chunk->obj_exts);
 #endif
 	pcpu_mem_free(chunk->md_blocks);
 	pcpu_mem_free(chunk->bound_map);
@@ -1646,9 +1646,9 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
 	if (!objcg)
 		return;
 
-	if (likely(chunk && chunk->obj_cgroups)) {
+	if (likely(chunk && chunk->obj_exts)) {
 		obj_cgroup_get(objcg);
-		chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
+		chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = objcg;
 
 		rcu_read_lock();
 		mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
@@ -1663,13 +1663,13 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
 {
 	struct obj_cgroup *objcg;
 
-	if (unlikely(!chunk->obj_cgroups))
+	if (unlikely(!chunk->obj_exts))
 		return;
 
-	objcg = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT];
+	objcg = chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup;
 	if (!objcg)
 		return;
-	chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL;
+	chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = NULL;
 
 	obj_cgroup_uncharge(objcg, pcpu_obj_full_size(size));
 
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 27/36] mm: percpu: Add codetag reference into pcpuobj_ext
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (25 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 26/36] mm: percpu: Introduce pcpuobj_ext Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 28/36] mm: percpu: enable per-cpu allocation tagging Suren Baghdasaryan
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

To store a codetag for every per-cpu allocation, a codetag reference is
embedded into pcpuobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Hooks that
use the newly introduced codetag are added.
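
After this patch the per-allocation ext slot can carry both kinds of
metadata, each compiled in only when its feature is enabled (this is
just the resulting struct from the hunks below, shown consolidated):

	struct pcpuobj_ext {
	#ifdef CONFIG_MEMCG_KMEM
		struct obj_cgroup	*cgroup;	/* memcg charge owner */
	#endif
	#ifdef CONFIG_MEM_ALLOC_PROFILING
		union codetag_ref	tag;		/* allocating callsite */
	#endif
	};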

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/percpu-internal.h | 11 +++++++++--
 mm/percpu.c          | 26 ++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index e62d582f4bf3..7e42f0ca3b7b 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -36,9 +36,12 @@ struct pcpuobj_ext {
 #ifdef CONFIG_MEMCG_KMEM
 	struct obj_cgroup	*cgroup;
 #endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	union codetag_ref	tag;
+#endif
 };
 
-#ifdef CONFIG_MEMCG_KMEM
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEM_ALLOC_PROFILING)
 #define NEED_PCPUOBJ_EXT
 #endif
 
@@ -86,7 +89,11 @@ struct pcpu_chunk {
 
 static inline bool need_pcpuobj_ext(void)
 {
-	return !mem_cgroup_kmem_disabled();
+	if (IS_ENABLED(CONFIG_MEM_ALLOC_PROFILING))
+		return true;
+	if (!mem_cgroup_kmem_disabled())
+		return true;
+	return false;
 }
 
 extern spinlock_t pcpu_lock;
diff --git a/mm/percpu.c b/mm/percpu.c
index 2e5edaad9cc3..578531ea1f43 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1699,6 +1699,32 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+				      size_t size)
+{
+	if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts)) {
+		alloc_tag_add(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag,
+			      current->alloc_tag, size);
+	}
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+	if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts))
+		alloc_tag_sub_noalloc(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, size);
+}
+#else
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+				      size_t size)
+{
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+}
+#endif
+
 /**
  * pcpu_alloc - the percpu allocator
  * @size: size of area to allocate in bytes
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 28/36] mm: percpu: enable per-cpu allocation tagging
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (26 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 27/36] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 29/36] mm: vmalloc: Enable memory allocation profiling Suren Baghdasaryan
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu
as wrappers around pcpu_alloc_noprof() so that allocations and
deallocations done through these functions are recorded.
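
With this in place, a hypothetical module (names illustrative) gets its
per-cpu allocation attributed to its own callsite rather than to
mm/percpu.c:

	#include <linux/percpu.h>

	struct foo_stats {
		u64 hits;
		u64 misses;
	};

	static struct foo_stats __percpu *stats;

	static int foo_init(void)
	{
		stats = alloc_percpu(struct foo_stats);	/* tagged here */
		if (!stats)
			return -ENOMEM;
		this_cpu_inc(stats->hits);
		return 0;
	}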

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/percpu.h | 23 ++++++++++-----
 mm/percpu.c            | 64 +++++-------------------------------------
 2 files changed, 23 insertions(+), 64 deletions(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 62b5eb45bd89..e54921c79c9a 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_PERCPU_H
 #define __LINUX_PERCPU_H
 
+#include <linux/alloc_tag.h>
 #include <linux/mmdebug.h>
 #include <linux/preempt.h>
 #include <linux/smp.h>
@@ -9,6 +10,7 @@
 #include <linux/pfn.h>
 #include <linux/init.h>
 #include <linux/cleanup.h>
+#include <linux/sched.h>
 
 #include <asm/percpu.h>
 
@@ -125,7 +127,6 @@ extern int __init pcpu_page_first_chunk(size_t reserved_size,
 				pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
 #endif
 
-extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1);
 extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr);
 extern bool is_kernel_percpu_address(unsigned long addr);
 
@@ -133,14 +134,16 @@ extern bool is_kernel_percpu_address(unsigned long addr);
 extern void __init setup_per_cpu_areas(void);
 #endif
 
-extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
-extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
-extern void free_percpu(void __percpu *__pdata);
+extern void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
+				   gfp_t gfp) __alloc_size(1);
 extern size_t pcpu_alloc_size(void __percpu *__pdata);
 
-DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
-
-extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+#define __alloc_percpu_gfp(_size, _align, _gfp)				\
+	alloc_hooks(pcpu_alloc_noprof(_size, _align, false, _gfp))
+#define __alloc_percpu(_size, _align)					\
+	alloc_hooks(pcpu_alloc_noprof(_size, _align, false, GFP_KERNEL))
+#define __alloc_reserved_percpu(_size, _align)				\
+	alloc_hooks(pcpu_alloc_noprof(_size, _align, true, GFP_KERNEL))
 
 #define alloc_percpu_gfp(type, gfp)					\
 	(typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type),	\
@@ -149,6 +152,12 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
 
+extern void free_percpu(void __percpu *__pdata);
+
+DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
+
+extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+
 extern unsigned long pcpu_nr_pages(void);
 
 #endif /* __LINUX_PERCPU_H */
diff --git a/mm/percpu.c b/mm/percpu.c
index 578531ea1f43..2badcc5e0e71 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1726,7 +1726,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
 #endif
 
 /**
- * pcpu_alloc - the percpu allocator
+ * pcpu_alloc_noprof - the percpu allocator
  * @size: size of area to allocate in bytes
  * @align: alignment of area (max PAGE_SIZE)
  * @reserved: allocate from the reserved chunk if available
@@ -1740,7 +1740,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
  */
-static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
+void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 				 gfp_t gfp)
 {
 	gfp_t pcpu_gfp;
@@ -1907,6 +1907,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	pcpu_memcg_post_alloc_hook(objcg, chunk, off, size);
 
+	pcpu_alloc_tag_alloc_hook(chunk, off, size);
+
 	return ptr;
 
 fail_unlock:
@@ -1935,61 +1937,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	return NULL;
 }
-
-/**
- * __alloc_percpu_gfp - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- * @gfp: allocation flags
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align.  If
- * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail. If @gfp
- * has __GFP_NOWARN then no warning will be triggered on invalid or failed
- * allocation requests.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
-{
-	return pcpu_alloc(size, align, false, gfp);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu_gfp);
-
-/**
- * __alloc_percpu - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Equivalent to __alloc_percpu_gfp(size, align, %GFP_KERNEL).
- */
-void __percpu *__alloc_percpu(size_t size, size_t align)
-{
-	return pcpu_alloc(size, align, false, GFP_KERNEL);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu);
-
-/**
- * __alloc_reserved_percpu - allocate reserved percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align
- * from reserved percpu area if arch has set it up; otherwise,
- * allocation is served from the same dynamic area.  Might sleep.
- * Might trigger writeouts.
- *
- * CONTEXT:
- * Does GFP_KERNEL allocation.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
-{
-	return pcpu_alloc(size, align, true, GFP_KERNEL);
-}
+EXPORT_SYMBOL_GPL(pcpu_alloc_noprof);
 
 /**
  * pcpu_balance_free - manage the amount of free chunks
@@ -2328,6 +2276,8 @@ void free_percpu(void __percpu *ptr)
 	spin_lock_irqsave(&pcpu_lock, flags);
 	size = pcpu_free_area(chunk, off);
 
+	pcpu_alloc_tag_free_hook(chunk, off, size);
+
 	pcpu_memcg_free_hook(chunk, off, size);
 
 	/*
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 29/36] mm: vmalloc: Enable memory allocation profiling
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (27 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 28/36] mm: percpu: enable per-cpu allocation tagging Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 30/36] rhashtable: Plumb through alloc tag Suren Baghdasaryan
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

This wraps all external vmalloc allocation functions in the
alloc_hooks() wrapper, and switches internal allocations to the _noprof
variants where appropriate, for the new memory allocation profiling
feature.
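
Callers keep using the same API; each callsite simply becomes its own
entry in the profiling output. A hypothetical example (names
illustrative):

	#include <linux/vmalloc.h>

	static void *table;

	static int example_init(void)
	{
		table = vmalloc(1 << 20);	/* accounted to this line */
		if (!table)
			return -ENOMEM;
		return 0;
	}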

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 drivers/staging/media/atomisp/pci/hmm/hmm.c |  2 +-
 include/linux/vmalloc.h                     | 60 ++++++++++----
 kernel/kallsyms_selftest.c                  |  2 +-
 mm/nommu.c                                  | 64 +++++++--------
 mm/util.c                                   | 24 +++---
 mm/vmalloc.c                                | 88 ++++++++++-----------
 6 files changed, 135 insertions(+), 105 deletions(-)

diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm.c b/drivers/staging/media/atomisp/pci/hmm/hmm.c
index bb12644fd033..3e2899ad8517 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm.c
@@ -205,7 +205,7 @@ static ia_css_ptr __hmm_alloc(size_t bytes, enum hmm_bo_type type,
 	}
 
 	dev_dbg(atomisp_dev, "pages: 0x%08x (%zu bytes), type: %d, vmalloc %p\n",
-		bo->start, bytes, type, vmalloc);
+		bo->start, bytes, type, vmalloc_noprof);
 
 	return bo->start;
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..106d78e75606 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -2,6 +2,8 @@
 #ifndef _LINUX_VMALLOC_H
 #define _LINUX_VMALLOC_H
 
+#include <linux/alloc_tag.h>
+#include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/init.h>
 #include <linux/list.h>
@@ -137,26 +139,54 @@ extern unsigned long vmalloc_nr_pages(void);
 static inline unsigned long vmalloc_nr_pages(void) { return 0; }
 #endif
 
-extern void *vmalloc(unsigned long size) __alloc_size(1);
-extern void *vzalloc(unsigned long size) __alloc_size(1);
-extern void *vmalloc_user(unsigned long size) __alloc_size(1);
-extern void *vmalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vzalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vmalloc_32(unsigned long size) __alloc_size(1);
-extern void *vmalloc_32_user(unsigned long size) __alloc_size(1);
-extern void *__vmalloc(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
-extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
+extern void *vmalloc_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc(...)		alloc_hooks(vmalloc_noprof(__VA_ARGS__))
+
+extern void *vzalloc_noprof(unsigned long size) __alloc_size(1);
+#define vzalloc(...)		alloc_hooks(vzalloc_noprof(__VA_ARGS__))
+
+extern void *vmalloc_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_user(...)	alloc_hooks(vmalloc_user_noprof(__VA_ARGS__))
+
+extern void *vmalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vmalloc_node(...)	alloc_hooks(vmalloc_node_noprof(__VA_ARGS__))
+
+extern void *vzalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vzalloc_node(...)	alloc_hooks(vzalloc_node_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32(...)		alloc_hooks(vmalloc_32_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32_user(...)	alloc_hooks(vmalloc_32_user_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc(...)		alloc_hooks(__vmalloc_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller) __alloc_size(1);
-void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+#define __vmalloc_node_range(...)	alloc_hooks(__vmalloc_node_range_noprof(__VA_ARGS__))
+
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc_node(...)	alloc_hooks(__vmalloc_node_noprof(__VA_ARGS__))
+
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define vmalloc_huge(...)	alloc_hooks(vmalloc_huge_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vmalloc_array(...)	alloc_hooks(__vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *vmalloc_array_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vmalloc_array(...)	alloc_hooks(vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vcalloc(...)		alloc_hooks(__vcalloc_noprof(__VA_ARGS__))
 
-extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
-extern void *__vcalloc(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vcalloc(size_t n, size_t size) __alloc_size(1, 2);
+extern void *vcalloc_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vcalloc(...)		alloc_hooks(vcalloc_noprof(__VA_ARGS__))
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
index b4cac76ea5e9..3ea9be364e32 100644
--- a/kernel/kallsyms_selftest.c
+++ b/kernel/kallsyms_selftest.c
@@ -82,7 +82,7 @@ static struct test_item test_items[] = {
 	ITEM_FUNC(kallsyms_test_func_static),
 	ITEM_FUNC(kallsyms_test_func),
 	ITEM_FUNC(kallsyms_test_func_weak),
-	ITEM_FUNC(vmalloc),
+	ITEM_FUNC(vmalloc_noprof),
 	ITEM_FUNC(vfree),
 #ifdef CONFIG_KALLSYMS_ALL
 	ITEM_DATA(kallsyms_test_var_bss_static),
diff --git a/mm/nommu.c b/mm/nommu.c
index b6dc558d3144..face0938e9e3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -139,28 +139,28 @@ void vfree(const void *addr)
 }
 EXPORT_SYMBOL(vfree);
 
-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask)
 {
 	/*
 	 *  You can't specify __GFP_HIGHMEM with kmalloc() since kmalloc()
 	 * returns only a logical address.
 	 */
-	return kmalloc(size, (gfp_mask | __GFP_COMP) & ~__GFP_HIGHMEM);
+	return kmalloc_noprof(size, (gfp_mask | __GFP_COMP) & ~__GFP_HIGHMEM);
 }
-EXPORT_SYMBOL(__vmalloc);
+EXPORT_SYMBOL(__vmalloc_noprof);
 
-void *__vmalloc_node_range(unsigned long size, unsigned long align,
+void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 		unsigned long start, unsigned long end, gfp_t gfp_mask,
 		pgprot_t prot, unsigned long vm_flags, int node,
 		const void *caller)
 {
-	return __vmalloc(size, gfp_mask);
+	return __vmalloc_noprof(size, gfp_mask);
 }
 
-void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller)
 {
-	return __vmalloc(size, gfp_mask);
+	return __vmalloc_noprof(size, gfp_mask);
 }
 
 static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
@@ -181,11 +181,11 @@ static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
 	return ret;
 }
 
-void *vmalloc_user(unsigned long size)
+void *vmalloc_user_noprof(unsigned long size)
 {
 	return __vmalloc_user_flags(size, GFP_KERNEL | __GFP_ZERO);
 }
-EXPORT_SYMBOL(vmalloc_user);
+EXPORT_SYMBOL(vmalloc_user_noprof);
 
 struct page *vmalloc_to_page(const void *addr)
 {
@@ -219,13 +219,13 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
  *	For tight control over page level allocator and protection flags
  *	use __vmalloc() instead.
  */
-void *vmalloc(unsigned long size)
+void *vmalloc_noprof(unsigned long size)
 {
-	return __vmalloc(size, GFP_KERNEL);
+	return __vmalloc_noprof(size, GFP_KERNEL);
 }
-EXPORT_SYMBOL(vmalloc);
+EXPORT_SYMBOL(vmalloc_noprof);
 
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc);
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc_noprof);
 
 /*
  *	vzalloc - allocate virtually contiguous memory with zero fill
@@ -239,14 +239,14 @@ void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __weak __alias(__vmalloc)
  *	For tight control over page level allocator and protection flags
  *	use __vmalloc() instead.
  */
-void *vzalloc(unsigned long size)
+void *vzalloc_noprof(unsigned long size)
 {
-	return __vmalloc(size, GFP_KERNEL | __GFP_ZERO);
+	return __vmalloc_noprof(size, GFP_KERNEL | __GFP_ZERO);
 }
-EXPORT_SYMBOL(vzalloc);
+EXPORT_SYMBOL(vzalloc_noprof);
 
 /**
- * vmalloc_node - allocate memory on a specific node
+ * vmalloc_node_noprof - allocate memory on a specific node
  * @size:	allocation size
  * @node:	numa node
  *
@@ -256,14 +256,14 @@ EXPORT_SYMBOL(vzalloc);
  * For tight control over page level allocator and protection flags
  * use __vmalloc() instead.
  */
-void *vmalloc_node(unsigned long size, int node)
+void *vmalloc_node_noprof(unsigned long size, int node)
 {
-	return vmalloc(size);
+	return vmalloc_noprof(size);
 }
-EXPORT_SYMBOL(vmalloc_node);
+EXPORT_SYMBOL(vmalloc_node_noprof);
 
 /**
- * vzalloc_node - allocate memory on a specific node with zero fill
+ * vzalloc_node_noprof - allocate memory on a specific node with zero fill
  * @size:	allocation size
  * @node:	numa node
  *
@@ -274,27 +274,27 @@ EXPORT_SYMBOL(vmalloc_node);
  * For tight control over page level allocator and protection flags
  * use __vmalloc() instead.
  */
-void *vzalloc_node(unsigned long size, int node)
+void *vzalloc_node_noprof(unsigned long size, int node)
 {
-	return vzalloc(size);
+	return vzalloc_noprof(size);
 }
-EXPORT_SYMBOL(vzalloc_node);
+EXPORT_SYMBOL(vzalloc_node_noprof);
 
 /**
- * vmalloc_32  -  allocate virtually contiguous memory (32bit addressable)
+ * vmalloc_32_noprof  -  allocate virtually contiguous memory (32bit addressable)
  *	@size:		allocation size
  *
  *	Allocate enough 32bit PA addressable pages to cover @size from the
  *	page level allocator and map them into contiguous kernel virtual space.
  */
-void *vmalloc_32(unsigned long size)
+void *vmalloc_32_noprof(unsigned long size)
 {
-	return __vmalloc(size, GFP_KERNEL);
+	return __vmalloc_noprof(size, GFP_KERNEL);
 }
-EXPORT_SYMBOL(vmalloc_32);
+EXPORT_SYMBOL(vmalloc_32_noprof);
 
 /**
- * vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory
+ * vmalloc_32_user_noprof - allocate zeroed virtually contiguous 32bit memory
  *	@size:		allocation size
  *
  * The resulting memory area is 32bit addressable and zeroed so it can be
@@ -303,15 +303,15 @@ EXPORT_SYMBOL(vmalloc_32);
  * VM_USERMAP is set on the corresponding VMA so that subsequent calls to
  * remap_vmalloc_range() are permissible.
  */
-void *vmalloc_32_user(unsigned long size)
+void *vmalloc_32_user_noprof(unsigned long size)
 {
 	/*
 	 * We'll have to sort out the ZONE_DMA bits for 64-bit,
 	 * but for now this can simply use vmalloc_user() directly.
 	 */
-	return vmalloc_user(size);
+	return vmalloc_user_noprof(size);
 }
-EXPORT_SYMBOL(vmalloc_32_user);
+EXPORT_SYMBOL(vmalloc_32_user_noprof);
 
 void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot)
 {
diff --git a/mm/util.c b/mm/util.c
index 291f7945190f..19c90036d3cc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -639,7 +639,7 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
 	 * about the resulting pointer, and cannot play
 	 * protection games.
 	 */
-	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
 			flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
 			node, __builtin_return_address(0));
 }
@@ -698,12 +698,12 @@ void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flag
 EXPORT_SYMBOL(kvrealloc_noprof);
 
 /**
- * __vmalloc_array - allocate memory for a virtually contiguous array.
+ * __vmalloc_array_noprof - allocate memory for a virtually contiguous array.
  * @n: number of elements.
  * @size: element size.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
+void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
 {
 	size_t bytes;
 
@@ -711,18 +711,18 @@ void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
 		return NULL;
 	return __vmalloc(bytes, flags);
 }
-EXPORT_SYMBOL(__vmalloc_array);
+EXPORT_SYMBOL(__vmalloc_array_noprof);
 
 /**
- * vmalloc_array - allocate memory for a virtually contiguous array.
+ * vmalloc_array_noprof - allocate memory for a virtually contiguous array.
  * @n: number of elements.
  * @size: element size.
  */
-void *vmalloc_array(size_t n, size_t size)
+void *vmalloc_array_noprof(size_t n, size_t size)
 {
 	return __vmalloc_array(n, size, GFP_KERNEL);
 }
-EXPORT_SYMBOL(vmalloc_array);
+EXPORT_SYMBOL(vmalloc_array_noprof);
 
 /**
  * __vcalloc - allocate and zero memory for a virtually contiguous array.
@@ -730,22 +730,22 @@ EXPORT_SYMBOL(vmalloc_array);
  * @size: element size.
  * @flags: the type of memory to allocate (see kmalloc).
  */
-void *__vcalloc(size_t n, size_t size, gfp_t flags)
+void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags)
 {
 	return __vmalloc_array(n, size, flags | __GFP_ZERO);
 }
-EXPORT_SYMBOL(__vcalloc);
+EXPORT_SYMBOL(__vcalloc_noprof);
 
 /**
- * vcalloc - allocate and zero memory for a virtually contiguous array.
+ * vcalloc_noprof - allocate and zero memory for a virtually contiguous array.
  * @n: number of elements.
  * @size: element size.
  */
-void *vcalloc(size_t n, size_t size)
+void *vcalloc_noprof(size_t n, size_t size)
 {
 	return __vmalloc_array(n, size, GFP_KERNEL | __GFP_ZERO);
 }
-EXPORT_SYMBOL(vcalloc);
+EXPORT_SYMBOL(vcalloc_noprof);
 
 struct anon_vma *folio_anon_vma(struct folio *folio)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..5239f2c9ecae 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3025,12 +3025,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			 * but mempolicy wants to alloc memory by interleaving.
 			 */
 			if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
-				nr = alloc_pages_bulk_array_mempolicy(bulk_gfp,
+				nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp,
 							nr_pages_request,
 							pages + nr_allocated);
 
 			else
-				nr = alloc_pages_bulk_array_node(bulk_gfp, nid,
+				nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid,
 							nr_pages_request,
 							pages + nr_allocated);
 
@@ -3060,9 +3060,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			break;
 
 		if (nid == NUMA_NO_NODE)
-			page = alloc_pages(alloc_gfp, order);
+			page = alloc_pages_noprof(alloc_gfp, order);
 		else
-			page = alloc_pages_node(nid, alloc_gfp, order);
+			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
 		if (unlikely(!page)) {
 			if (!nofail)
 				break;
@@ -3119,10 +3119,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
-		area->pages = __vmalloc_node(array_size, 1, nested_gfp, node,
+		area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node,
 					area->caller);
 	} else {
-		area->pages = kmalloc_node(array_size, nested_gfp, node);
+		area->pages = kmalloc_node_noprof(array_size, nested_gfp, node);
 	}
 
 	if (!area->pages) {
@@ -3205,7 +3205,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 }
 
 /**
- * __vmalloc_node_range - allocate virtually contiguous memory
+ * __vmalloc_node_range_noprof - allocate virtually contiguous memory
  * @size:		  allocation size
  * @align:		  desired alignment
  * @start:		  vm area range start
@@ -3232,7 +3232,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
  *
  * Return: the address of the area or %NULL on failure
  */
-void *__vmalloc_node_range(unsigned long size, unsigned long align,
+void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller)
@@ -3361,7 +3361,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 }
 
 /**
- * __vmalloc_node - allocate virtually contiguous memory
+ * __vmalloc_node_noprof - allocate virtually contiguous memory
  * @size:	    allocation size
  * @align:	    desired alignment
  * @gfp_mask:	    flags for the page level allocator
@@ -3379,10 +3379,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *__vmalloc_node(unsigned long size, unsigned long align,
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align,
 			    gfp_t gfp_mask, int node, const void *caller)
 {
-	return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range_noprof(size, align, VMALLOC_START, VMALLOC_END,
 				gfp_mask, PAGE_KERNEL, 0, node, caller);
 }
 /*
@@ -3391,15 +3391,15 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
  * than that.
  */
 #ifdef CONFIG_TEST_VMALLOC_MODULE
-EXPORT_SYMBOL_GPL(__vmalloc_node);
+EXPORT_SYMBOL_GPL(__vmalloc_node_noprof);
 #endif
 
-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask)
 {
-	return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
+	return __vmalloc_node_noprof(size, 1, gfp_mask, NUMA_NO_NODE,
 				__builtin_return_address(0));
 }
-EXPORT_SYMBOL(__vmalloc);
+EXPORT_SYMBOL(__vmalloc_noprof);
 
 /**
  * vmalloc - allocate virtually contiguous memory
@@ -3413,12 +3413,12 @@ EXPORT_SYMBOL(__vmalloc);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc(unsigned long size)
+void *vmalloc_noprof(unsigned long size)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
+	return __vmalloc_node_noprof(size, 1, GFP_KERNEL, NUMA_NO_NODE,
 				__builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc);
+EXPORT_SYMBOL(vmalloc_noprof);
 
 /**
  * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
@@ -3432,16 +3432,16 @@ EXPORT_SYMBOL(vmalloc);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask)
 {
-	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
 				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
-EXPORT_SYMBOL_GPL(vmalloc_huge);
+EXPORT_SYMBOL_GPL(vmalloc_huge_noprof);
 
 /**
- * vzalloc - allocate virtually contiguous memory with zero fill
+ * vzalloc_noprof - allocate virtually contiguous memory with zero fill
  * @size:    allocation size
  *
  * Allocate enough pages to cover @size from the page level
@@ -3453,12 +3453,12 @@ EXPORT_SYMBOL_GPL(vmalloc_huge);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vzalloc(unsigned long size)
+void *vzalloc_noprof(unsigned long size)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
+	return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
 				__builtin_return_address(0));
 }
-EXPORT_SYMBOL(vzalloc);
+EXPORT_SYMBOL(vzalloc_noprof);
 
 /**
  * vmalloc_user - allocate zeroed virtually contiguous memory for userspace
@@ -3469,17 +3469,17 @@ EXPORT_SYMBOL(vzalloc);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_user(unsigned long size)
+void *vmalloc_user_noprof(unsigned long size)
 {
-	return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range_noprof(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
 				    GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
 				    VM_USERMAP, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_user);
+EXPORT_SYMBOL(vmalloc_user_noprof);
 
 /**
- * vmalloc_node - allocate memory on a specific node
+ * vmalloc_node_noprof - allocate memory on a specific node
  * @size:	  allocation size
  * @node:	  numa node
  *
@@ -3491,15 +3491,15 @@ EXPORT_SYMBOL(vmalloc_user);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_node(unsigned long size, int node)
+void *vmalloc_node_noprof(unsigned long size, int node)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL, node,
+	return __vmalloc_node_noprof(size, 1, GFP_KERNEL, node,
 			__builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_node);
+EXPORT_SYMBOL(vmalloc_node_noprof);
 
 /**
- * vzalloc_node - allocate memory on a specific node with zero fill
+ * vzalloc_node_noprof - allocate memory on a specific node with zero fill
  * @size:	allocation size
  * @node:	numa node
  *
@@ -3509,12 +3509,12 @@ EXPORT_SYMBOL(vmalloc_node);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vzalloc_node(unsigned long size, int node)
+void *vzalloc_node_noprof(unsigned long size, int node)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node,
+	return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, node,
 				__builtin_return_address(0));
 }
-EXPORT_SYMBOL(vzalloc_node);
+EXPORT_SYMBOL(vzalloc_node_noprof);
 
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
 #define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
@@ -3529,7 +3529,7 @@ EXPORT_SYMBOL(vzalloc_node);
 #endif
 
 /**
- * vmalloc_32 - allocate virtually contiguous memory (32bit addressable)
+ * vmalloc_32_noprof - allocate virtually contiguous memory (32bit addressable)
  * @size:	allocation size
  *
  * Allocate enough 32bit PA addressable pages to cover @size from the
@@ -3537,15 +3537,15 @@ EXPORT_SYMBOL(vzalloc_node);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_32(unsigned long size)
+void *vmalloc_32_noprof(unsigned long size)
 {
-	return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
+	return __vmalloc_node_noprof(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
 			__builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_32);
+EXPORT_SYMBOL(vmalloc_32_noprof);
 
 /**
- * vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory
+ * vmalloc_32_user_noprof - allocate zeroed virtually contiguous 32bit memory
  * @size:	     allocation size
  *
  * The resulting memory area is 32bit addressable and zeroed so it can be
@@ -3553,14 +3553,14 @@ EXPORT_SYMBOL(vmalloc_32);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_32_user(unsigned long size)
+void *vmalloc_32_user_noprof(unsigned long size)
 {
-	return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range_noprof(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
 				    GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
 				    VM_USERMAP, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_32_user);
+EXPORT_SYMBOL(vmalloc_32_user_noprof);
 
 /*
  * Atomically zero bytes in the iterator.
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 30/36] rhashtable: Plumb through alloc tag
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (28 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 29/36] mm: vmalloc: Enable memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 31/36] lib: add memory allocations report in show_mem() Suren Baghdasaryan
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

This gives better memory allocation profiling results; rhashtable
allocations will be accounted to the code that initialized the
rhashtable.
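
A hypothetical user (names illustrative): since rhashtable_init() is
now a macro, the tag stashed in ht->alloc_tag points at the callsite
below, and later internal bucket-table allocations and resizes are
charged back to it:

	#include <linux/rhashtable.h>

	struct obj {
		u32			key;
		struct rhash_head	node;
	};

	static const struct rhashtable_params obj_params = {
		.key_len	= sizeof(u32),
		.key_offset	= offsetof(struct obj, key),
		.head_offset	= offsetof(struct obj, node),
	};

	static struct rhashtable obj_table;

	static int obj_table_init(void)
	{
		return rhashtable_init(&obj_table, &obj_params);
	}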

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h        |  3 +++
 include/linux/rhashtable-types.h | 11 +++++++++--
 lib/rhashtable.c                 | 28 +++++++++++++++++-----------
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 86ed5d24a030..29636719b276 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -130,6 +130,8 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
 	this_cpu_add(tag->counters->bytes, bytes);
 }
 
+#define alloc_tag_record(p)	((p) = current->alloc_tag)
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 #define DEFINE_ALLOC_TAG(_alloc_tag)
@@ -138,6 +140,7 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
 static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
 static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
 				 size_t bytes) {}
+#define alloc_tag_record(p)	do {} while (0)
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index b6f3797277ff..015c8298bebc 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -9,6 +9,7 @@
 #ifndef _LINUX_RHASHTABLE_TYPES_H
 #define _LINUX_RHASHTABLE_TYPES_H
 
+#include <linux/alloc_tag.h>
 #include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/mutex.h>
@@ -88,6 +89,9 @@ struct rhashtable {
 	struct mutex                    mutex;
 	spinlock_t			lock;
 	atomic_t			nelems;
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	struct alloc_tag		*alloc_tag;
+#endif
 };
 
 /**
@@ -127,9 +131,12 @@ struct rhashtable_iter {
 	bool end_of_table;
 };
 
-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
 		    const struct rhashtable_params *params);
-int rhltable_init(struct rhltable *hlt,
+#define rhashtable_init(...)	alloc_hooks(rhashtable_init_noprof(__VA_ARGS__))
+
+int rhltable_init_noprof(struct rhltable *hlt,
 		  const struct rhashtable_params *params);
+#define rhltable_init(...)	alloc_hooks(rhltable_init_noprof(__VA_ARGS__))
 
 #endif /* _LINUX_RHASHTABLE_TYPES_H */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ae2ba8e06a2..35d841cf2b43 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -130,7 +130,8 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht,
 	if (ntbl)
 		return ntbl;
 
-	ntbl = kzalloc(PAGE_SIZE, GFP_ATOMIC);
+	ntbl = alloc_hooks_tag(ht->alloc_tag,
+			kmalloc_noprof(PAGE_SIZE, GFP_ATOMIC|__GFP_ZERO));
 
 	if (ntbl && leaf) {
 		for (i = 0; i < PAGE_SIZE / sizeof(ntbl[0]); i++)
@@ -157,7 +158,8 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,
 
 	size = sizeof(*tbl) + sizeof(tbl->buckets[0]);
 
-	tbl = kzalloc(size, gfp);
+	tbl = alloc_hooks_tag(ht->alloc_tag,
+			kmalloc_noprof(size, gfp|__GFP_ZERO));
 	if (!tbl)
 		return NULL;
 
@@ -181,7 +183,9 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	int i;
 	static struct lock_class_key __key;
 
-	tbl = kvzalloc(struct_size(tbl, buckets, nbuckets), gfp);
+	tbl = alloc_hooks_tag(ht->alloc_tag,
+			kvmalloc_node_noprof(struct_size(tbl, buckets, nbuckets),
+					     gfp|__GFP_ZERO, NUMA_NO_NODE));
 
 	size = nbuckets;
 
@@ -975,7 +979,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
 }
 
 /**
- * rhashtable_init - initialize a new hash table
+ * rhashtable_init_noprof - initialize a new hash table
  * @ht:		hash table to be initialized
  * @params:	configuration parameters
  *
@@ -1016,7 +1020,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
  *	.obj_hashfn = my_hash_fn,
  * };
  */
-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
 		    const struct rhashtable_params *params)
 {
 	struct bucket_table *tbl;
@@ -1031,6 +1035,8 @@ int rhashtable_init(struct rhashtable *ht,
 	spin_lock_init(&ht->lock);
 	memcpy(&ht->p, params, sizeof(*params));
 
+	alloc_tag_record(ht->alloc_tag);
+
 	if (params->min_size)
 		ht->p.min_size = roundup_pow_of_two(params->min_size);
 
@@ -1076,26 +1082,26 @@ int rhashtable_init(struct rhashtable *ht,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(rhashtable_init);
+EXPORT_SYMBOL_GPL(rhashtable_init_noprof);
 
 /**
- * rhltable_init - initialize a new hash list table
+ * rhltable_init_noprof - initialize a new hash list table
  * @hlt:	hash list table to be initialized
  * @params:	configuration parameters
  *
  * Initializes a new hash list table.
  *
- * See documentation for rhashtable_init.
+ * See documentation for rhashtable_init_noprof.
  */
-int rhltable_init(struct rhltable *hlt, const struct rhashtable_params *params)
+int rhltable_init_noprof(struct rhltable *hlt, const struct rhashtable_params *params)
 {
 	int err;
 
-	err = rhashtable_init(&hlt->ht, params);
+	err = rhashtable_init_noprof(&hlt->ht, params);
 	hlt->ht.rhlist = true;
 	return err;
 }
-EXPORT_SYMBOL_GPL(rhltable_init);
+EXPORT_SYMBOL_GPL(rhltable_init_noprof);
 
 static void rhashtable_free_one(struct rhashtable *ht, struct rhash_head *obj,
 				void (*free_fn)(void *ptr, void *arg),
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 31/36] lib: add memory allocations report in show_mem()
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (29 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 30/36] rhashtable: Plumb through alloc tag Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 13:19   ` Vlastimil Babka
  2024-02-21 19:40 ` [PATCH v4 32/36] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
                   ` (5 subsequent siblings)
  36 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Include memory allocations tracked by allocation profiling in show_mem()
reports, listing the top 10 callsites by bytes allocated.
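
For example, a show_mem() dump gains a section like the following
(values are illustrative; the format matches the printk calls below):

	Memory allocations:
	    46137344     2816 mm/slub.c:2259 func:alloc_slab_page
	    12582912     3072 kernel/fork.c:307 func:alloc_thread_stack_node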

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h |  7 +++++++
 include/linux/codetag.h   |  1 +
 lib/alloc_tag.c           | 38 ++++++++++++++++++++++++++++++++++++++
 lib/codetag.c             |  5 +++++
 mm/show_mem.c             | 26 ++++++++++++++++++++++++++
 5 files changed, 77 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 29636719b276..85a24a027403 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+struct codetag_bytes {
+	struct codetag *ct;
+	s64 bytes;
+};
+
+size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
+
 static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
 {
 	return container_of(ct, struct alloc_tag, ct);
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index bfd0ba5c4185..c2a579ccd455 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -61,6 +61,7 @@ struct codetag_iterator {
 }
 
 void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
+bool codetag_trylock_module_list(struct codetag_type *cttype);
 struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
 struct codetag *codetag_next_ct(struct codetag_iterator *iter);
 
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index cb5adec4b2e2..ec54f29482dc 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
 	.show	= allocinfo_show,
 };
 
+size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
+{
+	struct codetag_iterator iter;
+	struct codetag *ct;
+	struct codetag_bytes n;
+	unsigned int i, nr = 0;
+
+	if (can_sleep)
+		codetag_lock_module_list(alloc_tag_cttype, true);
+	else if (!codetag_trylock_module_list(alloc_tag_cttype))
+		return 0;
+
+	iter = codetag_get_ct_iter(alloc_tag_cttype);
+	while ((ct = codetag_next_ct(&iter))) {
+		struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
+
+		n.ct	= ct;
+		n.bytes = counter.bytes;
+
+		for (i = 0; i < nr; i++)
+			if (n.bytes > tags[i].bytes)
+				break;
+
+		if (i < count) {
+			nr -= nr == count;
+			memmove(&tags[i + 1],
+				&tags[i],
+				sizeof(tags[0]) * (nr - i));
+			nr++;
+			tags[i] = n;
+		}
+	}
+
+	codetag_lock_module_list(alloc_tag_cttype, false);
+
+	return nr;
+}
+
 static void __init procfs_init(void)
 {
 	proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
diff --git a/lib/codetag.c b/lib/codetag.c
index b13412ca57cc..7b39cec9648a 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
 		up_read(&cttype->mod_lock);
 }
 
+bool codetag_trylock_module_list(struct codetag_type *cttype)
+{
+	return down_read_trylock(&cttype->mod_lock) != 0;
+}
+
 struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
 {
 	struct codetag_iterator iter = {
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 8dcfafbd283c..1e41f8d6e297 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 #ifdef CONFIG_MEMORY_FAILURE
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	{
+		struct codetag_bytes tags[10];
+		size_t i, nr;
+
+		nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
+		if (nr) {
+			printk(KERN_NOTICE "Memory allocations:\n");
+			for (i = 0; i < nr; i++) {
+				struct codetag *ct = tags[i].ct;
+				struct alloc_tag *tag = ct_to_alloc_tag(ct);
+				struct alloc_tag_counters counter = alloc_tag_read(tag);
+
+				/* Same as alloc_tag_to_text() but w/o intermediate buffer */
+				if (ct->modname)
+					printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
+					       counter.bytes, counter.calls, ct->filename,
+					       ct->lineno, ct->modname, ct->function);
+				else
+					printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
+					       counter.bytes, counter.calls, ct->filename,
+					       ct->lineno, ct->function);
+			}
+		}
+	}
+#endif
 }
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 32/36] codetag: debug: skip objext checking when it's for objext itself
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (30 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 31/36] lib: add memory allocations report in show_mem() Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 33/36] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

objext objects are created with the __GFP_NO_OBJ_EXT flag and therefore have
no corresponding objext themselves (otherwise we would get infinite
recursion). When freeing these objects their codetag will be empty and,
when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, this will lead to false
warnings. Introduce a special codetag value, CODETAG_EMPTY, to mark
allocations which intentionally lack a codetag and thus avoid these warnings.
Set objext codetags to CODETAG_EMPTY before freeing to indicate that
the codetag is expected to be empty.
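
To summarize the sentinel (a sketch; CODETAG_EMPTY is (void *)1 in the
hunk below, so it is distinguishable from both NULL and any valid
codetag pointer):

	ref->ct == NULL            /* nothing recorded: warn when a tag is expected */
	ref->ct == CODETAG_EMPTY   /* intentionally untagged: reset quietly, no warning */
	ref->ct == <valid pointer> /* a live struct codetag to account against */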

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h | 26 ++++++++++++++++++++++++++
 mm/slub.c                 | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 85a24a027403..4a3fc865d878 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -28,6 +28,27 @@ struct alloc_tag {
 	struct alloc_tag_counters __percpu	*counters;
 } __aligned(8);
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define CODETAG_EMPTY	((void *)1)
+
+static inline bool is_codetag_empty(union codetag_ref *ref)
+{
+	return ref->ct == CODETAG_EMPTY;
+}
+
+static inline void set_codetag_empty(union codetag_ref *ref)
+{
+	if (ref)
+		ref->ct = CODETAG_EMPTY;
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
 struct codetag_bytes {
@@ -91,6 +112,11 @@ static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
 	if (!ref || !ref->ct)
 		return;
 
+	if (is_codetag_empty(ref)) {
+		ref->ct = NULL;
+		return;
+	}
+
 	tag = ct_to_alloc_tag(ref->ct);
 
 	this_cpu_sub(tag->counters->bytes, bytes);
diff --git a/mm/slub.c b/mm/slub.c
index 920b24b4140e..3e41d45f9fa4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1883,6 +1883,30 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
 
 #ifdef CONFIG_SLAB_OBJ_EXT
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
+{
+	struct slabobj_ext *slab_exts;
+	struct slab *obj_exts_slab;
+
+	obj_exts_slab = virt_to_slab(obj_exts);
+	slab_exts = slab_obj_exts(obj_exts_slab);
+	if (slab_exts) {
+		unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
+						 obj_exts_slab, obj_exts);
+		/* codetag should be NULL */
+		WARN_ON(slab_exts[offs].ref.ct);
+		set_codetag_empty(&slab_exts[offs].ref);
+	}
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
 /*
  * The allocated objcg pointers array is not accounted directly.
  * Moreover, it should not come from DMA buffer and is not readily
@@ -1923,6 +1947,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 		 * assign slabobj_exts in parallel. In this case the existing
 		 * objcg vector should be reused.
 		 */
+		mark_objexts_empty(vec);
 		kfree(vec);
 		return 0;
 	}
@@ -1939,6 +1964,14 @@ static inline void free_slab_obj_exts(struct slab *slab)
 	if (!obj_exts)
 		return;
 
+	/*
+	 * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+	 * corresponding extension will be NULL. alloc_tag_sub() will throw a
+	 * warning if slab has extensions but the extension of an object is
+	 * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
+	 * the extension for obj_exts is expected to be NULL.
+	 */
+	mark_objexts_empty(obj_exts);
 	kfree(obj_exts);
 	slab->obj_exts = 0;
 }
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 33/36] codetag: debug: mark codetags for reserved pages as empty
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (31 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 32/36] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 34/36] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

To avoid debug warnings while freeing reserved pages which were not
allocated with the usual allocators, mark their codetags as empty before
freeing.
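
The pattern, repeated in both hunks below, is (sketch):

	if (mem_alloc_profiling_enabled()) {
		union codetag_ref *ref = get_page_tag_ref(page);

		if (ref) {
			set_codetag_empty(ref);
			put_page_tag_ref(ref);
		}
	}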

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/alloc_tag.h   |  1 +
 include/linux/mm.h          |  9 +++++++++
 include/linux/pgalloc_tag.h |  2 ++
 mm/mm_init.c                | 12 +++++++++++-
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 4a3fc865d878..64aa9557341e 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -46,6 +46,7 @@ static inline void set_codetag_empty(union codetag_ref *ref)
 #else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
 
 static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+static inline void set_codetag_empty(union codetag_ref *ref) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..b9a4e2cb3ac1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5,6 +5,7 @@
 #include <linux/errno.h>
 #include <linux/mmdebug.h>
 #include <linux/gfp.h>
+#include <linux/pgalloc_tag.h>
 #include <linux/bug.h>
 #include <linux/list.h>
 #include <linux/mmzone.h>
@@ -3112,6 +3113,14 @@ extern void reserve_bootmem_region(phys_addr_t start,
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void free_reserved_page(struct page *page)
 {
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		if (ref) {
+			set_codetag_empty(ref);
+			put_page_tag_ref(ref);
+		}
+	}
 	ClearPageReserved(page);
 	init_page_count(page);
 	__free_page(page);
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 9e6ad8e0e4aa..7a41ed612423 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -98,6 +98,8 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
 
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
+static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
+static inline void put_page_tag_ref(union codetag_ref *ref) {}
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int order) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index e9ea2919d02d..6b5410a5112c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2566,7 +2566,6 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
 void __init memblock_free_pages(struct page *page, unsigned long pfn,
 							unsigned int order)
 {
-
 	if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
 		int nid = early_pfn_to_nid(pfn);
 
@@ -2578,6 +2577,17 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
 		/* KMSAN will take care of these pages. */
 		return;
 	}
+
+	/* pages were reserved and not allocated */
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		if (ref) {
+			set_codetag_empty(ref);
+			put_page_tag_ref(ref);
+		}
+	}
+
 	__free_pages_core(page, order);
 }
 
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 34/36] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (32 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 33/36] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 35/36] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

If slabobj_ext vector allocation for a slab object fails and later on it
succeeds for another object in the same slab, the slabobj_ext for the
original object will be NULL and will be flagged when
CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
Mark failed slabobj_ext vector allocations using a new objext_flags flag
stored in the lower bits of slab->obj_exts. When a new allocation succeeds,
mark all tag references in the same slabobj_ext vector as empty to avoid
the warnings implemented by CONFIG_MEM_ALLOC_PROFILING_DEBUG checks.
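
In short (a sketch; the real logic is in the hunks below):

	/* vector allocation failed: remember that in slab->obj_exts */
	slab->obj_exts = OBJEXTS_ALLOC_FAIL;

	/* a later allocation for the same slab succeeded: */
	if (old_exts & OBJEXTS_ALLOC_FAIL)
		for (i = 0; i < objects; i++)
			set_codetag_empty(&vec[i].ref);
	cmpxchg(&slab->obj_exts, old_exts, new_exts);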

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/memcontrol.h |  4 +++-
 mm/slub.c                  | 46 ++++++++++++++++++++++++++++++++------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2b010316016c..f95241ca9052 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -365,8 +365,10 @@ enum page_memcg_data_flags {
 #endif /* CONFIG_MEMCG */
 
 enum objext_flags {
+	/* slabobj_ext vector failed to allocate */
+	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
 	/* the next bit after the last actual flag */
-	__NR_OBJEXTS_FLAGS  = __FIRST_OBJEXT_FLAG,
+	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
 };
 
 #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
diff --git a/mm/slub.c b/mm/slub.c
index 3e41d45f9fa4..43d63747cad2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1901,9 +1901,33 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
 	}
 }
 
+static inline void mark_failed_objexts_alloc(struct slab *slab)
+{
+	slab->obj_exts = OBJEXTS_ALLOC_FAIL;
+}
+
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+			struct slabobj_ext *vec, unsigned int objects)
+{
+	/*
+	 * If vector previously failed to allocate then we have live
+	 * objects with no tag reference. Mark all references in this
+	 * vector as empty to avoid warnings later on.
+	 */
+	if (obj_exts & OBJEXTS_ALLOC_FAIL) {
+		unsigned int i;
+
+		for (i = 0; i < objects; i++)
+			set_codetag_empty(&vec[i].ref);
+	}
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
 
 static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+static inline void mark_failed_objexts_alloc(struct slab *slab) {}
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+			struct slabobj_ext *vec, unsigned int objects) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
 
@@ -1919,29 +1943,37 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 			gfp_t gfp, bool new_slab)
 {
 	unsigned int objects = objs_per_slab(s, slab);
-	unsigned long obj_exts;
-	void *vec;
+	unsigned long new_exts;
+	unsigned long old_exts;
+	struct slabobj_ext *vec;
 
 	gfp &= ~OBJCGS_CLEAR_MASK;
 	/* Prevent recursive extension vector allocation */
 	gfp |= __GFP_NO_OBJ_EXT;
 	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
 			   slab_nid(slab));
-	if (!vec)
+	if (!vec) {
+		/* Mark vectors which failed to allocate */
+		if (new_slab)
+			mark_failed_objexts_alloc(slab);
+
 		return -ENOMEM;
+	}
 
-	obj_exts = (unsigned long)vec;
+	new_exts = (unsigned long)vec;
 #ifdef CONFIG_MEMCG
-	obj_exts |= MEMCG_DATA_OBJEXTS;
+	new_exts |= MEMCG_DATA_OBJEXTS;
 #endif
+	old_exts = slab->obj_exts;
+	handle_failed_objexts_alloc(old_exts, vec, objects);
 	if (new_slab) {
 		/*
 		 * If the slab is brand new and nobody can yet access its
 		 * obj_exts, no synchronization is required and obj_exts can
 		 * be simply assigned.
 		 */
-		slab->obj_exts = obj_exts;
-	} else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+		slab->obj_exts = new_exts;
+	} else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
 		/*
 		 * If the slab is already in use, somebody can allocate and
 		 * assign slabobj_exts in parallel. In this case the existing
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 35/36] MAINTAINERS: Add entries for code tagging and memory allocation profiling
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (33 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 34/36] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-21 19:40 ` [PATCH v4 36/36] memprofiling: Documentation Suren Baghdasaryan
  2024-02-27 13:36 ` [PATCH v4 00/36] Memory allocation profiling Vlastimil Babka
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

The new code & libraries added are being maintained - mark them as such.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 MAINTAINERS | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9ed4d3868539..4f131872da27 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5210,6 +5210,13 @@ S:	Supported
 F:	Documentation/process/code-of-conduct-interpretation.rst
 F:	Documentation/process/code-of-conduct.rst
 
+CODE TAGGING
+M:	Suren Baghdasaryan <surenb@google.com>
+M:	Kent Overstreet <kent.overstreet@linux.dev>
+S:	Maintained
+F:	include/linux/codetag.h
+F:	lib/codetag.c
+
 COMEDI DRIVERS
 M:	Ian Abbott <abbotti@mev.co.uk>
 M:	H Hartley Sweeten <hsweeten@visionengravers.com>
@@ -14061,6 +14068,16 @@ F:	mm/memblock.c
 F:	mm/mm_init.c
 F:	tools/testing/memblock/
 
+MEMORY ALLOCATION PROFILING
+M:	Suren Baghdasaryan <surenb@google.com>
+M:	Kent Overstreet <kent.overstreet@linux.dev>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	include/linux/alloc_tag.h
+F:	include/linux/codetag_ctx.h
+F:	lib/alloc_tag.c
+F:	lib/pgalloc_tag.c
+
 MEMORY CONTROLLER DRIVERS
 M:	Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
 L:	linux-kernel@vger.kernel.org
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v4 36/36] memprofiling: Documentation
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (34 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 35/36] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 19:40 ` Suren Baghdasaryan
  2024-02-27 13:36 ` [PATCH v4 00/36] Memory allocation profiling Vlastimil Babka
  36 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-21 19:40 UTC (permalink / raw)
  To: akpm
  Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	surenb, kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

From: Kent Overstreet <kent.overstreet@linux.dev>

Provide documentation for memory allocation profiling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 Documentation/mm/allocation-profiling.rst | 86 +++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/mm/allocation-profiling.rst

diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
new file mode 100644
index 000000000000..2bcbd9e51fe4
--- /dev/null
+++ b/Documentation/mm/allocation-profiling.rst
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+MEMORY ALLOCATION PROFILING
+===========================
+
+Low overhead (suitable for production) accounting of all memory allocations,
+tracked by file and line number.
+
+Usage:
+kconfig options:
+ - CONFIG_MEM_ALLOC_PROFILING
+ - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+ - CONFIG_MEM_ALLOC_PROFILING_DEBUG
+   adds warnings for allocations that weren't accounted because of a
+   missing annotation
+
+Boot parameter:
+  sysctl.vm.mem_profiling=1
+
+sysctl:
+  /proc/sys/vm/mem_profiling
+
+Runtime info:
+  /proc/allocinfo
+
+Example output:
+  root@moria-kvm:~# sort -g /proc/allocinfo|tail|numfmt --to=iec
+        2.8M    22648 fs/kernfs/dir.c:615 func:__kernfs_new_node
+        3.8M      953 mm/memory.c:4214 func:alloc_anon_folio
+        4.0M     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
+        4.1M        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
+        6.0M     1532 mm/filemap.c:1919 func:__filemap_get_folio
+        8.8M     2785 kernel/fork.c:307 func:alloc_thread_stack_node
+         13M      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
+         14M     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
+         15M     3656 mm/readahead.c:247 func:page_cache_ra_unbounded
+         55M     4887 mm/slub.c:2259 func:alloc_slab_page
+        122M    31168 mm/page_ext.c:270 func:alloc_page_ext
+
+===================
+Theory of operation
+===================
+
+Memory allocation profiling builds off of code tagging, which is a library for
+declaring static structs (that typically describe a file and line number in
+some way, hence code tagging) and then finding and operating on them at runtime
+- i.e. iterating over them to print them in debugfs/procfs.
+
+To add accounting for an allocation call, we replace it with a macro
+invocation, alloc_hooks(), that
+ - declares a code tag
+ - stashes a pointer to it in task_struct
+ - calls the real allocation function
+ - and finally, restores the task_struct alloc tag pointer to its previous value.
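+
+Conceptually, alloc_hooks() expands to roughly the following (a simplified
+sketch for illustration; see include/linux/alloc_tag.h for the real macro):
+
+   #define alloc_hooks(_do_alloc)				\
+   ({								\
+	DEFINE_ALLOC_TAG(_alloc_tag);				\
+	struct alloc_tag *_old = alloc_tag_save(&_alloc_tag);	\
+	typeof(_do_alloc) _res = _do_alloc;			\
+	alloc_tag_restore(&_alloc_tag, _old);			\
+	_res;							\
+   })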
+
+This allows for alloc_hooks() calls to be nested, with the most recent one
+taking effect. This is important for allocations internal to the mm/ code that
+do not properly belong to the outer allocation context and should be counted
+separately: for example, slab object extension vectors, or when the slab
+allocates pages from the page allocator.
+
+Thus, proper usage requires determining which function in an allocation call
+stack should be tagged. There are many helper functions that essentially wrap
+e.g. kmalloc() and do a little more work, then are called in multiple places;
+we'll generally want the accounting to happen in the callers of these helpers,
+not in the helpers themselves.
+
+To fix up a given helper, for example foo(), do the following:
+ - switch its allocation call to the _noprof() version, e.g. kmalloc_noprof()
+ - rename it to foo_noprof()
+ - define a macro version of foo() like so:
+   #define foo(...) alloc_hooks(foo_noprof(__VA_ARGS__))
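+
+For instance, a hypothetical zeroing helper (the name is invented purely
+for illustration) ends up looking like:
+
+   static inline void *foo_noprof(size_t size, gfp_t gfp)
+   {
+	return kmalloc_noprof(size, gfp | __GFP_ZERO);
+   }
+   #define foo(...) alloc_hooks(foo_noprof(__VA_ARGS__))
+
+Allocations made through foo() are then attributed to its callers in
+/proc/allocinfo rather than to the helper itself.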
+
+It's also possible to stash a pointer to an alloc tag in your own data structures.
+
+Do this when you're implementing a generic data structure that does allocations
+"on behalf of" some other code - for example, the rhashtable code. This way,
+instead of seeing one large entry in /proc/allocinfo for rhashtable.c, we can
+break it out by rhashtable type.
+
+To do so:
+ - Hook your data structure's init function, like any other allocation function
+ - Within your init function, use the convenience macro alloc_tag_record() to
+   record the alloc tag in your data structure.
+ - Then, use the following form for your allocations:
+   alloc_hooks_tag(ht->your_saved_tag, kmalloc_noprof(...))
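+
+Putting it all together, a minimal sketch (the foo_table type is
+hypothetical; the CONFIG_MEM_ALLOC_PROFILING #ifdef around the tag member
+is elided for brevity):
+
+   struct foo_table {
+	struct alloc_tag *alloc_tag;
+   };
+
+   int foo_table_init_noprof(struct foo_table *ft)
+   {
+	alloc_tag_record(ft->alloc_tag);
+	return 0;
+   }
+   #define foo_table_init(...) alloc_hooks(foo_table_init_noprof(__VA_ARGS__))
+
+   /* later, when foo_table allocates on behalf of its users: */
+   void *p = alloc_hooks_tag(ft->alloc_tag,
+			     kmalloc_noprof(PAGE_SIZE, GFP_KERNEL));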
-- 
2.44.0.rc0.258.g7320e95886-goog


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 01/36] fix missing vmalloc.h includes
  2024-02-21 19:40 ` [PATCH v4 01/36] fix missing vmalloc.h includes Suren Baghdasaryan
@ 2024-02-21 21:09   ` Pasha Tatashin
  0 siblings, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:09 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:40 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> The next patch drops vmalloc.h from a system header in order to fix
> a circular dependency; this adds it to all the files that were pulling
> it in implicitly.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency
  2024-02-21 19:40 ` [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency Suren Baghdasaryan
@ 2024-02-21 21:11   ` Pasha Tatashin
  0 siblings, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:11 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> Needed to avoid a new circular dependency with the memory allocation
> profiling series.
>
> Naturally, a whole bunch of files needed to include vmalloc.h that were
> previously getting it implicitly.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-21 19:40 ` [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
@ 2024-02-21 21:15   ` Pasha Tatashin
  2024-02-24  2:02     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:15 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> It seems we need to be more forceful with the compiler on this one.
> This is done for performance reasons only.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
>  mm/slub.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2ef88bbf56a3..d31b03a8d9d5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
>         return !kasan_slab_free(s, x, init);
>  }
>
> -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,

__fastpath_inline seems to me more appropriate here. It prioritizes
memory vs performance.

>                                            void **head, void **tail,
>                                            int *cnt)
>  {
> --
> 2.44.0.rc0.258.g7320e95886-goog
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 04/36] scripts/kallysms: Always include __start and __stop symbols
  2024-02-21 19:40 ` [PATCH v4 04/36] scripts/kallysms: Always include __start and __stop symbols Suren Baghdasaryan
@ 2024-02-21 21:20   ` Pasha Tatashin
  0 siblings, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:20 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> These symbols are used to denote section boundaries: by always including
> them we can unify loading sections from modules with loading built-in
> sections, which leads to some significant cleanup.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro
  2024-02-21 19:40 ` [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
@ 2024-02-21 21:23   ` Pasha Tatashin
  2024-02-26 15:44   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:23 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Alexander Viro

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> We're introducing alloc tagging, which tracks memory allocations by
> callsite. Converting alloc_inode_sb() to a macro means allocations will
> be tracked by its caller, which is a bit more useful.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-21 19:40 ` [PATCH v4 06/36] mm: enumerate all gfp flags Suren Baghdasaryan
@ 2024-02-21 21:25   ` Pasha Tatashin
  2024-02-22 12:12   ` Michal Hocko
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:25 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Petr Tesařík

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Introduce GFP bits enumeration to let compiler track the number of used
> bits (which depends on the config options) instead of hardcoding them.
> That simplifies __GFP_BITS_SHIFT calculation.
>
> Suggested-by: Petr Tesařík <petr@tesarici.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions
  2024-02-21 19:40 ` [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
@ 2024-02-21 21:30   ` Pasha Tatashin
  2024-02-26 16:26   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-21 21:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Currently slab pages can store only vectors of obj_cgroup pointers in
> page->memcg_data. Introduce slabobj_ext structure to allow more data
> to be stored for each slab object. Wrap obj_cgroup into slabobj_ext
> to support current functionality while allowing to extend slabobj_ext
> in the future.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
@ 2024-02-21 23:05   ` Kees Cook
  2024-02-21 23:29     ` Kent Overstreet
  2024-02-28  8:29   ` Vlastimil Babka
  2024-02-28  8:41   ` Vlastimil Babka
  2 siblings, 1 reply; 98+ messages in thread
From: Kees Cook @ 2024-02-21 23:05 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> [...]
> +struct alloc_tag {
> +	struct codetag			ct;
> +	struct alloc_tag_counters __percpu	*counters;
> +} __aligned(8);
> [...]
> +#define DEFINE_ALLOC_TAG(_alloc_tag)						\
> +	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
> +	static struct alloc_tag _alloc_tag __used __aligned(8)			\
> +	__section("alloc_tags") = {						\
> +		.ct = CODE_TAG_INIT,						\
> +		.counters = &_alloc_tag_cntr };
> [...]
> +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> +{
> +	swap(current->alloc_tag, tag);
> +	return tag;
> +}

Future security hardening improvement idea based on this infrastructure:
it should be possible to implement per-allocation-site kmem caches. For
example, we could create:

struct alloc_details {
	u32 flags;
	union {
		u32 size; /* not valid after __init completes */
		struct kmem_cache *cache;
	};
};

- add struct alloc_details to struct alloc_tag
- move the tags section into .ro_after_init
- extend alloc_hooks() to populate flags and size:
	.flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
					    : KMALLOC_ALLOCATE_BUCKETS;
	.size = __builtin_constant_p(size) ? size : SIZE_MAX;
- during kernel start or module init, walk the alloc_tag list
  and create either a fixed-size kmem_cache or to allocate a
  full set of kmalloc-buckets, and update the "cache" member.
- adjust kmalloc core routines to use current->alloc_tag->cache instead
  of using the global buckets.

This would get us fully separated allocations, producing better than
type-based levels of granularity, exceeding what we have currently with
CONFIG_RANDOM_KMALLOC_CACHES.

Does this look possible, or am I misunderstanding something in the
infrastructure being created here?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 23:05   ` Kees Cook
@ 2024-02-21 23:29     ` Kent Overstreet
  2024-02-22  0:25       ` Kees Cook
  0 siblings, 1 reply; 98+ messages in thread
From: Kent Overstreet @ 2024-02-21 23:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 03:05:32PM -0800, Kees Cook wrote:
> On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> > [...]
> > +struct alloc_tag {
> > +	struct codetag			ct;
> > +	struct alloc_tag_counters __percpu	*counters;
> > +} __aligned(8);
> > [...]
> > +#define DEFINE_ALLOC_TAG(_alloc_tag)						\
> > +	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
> > +	static struct alloc_tag _alloc_tag __used __aligned(8)			\
> > +	__section("alloc_tags") = {						\
> > +		.ct = CODE_TAG_INIT,						\
> > +		.counters = &_alloc_tag_cntr };
> > [...]
> > +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> > +{
> > +	swap(current->alloc_tag, tag);
> > +	return tag;
> > +}
> 
> Future security hardening improvement idea based on this infrastructure:
> it should be possible to implement per-allocation-site kmem caches. For
> example, we could create:
> 
> struct alloc_details {
> 	u32 flags;
> 	union {
> 		u32 size; /* not valid after __init completes */
> 		struct kmem_cache *cache;
> 	};
> };
> 
> - add struct alloc_details to struct alloc_tag
> - move the tags section into .ro_after_init
> - extend alloc_hooks() to populate flags and size:
> 	.flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
> 					    : KMALLOC_ALLOCATE_BUCKETS;
> 	.size = __builtin_constant_p(size) ? size : SIZE_MAX;
> - during kernel start or module init, walk the alloc_tag list
>   and create either a fixed-size kmem_cache or to allocate a
>   full set of kmalloc-buckets, and update the "cache" member.
> - adjust kmalloc core routines to use current->alloc_tag->cache instead
>   of using the global buckets.
> 
> This would get us fully separated allocations, producing better than
> type-based levels of granularity, exceeding what we have currently with
> CONFIG_RANDOM_KMALLOC_CACHES.
> 
> Does this look possible, or am I misunderstanding something in the
> infrastructure being created here?

Definitely possible, but... would we want this? That would produce a
_lot_ of kmem caches, and don't we already try to collapse those where
possible to reduce internal fragmentation?

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation
  2024-02-21 19:40 ` [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
@ 2024-02-22  0:08   ` Pasha Tatashin
  2024-02-26 16:51   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-22  0:08 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Introduce __GFP_NO_OBJ_EXT flag in order to prevent recursive allocations
> when allocating slabobj_ext on a slab.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  2024-02-21 19:40 ` [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
@ 2024-02-22  0:09   ` Pasha Tatashin
  2024-02-26 16:52   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-22  0:09 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Slab extension objects can't be allocated before slab infrastructure is
> initialized. Some caches, like kmem_cache and kmem_cache_node, are created
> before slab infrastructure is initialized. Objects from these caches can't
> have extension objects. Introduce SLAB_NO_OBJ_EXT slab flag to mark these
> caches and avoid creating extensions for objects allocated from these
> slabs.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags
  2024-02-21 19:40 ` [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
@ 2024-02-22  0:12   ` Pasha Tatashin
  2024-02-26 16:53   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-22  0:12 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Introduce objext_flags to store additional objext flags unrelated to memcg.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
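
The idea, sketched with assumed names (the patch's exact enumerators may
differ): the low bits of slab->obj_exts are shared with the memcg data flags,
and objext-specific flags start where the memcg flags end:

	/* __FIRST_OBJEXT_FLAG is assumed to be the first bit past the
	 * memcg data flags. */
	enum objext_flags {
		/* e.g. an "extension vector allocation failed" marker */
		OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
		__NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
	};
	#define OBJEXTS_FLAGS_MASK	(__NR_OBJEXTS_FLAGS - 1)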

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 23:29     ` Kent Overstreet
@ 2024-02-22  0:25       ` Kees Cook
  2024-02-22  0:34         ` Kent Overstreet
  0 siblings, 1 reply; 98+ messages in thread
From: Kees Cook @ 2024-02-22  0:25 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 06:29:17PM -0500, Kent Overstreet wrote:
> On Wed, Feb 21, 2024 at 03:05:32PM -0800, Kees Cook wrote:
> > On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> > > [...]
> > > +struct alloc_tag {
> > > +	struct codetag			ct;
> > > +	struct alloc_tag_counters __percpu	*counters;
> > > +} __aligned(8);
> > > [...]
> > > +#define DEFINE_ALLOC_TAG(_alloc_tag)						\
> > > +	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
> > > +	static struct alloc_tag _alloc_tag __used __aligned(8)			\
> > > +	__section("alloc_tags") = {						\
> > > +		.ct = CODE_TAG_INIT,						\
> > > +		.counters = &_alloc_tag_cntr };
> > > [...]
> > > +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> > > +{
> > > +	swap(current->alloc_tag, tag);
> > > +	return tag;
> > > +}
> > 
> > Future security hardening improvement idea based on this infrastructure:
> > it should be possible to implement per-allocation-site kmem caches. For
> > example, we could create:
> > 
> > struct alloc_details {
> > 	u32 flags;
> > 	union {
> > 		u32 size; /* not valid after __init completes */
> > 		struct kmem_cache *cache;
> > 	};
> > };
> > 
> > - add struct alloc_details to struct alloc_tag
> > - move the tags section into .ro_after_init
> > - extend alloc_hooks() to populate flags and size:
> > 	.flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
> > 					    : KMALLOC_ALLOCATE_BUCKETS;
> > 	.size = __builtin_constant_p(size) ? size : SIZE_MAX;
> > - during kernel start or module init, walk the alloc_tag list
> >   and create either a fixed-size kmem_cache or to allocate a
> >   full set of kmalloc-buckets, and update the "cache" member.
> > - adjust kmalloc core routines to use current->alloc_tag->cache instead
> >   of using the global buckets.
> > 
> > This would get us fully separated allocations, producing better than
> > type-based levels of granularity, exceeding what we have currently with
> > CONFIG_RANDOM_KMALLOC_CACHES.
> > 
> > Does this look possible, or am I misunderstanding something in the
> > infrastructure being created here?
> 
> Definitely possible, but... would we want this?

Yes, very very much. One of the worst and mostly unaddressed weaknesses
with the kernel right now is use-after-free based type confusion[0], which
depends on merged caches (or cache reuse).

This doesn't solve cross-allocator (kmalloc/page_alloc) type confusion
(as terrifyingly demonstrated[1] by Jann Horn), but it does help with
what has been a very common case of "use msg_msg to impersonate your
target object"[2] exploitation.

> That would produce a _lot_ of kmem caches

Fewer than you'd expect, but yes, there is some overhead. However,
out-of-tree forks of Linux have successfully experimented with this
already and seen good results[3].

> and don't we already try to collapse those where possible to reduce
> internal fragmentation?

In the past, yes, but the desire for security has led more people to build
with SLAB_MERGE_DEFAULT=n and/or CONFIG_RANDOM_KMALLOC_CACHES=y
(or to boot with "slab_nomerge").

Just doing the type safety isn't sufficient without the cross-allocator
safety, but we've also had solutions for that proposed[4].

-Kees

[0] https://github.com/KSPP/linux/issues/189
[1] https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html
[2] https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html
    https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html#exploring-struct-msg_msg
[3] https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game
[4] https://lore.kernel.org/linux-hardening/20230915105933.495735-1-matteorizzo@google.com/

-- 
Kees Cook
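
A rough sketch of the boot-time walk described above; struct alloc_tag is
from the series, while everything else here (the iterator, alloc_details, the
flag and helper names) is hypothetical:

	static void __init create_site_caches(void)
	{
		struct alloc_tag *tag;

		for_each_alloc_tag(tag) {	/* hypothetical iterator */
			struct alloc_details *d = &tag->details;

			if (d->flags & KMALLOC_ALLOCATE_FIXED) {
				/* fixed compile-time size: one dedicated cache */
				d->cache = kmem_cache_create(tag->ct.function,
							     d->size, 0,
							     SLAB_ACCOUNT, NULL);
			}
			/* else: create a full per-site set of kmalloc buckets */
		}
	}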

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-22  0:25       ` Kees Cook
@ 2024-02-22  0:34         ` Kent Overstreet
  2024-02-22  0:57           ` Kees Cook
  0 siblings, 1 reply; 98+ messages in thread
From: Kent Overstreet @ 2024-02-22  0:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 04:25:02PM -0800, Kees Cook wrote:
> On Wed, Feb 21, 2024 at 06:29:17PM -0500, Kent Overstreet wrote:
> > On Wed, Feb 21, 2024 at 03:05:32PM -0800, Kees Cook wrote:
> > > On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> > > > [...]
> > > > +struct alloc_tag {
> > > > +	struct codetag			ct;
> > > > +	struct alloc_tag_counters __percpu	*counters;
> > > > +} __aligned(8);
> > > > [...]
> > > > +#define DEFINE_ALLOC_TAG(_alloc_tag)						\
> > > > +	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
> > > > +	static struct alloc_tag _alloc_tag __used __aligned(8)			\
> > > > +	__section("alloc_tags") = {						\
> > > > +		.ct = CODE_TAG_INIT,						\
> > > > +		.counters = &_alloc_tag_cntr };
> > > > [...]
> > > > +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> > > > +{
> > > > +	swap(current->alloc_tag, tag);
> > > > +	return tag;
> > > > +}
> > > 
> > > Future security hardening improvement idea based on this infrastructure:
> > > it should be possible to implement per-allocation-site kmem caches. For
> > > example, we could create:
> > > 
> > > struct alloc_details {
> > > 	u32 flags;
> > > 	union {
> > > 		u32 size; /* not valid after __init completes */
> > > 		struct kmem_cache *cache;
> > > 	};
> > > };
> > > 
> > > - add struct alloc_details to struct alloc_tag
> > > - move the tags section into .ro_after_init
> > > - extend alloc_hooks() to populate flags and size:
> > > 	.flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
> > > 					    : KMALLOC_ALLOCATE_BUCKETS;
> > > 	.size = __builtin_constant_p(size) ? size : SIZE_MAX;
> > > - during kernel start or module init, walk the alloc_tag list
> > >   and create either a fixed-size kmem_cache or to allocate a
> > >   full set of kmalloc-buckets, and update the "cache" member.
> > > - adjust kmalloc core routines to use current->alloc_tag->cache instead
> > >   of using the global buckets.
> > > 
> > > This would get us fully separated allocations, producing better than
> > > type-based levels of granularity, exceeding what we have currently with
> > > CONFIG_RANDOM_KMALLOC_CACHES.
> > > 
> > > Does this look possible, or am I misunderstanding something in the
> > > infrastructure being created here?
> > 
> > Definitely possible, but... would we want this?
> 
> Yes, very very much. One of the worst and mostly unaddressed weaknesses
> with the kernel right now is use-after-free based type confusion[0], which
> depends on merged caches (or cache reuse).
> 
> This doesn't solve cross-allocator (kmalloc/page_alloc) type confusion
> (as terrifyingly demonstrated[1] by Jann Horn), but it does help with
> what has been a very common case of "use msg_msg to impersonate your
> target object"[2] exploitation.

We have a ton of code that references PAGE_SIZE and uses the page
allocator completely unnecessarily - that's something worth harping
about at conferences; if we could motivate people to clean that stuff up
it'd have a lot of positive effects.

> > That would produce a _lot_ of kmem caches
> 
> Fewer than you'd expect, but yes, there is some overhead. However,
> out-of-tree forks of Linux have successfully experimented with this
> already and seen good results[3].

So in that case - I don't think there's any need for a separate
alloc_details; we'd just add a kmem_cache * to alloc_tag and then hook
into the codetag init/unload path to create and destroy the kmem caches.

No need to adjust the slab code either; alloc_hooks() itself could
dispatch to kmem_cache_alloc() instead of kmalloc() if this is in use.
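
Kent's simplification, sketched (illustrative only; none of this is in the
posted series):

	struct alloc_tag {
		struct codetag			ct;
		struct alloc_tag_counters __percpu *counters;
		struct kmem_cache		*cache;	/* NULL until codetag init */
	};

	static inline void *alloc_tag_kmalloc(struct alloc_tag *tag,
					      size_t size, gfp_t gfp)
	{
		/* dispatch to the per-site cache when one was created */
		if (tag->cache)
			return kmem_cache_alloc(tag->cache, gfp);
		return kmalloc(size, gfp);
	}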

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-22  0:34         ` Kent Overstreet
@ 2024-02-22  0:57           ` Kees Cook
  0 siblings, 0 replies; 98+ messages in thread
From: Kees Cook @ 2024-02-22  0:57 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 07:34:44PM -0500, Kent Overstreet wrote:
> On Wed, Feb 21, 2024 at 04:25:02PM -0800, Kees Cook wrote:
> > On Wed, Feb 21, 2024 at 06:29:17PM -0500, Kent Overstreet wrote:
> > > On Wed, Feb 21, 2024 at 03:05:32PM -0800, Kees Cook wrote:
> > > > On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > +struct alloc_tag {
> > > > > +	struct codetag			ct;
> > > > > +	struct alloc_tag_counters __percpu	*counters;
> > > > > +} __aligned(8);
> > > > > [...]
> > > > > +#define DEFINE_ALLOC_TAG(_alloc_tag)						\
> > > > > +	static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr);	\
> > > > > +	static struct alloc_tag _alloc_tag __used __aligned(8)			\
> > > > > +	__section("alloc_tags") = {						\
> > > > > +		.ct = CODE_TAG_INIT,						\
> > > > > +		.counters = &_alloc_tag_cntr };
> > > > > [...]
> > > > > +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> > > > > +{
> > > > > +	swap(current->alloc_tag, tag);
> > > > > +	return tag;
> > > > > +}
> > > > 
> > > > Future security hardening improvement idea based on this infrastructure:
> > > > it should be possible to implement per-allocation-site kmem caches. For
> > > > example, we could create:
> > > > 
> > > > struct alloc_details {
> > > > 	u32 flags;
> > > > 	union {
> > > > 		u32 size; /* not valid after __init completes */
> > > > 		struct kmem_cache *cache;
> > > > 	};
> > > > };
> > > > 
> > > > - add struct alloc_details to struct alloc_tag
> > > > - move the tags section into .ro_after_init
> > > > - extend alloc_hooks() to populate flags and size:
> > > > 	.flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
> > > > 					    : KMALLOC_ALLOCATE_BUCKETS;
> > > > 	.size = __builtin_constant_p(size) ? size : SIZE_MAX;
> > > > - during kernel start or module init, walk the alloc_tag list
> > > >   and create either a fixed-size kmem_cache or to allocate a
> > > >   full set of kmalloc-buckets, and update the "cache" member.
> > > > - adjust kmalloc core routines to use current->alloc_tag->cache instead
> > > >   of using the global buckets.
> > > > 
> > > > This would get us fully separated allocations, producing better than
> > > > type-based levels of granularity, exceeding what we have currently with
> > > > CONFIG_RANDOM_KMALLOC_CACHES.
> > > > 
> > > > Does this look possible, or am I misunderstanding something in the
> > > > infrastructure being created here?
> > > 
> > > Definitely possible, but... would we want this?
> > 
> > Yes, very very much. One of the worst and mostly unaddressed weaknesses
> > with the kernel right now is use-after-free based type confusion[0], which
> > depends on merged caches (or cache reuse).
> > 
> > This doesn't solve cross-allocator (kmalloc/page_alloc) type confusion
> > (as terrifyingly demonstrated[1] by Jann Horn), but it does help with
> > what has been a very common case of "use msg_msg to impersonate your
> > target object"[2] exploitation.
> 
> We have a ton of code that references PAGE_SIZE and uses the page
> allocator completely unnecessarily - that's something worth harping
> about at conferences; if we could motivate people to clean that stuff up
> it'd have a lot of positive effects.
> 
> > > That would produce a _lot_ of kmem caches
> > 
> > Fewer than you'd expect, but yes, there is some overhead. However,
> > out-of-tree forks of Linux have successfully experimented with this
> > already and seen good results[3].
> 
> So in that case - I don't think there's any need for a separate
> alloc_details; we'd just add a kmem_cache * to alloc_tag and then hook
> into the codetag init/unload path to create and destroy the kmem caches.

Okay, sounds good. There needs to be a place to track the "is this a fixed
size or a run-time size" choice.

> No need to adjust the slab code either; alloc_hooks() itself could
> dispatch to kmem_cache_alloc() instead of kmalloc() if this is in use.

Right, it'd go either to kmem_cache_alloc() directly, or to a modified
kmalloc() that uses the passed-in cache as the base for an array of sized
buckets, rather than the global (or 16-way global) buckets.

Yay for the future!

-- 
Kees Cook
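
The "array of sized buckets" for run-time sizes could look roughly like the
following; every name here is hypothetical:

	#define SITE_BUCKETS 12	/* 8 bytes .. 16KB, powers of two */

	struct site_buckets {
		struct kmem_cache *caches[SITE_BUCKETS];
	};

	static void *site_kmalloc(struct site_buckets *b, size_t size, gfp_t gfp)
	{
		/* round up to the next power of two; 8 bytes maps to index 0 */
		unsigned int idx = max(3, fls((int)size - 1)) - 3;

		if (idx >= SITE_BUCKETS)
			return NULL;	/* large sizes: fall back elsewhere */
		return kmem_cache_alloc(b->caches[idx], gfp);
	}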

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 24/36] rust: Add a rust helper for krealloc()
  2024-02-21 19:40 ` [PATCH v4 24/36] rust: Add a rust helper for krealloc() Suren Baghdasaryan
@ 2024-02-22  9:59   ` Alice Ryhl
  2024-02-23 22:17     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Alice Ryhl @ 2024-02-22  9:59 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	rust-for-linux

On Wed, Feb 21, 2024 at 8:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> Memory allocation profiling is turning krealloc() into a nontrivial
> macro - so for now, we need a helper for it.
>
> Until we have proper support on the rust side for memory allocation
> profiling this does mean that all Rust allocations will be accounted to
> the helper.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Miguel Ojeda <ojeda@kernel.org>
> Cc: Alex Gaynor <alex.gaynor@gmail.com>
> Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Gary Guo <gary@garyguo.net>
> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
> Cc: Benno Lossin <benno.lossin@proton.me>
> Cc: Andreas Hindborg <a.hindborg@samsung.com>
> Cc: Alice Ryhl <aliceryhl@google.com>
> Cc: rust-for-linux@vger.kernel.org
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Currently, the Rust build doesn't work throughout the entire series
since there are some commits where krealloc is missing before you
introduce the helper. If you introduce the helper first before
krealloc stops being an exported function, then the Rust build should
work throughout the entire series. (Having both the helper and the
exported function at the same time is not a problem.)

With the patch reordered:

Reviewed-by: Alice Ryhl <aliceryhl@google.com>

Alice
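
For reference, the helper under discussion is essentially a thin C wrapper;
this is a reconstruction of its shape, so check the patch itself for the
exact annotations:

	/* rust/helpers.c */
	void * __must_check __realloc_size(2)
	rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags)
	{
		return krealloc(objp, new_size, flags);
	}
	EXPORT_SYMBOL_GPL(rust_helper_krealloc);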

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-21 19:40 ` [PATCH v4 06/36] mm: enumerate all gfp flags Suren Baghdasaryan
  2024-02-21 21:25   ` Pasha Tatashin
@ 2024-02-22 12:12   ` Michal Hocko
  2024-02-22 12:24     ` Petr Tesařík
  1 sibling, 1 reply; 98+ messages in thread
From: Michal Hocko @ 2024-02-22 12:12 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Petr Tesařík

On Wed 21-02-24 11:40:19, Suren Baghdasaryan wrote:
> Introduce GFP bits enumeration to let compiler track the number of used
> bits (which depends on the config options) instead of hardcoding them.
> That simplifies __GFP_BITS_SHIFT calculation.
> 
> Suggested-by: Petr Tesařík <petr@tesarici.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

I thought I had responded to this patch, but obviously that's not the case.
I like this change. Makes sense even without the rest of the series.
Acked-by: Michal Hocko <mhocko@suse.com>

It seems that the KASAN flags are already baked into the hardcoded
__GFP_BITS_SHIFT even when they are not configured, which just proves how
fragile this existing scheme is.

> ---
>  include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
>  1 file changed, 62 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> index 1b6053da8754..868c8fb1bbc1 100644
> --- a/include/linux/gfp_types.h
> +++ b/include/linux/gfp_types.h
> @@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
>   * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
>   */
>  
> +enum {
> +	___GFP_DMA_BIT,
> +	___GFP_HIGHMEM_BIT,
> +	___GFP_DMA32_BIT,
> +	___GFP_MOVABLE_BIT,
> +	___GFP_RECLAIMABLE_BIT,
> +	___GFP_HIGH_BIT,
> +	___GFP_IO_BIT,
> +	___GFP_FS_BIT,
> +	___GFP_ZERO_BIT,
> +	___GFP_UNUSED_BIT,	/* 0x200u unused */
> +	___GFP_DIRECT_RECLAIM_BIT,
> +	___GFP_KSWAPD_RECLAIM_BIT,
> +	___GFP_WRITE_BIT,
> +	___GFP_NOWARN_BIT,
> +	___GFP_RETRY_MAYFAIL_BIT,
> +	___GFP_NOFAIL_BIT,
> +	___GFP_NORETRY_BIT,
> +	___GFP_MEMALLOC_BIT,
> +	___GFP_COMP_BIT,
> +	___GFP_NOMEMALLOC_BIT,
> +	___GFP_HARDWALL_BIT,
> +	___GFP_THISNODE_BIT,
> +	___GFP_ACCOUNT_BIT,
> +	___GFP_ZEROTAGS_BIT,
> +#ifdef CONFIG_KASAN_HW_TAGS
> +	___GFP_SKIP_ZERO_BIT,
> +	___GFP_SKIP_KASAN_BIT,
> +#endif
> +#ifdef CONFIG_LOCKDEP
> +	___GFP_NOLOCKDEP_BIT,
> +#endif
> +	___GFP_LAST_BIT
> +};
> +
>  /* Plain integer GFP bitmasks. Do not use this directly. */
> -#define ___GFP_DMA		0x01u
> -#define ___GFP_HIGHMEM		0x02u
> -#define ___GFP_DMA32		0x04u
> -#define ___GFP_MOVABLE		0x08u
> -#define ___GFP_RECLAIMABLE	0x10u
> -#define ___GFP_HIGH		0x20u
> -#define ___GFP_IO		0x40u
> -#define ___GFP_FS		0x80u
> -#define ___GFP_ZERO		0x100u
> +#define ___GFP_DMA		BIT(___GFP_DMA_BIT)
> +#define ___GFP_HIGHMEM		BIT(___GFP_HIGHMEM_BIT)
> +#define ___GFP_DMA32		BIT(___GFP_DMA32_BIT)
> +#define ___GFP_MOVABLE		BIT(___GFP_MOVABLE_BIT)
> +#define ___GFP_RECLAIMABLE	BIT(___GFP_RECLAIMABLE_BIT)
> +#define ___GFP_HIGH		BIT(___GFP_HIGH_BIT)
> +#define ___GFP_IO		BIT(___GFP_IO_BIT)
> +#define ___GFP_FS		BIT(___GFP_FS_BIT)
> +#define ___GFP_ZERO		BIT(___GFP_ZERO_BIT)
>  /* 0x200u unused */
> -#define ___GFP_DIRECT_RECLAIM	0x400u
> -#define ___GFP_KSWAPD_RECLAIM	0x800u
> -#define ___GFP_WRITE		0x1000u
> -#define ___GFP_NOWARN		0x2000u
> -#define ___GFP_RETRY_MAYFAIL	0x4000u
> -#define ___GFP_NOFAIL		0x8000u
> -#define ___GFP_NORETRY		0x10000u
> -#define ___GFP_MEMALLOC		0x20000u
> -#define ___GFP_COMP		0x40000u
> -#define ___GFP_NOMEMALLOC	0x80000u
> -#define ___GFP_HARDWALL		0x100000u
> -#define ___GFP_THISNODE		0x200000u
> -#define ___GFP_ACCOUNT		0x400000u
> -#define ___GFP_ZEROTAGS		0x800000u
> +#define ___GFP_DIRECT_RECLAIM	BIT(___GFP_DIRECT_RECLAIM_BIT)
> +#define ___GFP_KSWAPD_RECLAIM	BIT(___GFP_KSWAPD_RECLAIM_BIT)
> +#define ___GFP_WRITE		BIT(___GFP_WRITE_BIT)
> +#define ___GFP_NOWARN		BIT(___GFP_NOWARN_BIT)
> +#define ___GFP_RETRY_MAYFAIL	BIT(___GFP_RETRY_MAYFAIL_BIT)
> +#define ___GFP_NOFAIL		BIT(___GFP_NOFAIL_BIT)
> +#define ___GFP_NORETRY		BIT(___GFP_NORETRY_BIT)
> +#define ___GFP_MEMALLOC		BIT(___GFP_MEMALLOC_BIT)
> +#define ___GFP_COMP		BIT(___GFP_COMP_BIT)
> +#define ___GFP_NOMEMALLOC	BIT(___GFP_NOMEMALLOC_BIT)
> +#define ___GFP_HARDWALL		BIT(___GFP_HARDWALL_BIT)
> +#define ___GFP_THISNODE		BIT(___GFP_THISNODE_BIT)
> +#define ___GFP_ACCOUNT		BIT(___GFP_ACCOUNT_BIT)
> +#define ___GFP_ZEROTAGS		BIT(___GFP_ZEROTAGS_BIT)
>  #ifdef CONFIG_KASAN_HW_TAGS
> -#define ___GFP_SKIP_ZERO	0x1000000u
> -#define ___GFP_SKIP_KASAN	0x2000000u
> +#define ___GFP_SKIP_ZERO	BIT(___GFP_SKIP_ZERO_BIT)
> +#define ___GFP_SKIP_KASAN	BIT(___GFP_SKIP_KASAN_BIT)
>  #else
>  #define ___GFP_SKIP_ZERO	0
>  #define ___GFP_SKIP_KASAN	0
>  #endif
>  #ifdef CONFIG_LOCKDEP
> -#define ___GFP_NOLOCKDEP	0x4000000u
> +#define ___GFP_NOLOCKDEP	BIT(___GFP_NOLOCKDEP_BIT)
>  #else
>  #define ___GFP_NOLOCKDEP	0
>  #endif
> -/* If the above are modified, __GFP_BITS_SHIFT may need updating */
>  
>  /*
>   * Physical address zone modifiers (see linux/mmzone.h - low four bits)
> @@ -249,7 +283,7 @@ typedef unsigned int __bitwise gfp_t;
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>  
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>  
>  /**
> -- 
> 2.44.0.rc0.258.g7320e95886-goog

-- 
Michal Hocko
SUSE Labs
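
The practical payoff: adding a conditional flag no longer means touching the
shift by hand. An illustration with stand-in names (CONFIG_MY_NEW_FEATURE and
the example bits are hypothetical):

	enum {
		___GFP_EXAMPLE_A_BIT,		/* stand-ins for the real flags */
		___GFP_EXAMPLE_B_BIT,
	#ifdef CONFIG_MY_NEW_FEATURE
		___GFP_MY_NEW_FEATURE_BIT,
	#endif
		___GFP_LAST_BIT			/* always equals the bit count */
	};
	#define __GFP_BITS_SHIFT ___GFP_LAST_BIT	/* no manual "26 + ..." math */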

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-22 12:12   ` Michal Hocko
@ 2024-02-22 12:24     ` Petr Tesařík
  2024-02-23 19:26       ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Petr Tesařík @ 2024-02-22 12:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, akpm, kent.overstreet, vbabka, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
	muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
	dhowells, hughd, andreyknvl, keescook, ndesaulniers, vvvvvv,
	gregkh, ebiggers, ytcoode, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Thu, 22 Feb 2024 13:12:29 +0100
Michal Hocko <mhocko@suse.com> wrote:

> On Wed 21-02-24 11:40:19, Suren Baghdasaryan wrote:
> > Introduce GFP bits enumeration to let compiler track the number of used
> > bits (which depends on the config options) instead of hardcoding them.
> > That simplifies __GFP_BITS_SHIFT calculation.
> > 
> > Suggested-by: Petr Tesařík <petr@tesarici.cz>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>  
> 
> I thought I had responded to this patch, but obviously that's not the case.
> I like this change. Makes sense even without the rest of the series.
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you, Michal. I also hope it can be merged without waiting for the
rest of the series.

Petr T

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-22 12:24     ` Petr Tesařík
@ 2024-02-23 19:26       ` Suren Baghdasaryan
  2024-02-24  1:59         ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-23 19:26 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Michal Hocko, akpm, kent.overstreet, vbabka, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
	muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
	dhowells, hughd, andreyknvl, keescook, ndesaulniers, vvvvvv,
	gregkh, ebiggers, ytcoode, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Thu, Feb 22, 2024 at 4:24 AM 'Petr Tesařík' via kernel-team
<kernel-team@android.com> wrote:
>
> On Thu, 22 Feb 2024 13:12:29 +0100
> Michal Hocko <mhocko@suse.com> wrote:
>
> > On Wed 21-02-24 11:40:19, Suren Baghdasaryan wrote:
> > > Introduce GFP bits enumeration to let compiler track the number of used
> > > bits (which depends on the config options) instead of hardcoding them.
> > > That simplifies __GFP_BITS_SHIFT calculation.
> > >
> > > Suggested-by: Petr Tesařík <petr@tesarici.cz>
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Reviewed-by: Kees Cook <keescook@chromium.org>
> >
> > I thought I had responded to this patch, but obviously that's not the case.
> > I like this change. Makes sense even without the rest of the series.
> > Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thank you, Michal. I also hope it can be merged without waiting for the
> rest of the series.

Thanks Michal! I can post it separately. With the Ack I don't think it
will delay the rest of the series.
Thanks,
Suren.

>
> Petr T
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 24/36] rust: Add a rust helper for krealloc()
  2024-02-22  9:59   ` Alice Ryhl
@ 2024-02-23 22:17     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-23 22:17 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	rust-for-linux

On Thu, Feb 22, 2024 at 2:00 AM Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Wed, Feb 21, 2024 at 8:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >
> > Memory allocation profiling is turning krealloc() into a nontrivial
> > macro - so for now, we need a helper for it.
> >
> > Until we have proper support on the rust side for memory allocation
> > profiling this does mean that all Rust allocations will be accounted to
> > the helper.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Cc: Miguel Ojeda <ojeda@kernel.org>
> > Cc: Alex Gaynor <alex.gaynor@gmail.com>
> > Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Gary Guo <gary@garyguo.net>
> > Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
> > Cc: Benno Lossin <benno.lossin@proton.me>
> > Cc: Andreas Hindborg <a.hindborg@samsung.com>
> > Cc: Alice Ryhl <aliceryhl@google.com>
> > Cc: rust-for-linux@vger.kernel.org
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Currently, the Rust build doesn't work throughout the entire series
> since there are some commits where krealloc is missing before you
> introduce the helper. If you introduce the helper first before
> krealloc stops being an exported function, then the Rust build should
> work throughout the entire series. (Having both the helper and the
> exported function at the same time is not a problem.)

Ack. I'll move it up in the series.

>
> With the patch reordered:
>
> Reviewed-by: Alice Ryhl <aliceryhl@google.com>

Thanks Alice!

>
> Alice

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 06/36] mm: enumerate all gfp flags
  2024-02-23 19:26       ` Suren Baghdasaryan
@ 2024-02-24  1:59         ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-24  1:59 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Michal Hocko, akpm, kent.overstreet, vbabka, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
	muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
	dhowells, hughd, andreyknvl, keescook, ndesaulniers, vvvvvv,
	gregkh, ebiggers, ytcoode, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Fri, Feb 23, 2024 at 11:26 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Feb 22, 2024 at 4:24 AM 'Petr Tesařík' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > On Thu, 22 Feb 2024 13:12:29 +0100
> > Michal Hocko <mhocko@suse.com> wrote:
> >
> > > On Wed 21-02-24 11:40:19, Suren Baghdasaryan wrote:
> > > > Introduce GFP bits enumeration to let compiler track the number of used
> > > > bits (which depends on the config options) instead of hardcoding them.
> > > > That simplifies __GFP_BITS_SHIFT calculation.
> > > >
> > > > Suggested-by: Petr Tesařík <petr@tesarici.cz>
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > >
> > > I thought I had responded to this patch, but obviously that's not the case.
> > > I like this change. Makes sense even without the rest of the series.
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> >
> > Thank you, Michal. I also hope it can be merged without waiting for the
> > rest of the series.
>
> Thanks Michal! I can post it separately. With the Ack I don't think it
> will delay the rest of the series.

Stand-alone version is posted as v5 here:
https://lore.kernel.org/all/20240224015800.2569851-1-surenb@google.com/

> Thanks,
> Suren.
>
> >
> > Petr T
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-21 21:15   ` Pasha Tatashin
@ 2024-02-24  2:02     ` Suren Baghdasaryan
  2024-02-26 14:31       ` Vlastimil Babka
  0 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-24  2:02 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >
> > It seems we need to be more forceful with the compiler on this one.
> > This is done for performance reasons only.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> >  mm/slub.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 2ef88bbf56a3..d31b03a8d9d5 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
> >         return !kasan_slab_free(s, x, init);
> >  }
> >
> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>
> __fastpath_inline seems to me more appropriate here. It prioritizes
> memory vs performance.

Hmm. AFAICT this function is used only in one place and we do not add
any additional users, so I don't think changing to __fastpath_inline
here would gain us anything.

>
> >                                            void **head, void **tail,
> >                                            int *cnt)
> >  {
> > --
> > 2.44.0.rc0.258.g7320e95886-goog
> >
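
For context, __fastpath_inline in mm/slub.c is approximately the following
(paraphrased; verify against the tree): it expands to nothing when
CONFIG_SLUB_TINY trades speed for size, and forces inlining otherwise.

	#ifndef CONFIG_SLUB_TINY
	#define __fastpath_inline __always_inline
	#else
	#define __fastpath_inline
	#endif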

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-24  2:02     ` Suren Baghdasaryan
@ 2024-02-26 14:31       ` Vlastimil Babka
  2024-02-26 15:21         ` Pasha Tatashin
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 14:31 UTC (permalink / raw)
  To: Suren Baghdasaryan, Pasha Tatashin
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, yosryahmed,
	yuzhao, dhowells, hughd, andreyknvl, keescook, ndesaulniers,
	vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/24/24 03:02, Suren Baghdasaryan wrote:
> On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
>>
>> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>> >
>> > From: Kent Overstreet <kent.overstreet@linux.dev>
>> >
>> > It seems we need to be more forceful with the compiler on this one.
>> > This is done for performance reasons only.
>> >
>> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>> > Reviewed-by: Kees Cook <keescook@chromium.org>
>> > ---
>> >  mm/slub.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 2ef88bbf56a3..d31b03a8d9d5 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
>> >         return !kasan_slab_free(s, x, init);
>> >  }
>> >
>> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>>
>> __fastpath_inline seems to me more appropriate here. It prioritizes
>> memory vs performance.
> 
> Hmm. AFAICT this function is used only in one place and we do not add
> any additional users, so I don't think changing to __fastpath_inline
> here would gain us anything.

It would have been more future-proof and self-documenting. But I don't insist.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

>>
>> >                                            void **head, void **tail,
>> >                                            int *cnt)
>> >  {
>> > --
>> > 2.44.0.rc0.258.g7320e95886-goog
>> >


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-26 14:31       ` Vlastimil Babka
@ 2024-02-26 15:21         ` Pasha Tatashin
  2024-02-26 16:09           ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Pasha Tatashin @ 2024-02-26 15:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, Kent Overstreet, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Mel Gorman, dave,
	Matthew Wilcox, Liam R. Howlett, Tetsuo Handa, Jonathan Corbet,
	void, Peter Zijlstra, Juri Lelli, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Peter Xu, David Hildenbrand, Jens Axboe, mcgrof, Masahiro Yamada,
	Nathan Chancellor, dennis, Tejun Heo, Muchun Song, Mike Rapoport,
	paulmck, Yosry Ahmed, Yu Zhao, dhowells, Hugh Dickins,
	andreyknvl, Kees Cook, ndesaulniers, vvvvvv, Greg Kroah-Hartman,
	ebiggers, ytcoode, vincent.guittot, dietmar.eggemann,
	Steven Rostedt, bsegall, bristot, vschneid, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo, Alexander Potapenko,
	elver, dvyukov, Shakeel Butt, Muchun Song, jbaron,
	David Rientjes, minchan, kaleshsingh, kernel-team,
	Linux Doc Mailing List, LKML, iommu,
	open list:GENERIC INCLUDE/ASM HEADER FILES, linux-fsdevel,
	linux-mm, linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024, 9:31 AM Vlastimil Babka <vbabka@suse.cz> wrote:

> On 2/24/24 03:02, Suren Baghdasaryan wrote:
> > On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
> > <pasha.tatashin@soleen.com> wrote:
> >>
> >> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com>
> wrote:
> >> >
> >> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >> >
> >> > It seems we need to be more forceful with the compiler on this one.
> >> > This is done for performance reasons only.
> >> >
> >> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> >> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >> > Reviewed-by: Kees Cook <keescook@chromium.org>
> >> > ---
> >> >  mm/slub.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/slub.c b/mm/slub.c
> >> > index 2ef88bbf56a3..d31b03a8d9d5 100644
> >> > --- a/mm/slub.c
> >> > +++ b/mm/slub.c
> >> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void
> *x, bool init)
> >> >         return !kasan_slab_free(s, x, init);
> >> >  }
> >> >
> >> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
> >> > +static __always_inline bool slab_free_freelist_hook(struct
> kmem_cache *s,
> >>
> >> __fastpath_inline seems to me more appropriate here. It prioritizes
> >> memory vs performance.
> >
> > Hmm. AFAICT this function is used only in one place and we do not add
> > any additional users, so I don't think changing to __fastpath_inline
> > here would gain us anything.
>

For consistency __fastpath_inline makes more sense, but I am ok with or
without this change.

Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>


> It would have been more future-proof and self-documenting. But I don't
> insist.
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> >>
> >> >                                            void **head, void **tail,
> >> >                                            int *cnt)
> >> >  {
> >> > --
> >> > 2.44.0.rc0.258.g7320e95886-goog
> >> >
>
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro
  2024-02-21 19:40 ` [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
  2024-02-21 21:23   ` Pasha Tatashin
@ 2024-02-26 15:44   ` Vlastimil Babka
  2024-02-26 17:48     ` Suren Baghdasaryan
  2024-02-26 20:50     ` Kent Overstreet
  1 sibling, 2 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 15:44 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Alexander Viro

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
> 
> We're introducing alloc tagging, which tracks memory allocations by
> callsite. Converting alloc_inode_sb() to a macro means allocations will
> be tracked by its caller, which is a bit more useful.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
>  include/linux/fs.h | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 023f37c60709..08d8246399c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3010,11 +3010,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
>   * This must be used for allocating filesystems specific inodes to set
>   * up the inode reclaim context correctly.
>   */
> -static inline void *
> -alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)

A __always_inline wouldn't have the same effect? Just wondering.

> -{
> -	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
> -}
> +#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)
>  
>  extern void __insert_inode_hash(struct inode *, unsigned long hashval);
>  static inline void insert_inode_hash(struct inode *inode)
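
The reason an __always_inline would not have the same effect can be sketched
from the DEFINE_ALLOC_TAG machinery quoted earlier in the thread; the
alloc_hooks() body below is a simplification, not the series' code:

	#define alloc_hooks(_do_alloc)						\
	({									\
		static struct alloc_tag _alloc_tag __section("alloc_tags") = {	\
			.ct = CODE_TAG_INIT,	/* records __FILE__/__LINE__ */	\
		};								\
		/* ... account the allocation to _alloc_tag ... */		\
		_do_alloc;							\
	})

Because the codetag is planted at the macro expansion site, a static inline
alloc_inode_sb() would record include/linux/fs.h as the callsite even when
fully inlined; converting it to a macro pushes the expansion, and hence the
recorded file and line, out to each filesystem caller.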


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline
  2024-02-26 15:21         ` Pasha Tatashin
@ 2024-02-26 16:09           ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-26 16:09 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Vlastimil Babka, Andrew Morton, Kent Overstreet, Michal Hocko,
	Johannes Weiner, Roman Gushchin, Mel Gorman, dave,
	Matthew Wilcox, Liam R. Howlett, Tetsuo Handa, Jonathan Corbet,
	void, Peter Zijlstra, Juri Lelli, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Peter Xu, David Hildenbrand, Jens Axboe, mcgrof, Masahiro Yamada,
	Nathan Chancellor, dennis, Tejun Heo, Muchun Song, Mike Rapoport,
	paulmck, Yosry Ahmed, Yu Zhao, dhowells, Hugh Dickins,
	andreyknvl, Kees Cook, ndesaulniers, vvvvvv, Greg Kroah-Hartman,
	ebiggers, ytcoode, vincent.guittot, dietmar.eggemann,
	Steven Rostedt, bsegall, bristot, vschneid, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo, Alexander Potapenko,
	elver, dvyukov, Shakeel Butt, Muchun Song, jbaron,
	David Rientjes, minchan, kaleshsingh, kernel-team,
	Linux Doc Mailing List, LKML, iommu,
	open list:GENERIC INCLUDE/ASM HEADER FILES, linux-fsdevel,
	linux-mm, linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024 at 7:21 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
>
>
> On Mon, Feb 26, 2024, 9:31 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 2/24/24 03:02, Suren Baghdasaryan wrote:
>> > On Wed, Feb 21, 2024 at 1:16 PM Pasha Tatashin
>> > <pasha.tatashin@soleen.com> wrote:
>> >>
>> >> On Wed, Feb 21, 2024 at 2:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
>> >> >
>> >> > From: Kent Overstreet <kent.overstreet@linux.dev>
>> >> >
>> >> > It seems we need to be more forceful with the compiler on this one.
>> >> > This is done for performance reasons only.
>> >> >
>> >> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>> >> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>> >> > Reviewed-by: Kees Cook <keescook@chromium.org>
>> >> > ---
>> >> >  mm/slub.c | 2 +-
>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/mm/slub.c b/mm/slub.c
>> >> > index 2ef88bbf56a3..d31b03a8d9d5 100644
>> >> > --- a/mm/slub.c
>> >> > +++ b/mm/slub.c
>> >> > @@ -2121,7 +2121,7 @@ bool slab_free_hook(struct kmem_cache *s, void *x, bool init)
>> >> >         return !kasan_slab_free(s, x, init);
>> >> >  }
>> >> >
>> >> > -static inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> >> > +static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
>> >>
>> >> __fastpath_inline seems to me more appropriate here. It prioritizes
>> >> memory vs performance.
>> >
>> > Hmm. AFAICT this function is used only in one place and we do not add
>> > any additional users, so I don't think changing to __fastpath_inline
>> > here would gain us anything.
>
>
> For consistency __fastpath_inline makes more sense, but I am ok with or without this change.

Ok, I'll update in the next revision. Thanks!

>
> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
>>
>> It would have been more future-proof and self-documenting. But I don't insist.
>>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> >>
>> >> >                                            void **head, void **tail,
>> >> >                                            int *cnt)
>> >> >  {
>> >> > --
>> >> > 2.44.0.rc0.258.g7320e95886-goog
>> >> >
>>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions
  2024-02-21 19:40 ` [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
  2024-02-21 21:30   ` Pasha Tatashin
@ 2024-02-26 16:26   ` Vlastimil Babka
  2024-02-26 17:22     ` Suren Baghdasaryan
  1 sibling, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 16:26 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Currently slab pages can store only vectors of obj_cgroup pointers in
> page->memcg_data. Introduce slabobj_ext structure to allow more data
> to be stored for each slab object. Wrap obj_cgroup into slabobj_ext
> to support current functionality while allowing to extend slabobj_ext
> in the future.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Hi, mostly good from the slab perspective, just some fixups:

> --- a/mm/slab.h
> +++ b/mm/slab.h
> -int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> -				 gfp_t gfp, bool new_slab);
> -void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> -		     enum node_stat_item idx, int nr);
> -#else /* CONFIG_MEMCG_KMEM */
> -static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
> +int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> +			gfp_t gfp, bool new_slab);
>

We could remove this declaration and make the function static in mm/slub.c.

> +#else /* CONFIG_SLAB_OBJ_EXT */
> +
> +static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
>  {
>  	return NULL;
>  }
>  
> -static inline int memcg_alloc_slab_cgroups(struct slab *slab,
> -					       struct kmem_cache *s, gfp_t gfp,
> -					       bool new_slab)
> +static inline int alloc_slab_obj_exts(struct slab *slab,
> +				      struct kmem_cache *s, gfp_t gfp,
> +				      bool new_slab)
>  {
>  	return 0;
>  }

Ditto

> -#endif /* CONFIG_MEMCG_KMEM */
> +
> +static inline struct slabobj_ext *
> +prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> +{
> +	return NULL;
> +}

Same here (and the definition and usage even happen in a later patch).

> +#endif /* CONFIG_SLAB_OBJ_EXT */
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> +		     enum node_stat_item idx, int nr);
> +#endif
>  
>  size_t __ksize(const void *objp);
>  
> diff --git a/mm/slub.c b/mm/slub.c
> index d31b03a8d9d5..76fb600fbc80 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -683,10 +683,10 @@ static inline bool __slab_update_freelist(struct kmem_cache *s, struct slab *sla
>  
>  	if (s->flags & __CMPXCHG_DOUBLE) {
>  		ret = __update_freelist_fast(slab, freelist_old, counters_old,
> -				            freelist_new, counters_new);
> +					    freelist_new, counters_new);
>  	} else {
>  		ret = __update_freelist_slow(slab, freelist_old, counters_old,
> -				            freelist_new, counters_new);
> +					    freelist_new, counters_new);
>  	}
>  	if (likely(ret))
>  		return true;
> @@ -710,13 +710,13 @@ static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
>  
>  	if (s->flags & __CMPXCHG_DOUBLE) {
>  		ret = __update_freelist_fast(slab, freelist_old, counters_old,
> -				            freelist_new, counters_new);
> +					    freelist_new, counters_new);
>  	} else {
>  		unsigned long flags;
>  
>  		local_irq_save(flags);
>  		ret = __update_freelist_slow(slab, freelist_old, counters_old,
> -				            freelist_new, counters_new);
> +					     freelist_new, counters_new);

Please no drive-by fixups of whitespace in code you're not actually
changing. I thought you agreed in v3?

>  static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
>  					     struct list_lru *lru,
>  					     struct obj_cgroup **objcgp,
> @@ -2314,7 +2364,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
>  					 struct kmem_cache *s, gfp_t gfp)
>  {
>  	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> -		memcg_alloc_slab_cgroups(slab, s, gfp, true);
> +		alloc_slab_obj_exts(slab, s, gfp, true);

This is still guarded by the memcg_kmem_online() static key, which is good.

>  
>  	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
>  			    PAGE_SIZE << order);
> @@ -2323,8 +2373,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
>  static __always_inline void unaccount_slab(struct slab *slab, int order,
>  					   struct kmem_cache *s)
>  {
> -	if (memcg_kmem_online())
> -		memcg_free_slab_cgroups(slab);
> +	free_slab_obj_exts(slab);

But this no longer is, yet it still could be?

>  
>  	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
>  			    -(PAGE_SIZE << order));
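
A sketch of keeping the cheap bail-out inside the helper itself (names per
the series, body illustrative):

	static inline void free_slab_obj_exts(struct slab *slab)
	{
		struct slabobj_ext *obj_exts = slab_obj_exts(slab);

		if (!obj_exts)		/* common case: nothing was attached */
			return;

		kfree(obj_exts);
		slab->obj_exts = 0;
	}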


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation
  2024-02-21 19:40 ` [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
  2024-02-22  0:08   ` Pasha Tatashin
@ 2024-02-26 16:51   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 16:51 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Introduce __GFP_NO_OBJ_EXT flag in order to prevent recursive allocations
> when allocating slabobj_ext on a slab.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  2024-02-21 19:40 ` [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
  2024-02-22  0:09   ` Pasha Tatashin
@ 2024-02-26 16:52   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 16:52 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Slab extension objects can't be allocated before slab infrastructure is
> initialized. Some caches, like kmem_cache and kmem_cache_node, are created
> before slab infrastructure is initialized. Objects from these caches can't
> have extension objects. Introduce SLAB_NO_OBJ_EXT slab flag to mark these
> caches and avoid creating extensions for objects allocated from these
> slabs.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags
  2024-02-21 19:40 ` [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
  2024-02-22  0:12   ` Pasha Tatashin
@ 2024-02-26 16:53   ` Vlastimil Babka
  1 sibling, 0 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 16:53 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Introduce objext_flags to store additional objext flags unrelated to memcg.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-02-21 19:40 ` [PATCH v4 13/36] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
@ 2024-02-26 16:58   ` Vlastimil Babka
  2024-02-26 17:13     ` Suren Baghdasaryan
  2024-03-12 18:23     ` Luis Chamberlain
  0 siblings, 2 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 16:58 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Skip freeing module's data section if there are non-zero allocation tags
> because otherwise, once these allocations are freed, the access to their
> code tag would cause UAF.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

I know that module unloading was never really considered supported, etc.
But should we printk something so the admin knows why it didn't unload
and can go check those outstanding allocations?


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-21 19:40 ` [PATCH v4 15/36] lib: introduce support for page allocation tagging Suren Baghdasaryan
@ 2024-02-26 17:07   ` Vlastimil Babka
  2024-02-26 17:11     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-26 17:07 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Introduce helper functions to easily instrument page allocators by
> storing a pointer to the allocation tag associated with the code that
> allocated the page in a page_ext field.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

The static key usage seems fine now. Even if the page_ext overhead is still
always paid when compiled in, you mention in the cover letter there's a plan
for boot-time toggle later, so

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-26 17:07   ` Vlastimil Babka
@ 2024-02-26 17:11     ` Suren Baghdasaryan
  2024-02-27  9:30       ` Vlastimil Babka
  0 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-26 17:11 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024 at 9:07 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Introduce helper functions to easily instrument page allocators by
> > storing a pointer to the allocation tag associated with the code that
> > allocated the page in a page_ext field.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>
> The static key usage seems fine now. Even if the page_ext overhead is still
> always paid when compiled in, you mention in the cover letter there's a plan
> for boot-time toggle later, so

Yes, I already have a simple patch for that to be included in the next
revision: https://github.com/torvalds/linux/commit/7ca367e80232345f471b77b3ea71cf82faf50954

>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

>
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-02-26 16:58   ` Vlastimil Babka
@ 2024-02-26 17:13     ` Suren Baghdasaryan
  2024-03-12 18:23     ` Luis Chamberlain
  1 sibling, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-26 17:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024 at 8:58 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Skip freeing module's data section if there are non-zero allocation tags
> > because otherwise, once these allocations are freed, the access to their
> > code tag would cause UAF.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I know that module unloading was never considered really supported etc.
> But should we printk something so the admin knows why it didn't unload, and
> can go check those outstanding allocations?

Yes, that sounds reasonable. I'll add a pr_warn() in the next version.
Thanks!
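
Roughly along these lines, somewhere on the unload path (a sketch only;
the iterator and counter helpers are the ones from the codetag patches,
but the per-module filter codetag_in_module() is a hypothetical helper
I made up for illustration):

	struct codetag_iterator iter = codetag_get_ct_iter(alloc_tag_cttype);
	struct codetag *ct;
	s64 bytes = 0;

	codetag_lock_module_list(alloc_tag_cttype, true);
	while ((ct = codetag_next_ct(&iter)))
		if (codetag_in_module(ct, mod))	/* hypothetical */
			bytes += alloc_tag_read(ct_to_alloc_tag(ct)).bytes;
	codetag_lock_module_list(alloc_tag_cttype, false);

	if (bytes)
		pr_warn("%s: %lld bytes of memory still allocated, data section not freed\n",
			mod->name, bytes);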

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions
  2024-02-26 16:26   ` Vlastimil Babka
@ 2024-02-26 17:22     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-26 17:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024 at 8:26 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Currently slab pages can store only vectors of obj_cgroup pointers in
> > page->memcg_data. Introduce slabobj_ext structure to allow more data
> > to be stored for each slab object. Wrap obj_cgroup into slabobj_ext
> > to support current functionality while allowing to extend slabobj_ext
> > in the future.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Hi, mostly good from slab perspective, just some fixups:
>
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > -int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
> > -                              gfp_t gfp, bool new_slab);
> > -void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> > -                  enum node_stat_item idx, int nr);
> > -#else /* CONFIG_MEMCG_KMEM */
> > -static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
> > +int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > +                     gfp_t gfp, bool new_slab);
> >
>
> We could remove this declaration and make the function static in mm/slub.c.

Ack.

>
> > +#else /* CONFIG_SLAB_OBJ_EXT */
> > +
> > +static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> >  {
> >       return NULL;
> >  }
> >
> > -static inline int memcg_alloc_slab_cgroups(struct slab *slab,
> > -                                            struct kmem_cache *s, gfp_t gfp,
> > -                                            bool new_slab)
> > +static inline int alloc_slab_obj_exts(struct slab *slab,
> > +                                   struct kmem_cache *s, gfp_t gfp,
> > +                                   bool new_slab)
> >  {
> >       return 0;
> >  }
>
> Ditto

Ack.

>
> > -#endif /* CONFIG_MEMCG_KMEM */
> > +
> > +static inline struct slabobj_ext *
> > +prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> > +{
> > +     return NULL;
> > +}
>
> Same here (and the definition and usage even happen in a later patch).

Ack.

>
> > +#endif /* CONFIG_SLAB_OBJ_EXT */
> > +
> > +#ifdef CONFIG_MEMCG_KMEM
> > +void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> > +                  enum node_stat_item idx, int nr);
> > +#endif
> >
> >  size_t __ksize(const void *objp);
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index d31b03a8d9d5..76fb600fbc80 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -683,10 +683,10 @@ static inline bool __slab_update_freelist(struct kmem_cache *s, struct slab *sla
> >
> >       if (s->flags & __CMPXCHG_DOUBLE) {
> >               ret = __update_freelist_fast(slab, freelist_old, counters_old,
> > -                                         freelist_new, counters_new);
> > +                                         freelist_new, counters_new);
> >       } else {
> >               ret = __update_freelist_slow(slab, freelist_old, counters_old,
> > -                                         freelist_new, counters_new);
> > +                                         freelist_new, counters_new);
> >       }
> >       if (likely(ret))
> >               return true;
> > @@ -710,13 +710,13 @@ static inline bool slab_update_freelist(struct kmem_cache *s, struct slab *slab,
> >
> >       if (s->flags & __CMPXCHG_DOUBLE) {
> >               ret = __update_freelist_fast(slab, freelist_old, counters_old,
> > -                                         freelist_new, counters_new);
> > +                                         freelist_new, counters_new);
> >       } else {
> >               unsigned long flags;
> >
> >               local_irq_save(flags);
> >               ret = __update_freelist_slow(slab, freelist_old, counters_old,
> > -                                         freelist_new, counters_new);
> > +                                          freelist_new, counters_new);
>
> Please no drive-by fixups of whitespace in code you're not actually
> changing. I thought you agreed in v3?

Sorry, I must have misunderstood your previous comment. I thought you
were saying that the alignment I changed to was incorrect. I'll keep
them untouched.


>
> >  static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
> >                                            struct list_lru *lru,
> >                                            struct obj_cgroup **objcgp,
> > @@ -2314,7 +2364,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
> >                                        struct kmem_cache *s, gfp_t gfp)
> >  {
> >       if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > -             memcg_alloc_slab_cgroups(slab, s, gfp, true);
> > +             alloc_slab_obj_exts(slab, s, gfp, true);
>
> This is still guarded by the memcg_kmem_online() static key, which is good.
>
> >
> >       mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> >                           PAGE_SIZE << order);
> > @@ -2323,8 +2373,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
> >  static __always_inline void unaccount_slab(struct slab *slab, int order,
> >                                          struct kmem_cache *s)
> >  {
> > -     if (memcg_kmem_online())
> > -             memcg_free_slab_cgroups(slab);
> > +     free_slab_obj_exts(slab);
>
> But this no longer is, yet it still could be?

Yes, it seems I missed that. free_slab_obj_exts() would bail out, but
checking the static key first is more efficient. I'll revive this
check.
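
Something like this, perhaps (a sketch; whether the revived check
should also cover allocation tags via need_slab_obj_ext() is my
assumption, since obj_exts can now exist for non-memcg reasons too):

static __always_inline void unaccount_slab(struct slab *slab, int order,
					   struct kmem_cache *s)
{
	/* skip the call when no user of obj_exts can be active */
	if (memcg_kmem_online() || need_slab_obj_ext())
		free_slab_obj_exts(slab);

	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
			    -(PAGE_SIZE << order));
}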

Thanks for the review!
Suren.

>
> >
> >       mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> >                           -(PAGE_SIZE << order));
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro
  2024-02-26 15:44   ` Vlastimil Babka
@ 2024-02-26 17:48     ` Suren Baghdasaryan
  2024-02-26 20:50     ` Kent Overstreet
  1 sibling, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-26 17:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Alexander Viro

On Mon, Feb 26, 2024 at 7:44 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >
> > We're introducing alloc tagging, which tracks memory allocations by
> > callsite. Converting alloc_inode_sb() to a macro means allocations will
> > be tracked by its caller, which is a bit more useful.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> >  include/linux/fs.h | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 023f37c60709..08d8246399c3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -3010,11 +3010,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
> >   * This must be used for allocating filesystems specific inodes to set
> >   * up the inode reclaim context correctly.
> >   */
> > -static inline void *
> > -alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
>
> A __always_inline wouldn't have the same effect? Just wondering.

I think inlining it would still keep __LINE__ and __FILE__ pointing to
this location in the header instead of the location where the call
happens. If we keep alloc_inode_sb() inline, we will have to wrap it
with alloc_hook() and call kmem_cache_alloc_lru_noprof() inside it.
Doable, but this change seems much simpler.
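
A quick userspace demo of the preprocessor behavior (not kernel code,
just illustrating why the macro form reports the caller):

#include <stdio.h>

static inline void where_inline(void)
{
	printf("%s:%d\n", __FILE__, __LINE__);	/* the helper's line */
}

#define where_macro() printf("%s:%d\n", __FILE__, __LINE__)

int main(void)
{
	where_inline();	/* prints the line inside where_inline() */
	where_macro();	/* prints this line, i.e. the call site */
	return 0;
}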

>
> > -{
> > -     return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
> > -}
> > +#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)
> >
> >  extern void __insert_inode_hash(struct inode *, unsigned long hashval);
> >  static inline void insert_inode_hash(struct inode *inode)
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro
  2024-02-26 15:44   ` Vlastimil Babka
  2024-02-26 17:48     ` Suren Baghdasaryan
@ 2024-02-26 20:50     ` Kent Overstreet
  1 sibling, 0 replies; 98+ messages in thread
From: Kent Overstreet @ 2024-02-26 20:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, akpm, mhocko, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups,
	Alexander Viro

On Mon, Feb 26, 2024 at 04:44:51PM +0100, Vlastimil Babka wrote:
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> > 
> > We're introducing alloc tagging, which tracks memory allocations by
> > callsite. Converting alloc_inode_sb() to a macro means allocations will
> > be tracked by its caller, which is a bit more useful.
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> >  include/linux/fs.h | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 023f37c60709..08d8246399c3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -3010,11 +3010,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
> >   * This must be used for allocating filesystems specific inodes to set
> >   * up the inode reclaim context correctly.
> >   */
> > -static inline void *
> > -alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
> 
> A __always_inline wouldn't have the same effect? Just wondering.

Nope, macro expansion within an inline function happens once and will
show the __func__ and __line__ of the helper; we want it expanded in
the caller.
^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-26 17:11     ` Suren Baghdasaryan
@ 2024-02-27  9:30       ` Vlastimil Babka
  2024-02-27  9:45         ` Kent Overstreet
  2024-02-27 16:55         ` Suren Baghdasaryan
  0 siblings, 2 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27  9:30 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups



On 2/26/24 18:11, Suren Baghdasaryan wrote:
> On Mon, Feb 26, 2024 at 9:07 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 2/21/24 20:40, Suren Baghdasaryan wrote:
>>> Introduce helper functions to easily instrument page allocators by
>>> storing a pointer to the allocation tag associated with the code that
>>> allocated the page in a page_ext field.
>>>
>>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
>>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>>
>> The static key usage seems fine now. Even if the page_ext overhead is still
>> always paid when compiled in, you mention in the cover letter there's a plan
>> for boot-time toggle later, so
> 
> Yes, I already have a simple patch for that to be included in the next
> revision: https://github.com/torvalds/linux/commit/7ca367e80232345f471b77b3ea71cf82faf50954

This opt-out logic would require a distro kernel with allocation
profiling compiled in to ship together with something that modifies the
kernel command line to disable it by default, so it's not very
practical. Could CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT be
turned into a choice with 3 possible values, where one of them would
initialize mem_profiling_enabled to false?

Or, taking a step back, is it going to be a common use case to pay the
memory overhead unconditionally, but only enable the profiling later at
runtime? Also, what happens if someone enables and disables it multiple
times during one boot? Would the statistics get all skewed because some
frees would not be accounted while it's disabled?

>>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Thanks!
> 
>>
>>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-27  9:30       ` Vlastimil Babka
@ 2024-02-27  9:45         ` Kent Overstreet
  2024-02-27 16:55         ` Suren Baghdasaryan
  1 sibling, 0 replies; 98+ messages in thread
From: Kent Overstreet @ 2024-02-27  9:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, akpm, mhocko, hannes, roman.gushchin,
	mgorman, dave, willy, liam.howlett, penguin-kernel, corbet, void,
	peterz, juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 10:30:53AM +0100, Vlastimil Babka wrote:
> 
> 
> On 2/26/24 18:11, Suren Baghdasaryan wrote:
> > On Mon, Feb 26, 2024 at 9:07 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> >>> Introduce helper functions to easily instrument page allocators by
> >>> storing a pointer to the allocation tag associated with the code that
> >>> allocated the page in a page_ext field.
> >>>
> >>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> >>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> >>
> >> The static key usage seems fine now. Even if the page_ext overhead is still
> >> always paid when compiled in, you mention in the cover letter there's a plan
> >> for boot-time toggle later, so
> > 
> > Yes, I already have a simple patch for that to be included in the next
> > revision: https://github.com/torvalds/linux/commit/7ca367e80232345f471b77b3ea71cf82faf50954
> 
> This opt-out logic would require a distro kernel with allocation
> profiling compiled-in to ship together with something that modifies
> kernel command line to disable it by default, so it's not very
> practical. Could the CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT be
> turned into having 3 possible choices, where one of them would
> initialize mem_profiling_enabled to false?
> 
> Or, taking a step back, is it going to be a common usecase to pay the
> memory overhead unconditionally, but only enable the profiling later
> during runtime? Also what happens if someone would enable and disable it
> multiple times during one boot? Would the statistics get all skewed
> because some frees would be not accounted while it's disabled?

I already wrote the code for fast lookup from codetag index to codetag
(i.e. pointer compression), so this is all going away shortly.

It just won't be in the initial pull request because of other
dependencies (it requires my eytzinger code, which I was already lifting
from fs/bcachefs/ for 6.9), but it can still probably make 6.9 in a
second smaller pull.
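
For the curious: the Eytzinger layout is a sorted array stored in BFS
order, so a binary search walks mostly adjacent cache lines near the
root. A sketch of a 1-indexed lower-bound lookup (my illustration, not
the bcachefs code):

/* returns the index of the first element >= key, or 0 if none */
static unsigned int eytzinger_lower_bound(const u32 *tree, unsigned int nr,
					  u32 key)
{
	unsigned int i = 1;

	while (i <= nr)
		i = (i << 1) + (tree[i] < key);

	/* cancel the trailing right-turns of the descent */
	return i >> ffs(~i);
}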

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-21 19:40 ` [PATCH v4 19/36] mm: create new codetag references during page splitting Suren Baghdasaryan
@ 2024-02-27 10:11   ` Vlastimil Babka
  2024-02-27 16:38     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 10:11 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> When a high-order page is split into smaller ones, each newly split
> page should get its codetag. The original codetag is reused for these
> pages but it's recorded as 0-byte allocation because original codetag
> already accounts for the original high-order allocated page.

This described v3, but you then refactored it (for the better), so
could the commit log be updated to reflect that?

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

I was going to R-b, but now I recalled the trickiness of
__free_pages() for non-compound pages if it loses the race to a
speculative reference. Will the codetag handling work fine there?
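
For reference, the path I have in mind looks roughly like this (quoting
from memory, so details may not match the current tree exactly):

void __free_pages(struct page *page, unsigned int order)
{
	/* get PageHead before we drop the reference */
	int head = PageHead(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head)
		/*
		 * The speculative reference pins only the first 0-order
		 * page; the tail pages get freed here while the codetag
		 * reference lives in the first page's page_ext.
		 */
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
}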

> ---
>  include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
>  mm/huge_memory.c            |  2 ++
>  mm/page_alloc.c             |  2 ++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index b49ab955300f..9e6ad8e0e4aa 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
>  	}
>  }
>  
> +static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
> +{
> +	int i;
> +	struct page_ext *page_ext;
> +	union codetag_ref *ref;
> +	struct alloc_tag *tag;
> +
> +	if (!mem_alloc_profiling_enabled())
> +		return;
> +
> +	page_ext = page_ext_get(page);
> +	if (unlikely(!page_ext))
> +		return;
> +
> +	ref = codetag_ref_from_page_ext(page_ext);
> +	if (!ref->ct)
> +		goto out;
> +
> +	tag = ct_to_alloc_tag(ref->ct);
> +	page_ext = page_ext_next(page_ext);
> +	for (i = 1; i < nr; i++) {
> +		/* Set new reference to point to the original tag */
> +		alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
> +		page_ext = page_ext_next(page_ext);
> +	}
> +out:
> +	page_ext_put(page_ext);
> +}
> +
>  #else /* CONFIG_MEM_ALLOC_PROFILING */
>  
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>  				   unsigned int order) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
> +static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
>  
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>  
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 94c958f7ebb5..86daae671319 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -38,6 +38,7 @@
>  #include <linux/sched/sysctl.h>
>  #include <linux/memory-tiers.h>
>  #include <linux/compat.h>
> +#include <linux/pgalloc_tag.h>
>  
>  #include <asm/tlb.h>
>  #include <asm/pgalloc.h>
> @@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  	/* Caller disabled irqs, so they are still disabled here */
>  
>  	split_page_owner(head, nr);
> +	pgalloc_tag_split(head, nr);
>  
>  	/* See comment in __split_huge_page_tail() */
>  	if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 58c0e8b948a4..4bc5b4720fee 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
>  	for (i = 1; i < (1 << order); i++)
>  		set_page_refcounted(page + i);
>  	split_page_owner(page, 1 << order);
> +	pgalloc_tag_split(page, 1 << order);
>  	split_page_memcg(page, 1 << order);
>  }
>  EXPORT_SYMBOL_GPL(split_page);
> @@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  		struct page *last = page + nr;
>  
>  		split_page_owner(page, 1 << order);
> +		pgalloc_tag_split(page, 1 << order);
>  		split_page_memcg(page, 1 << order);
>  		while (page < --last)
>  			set_page_refcounted(last);

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
  2024-02-21 19:40 ` [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
@ 2024-02-27 10:18   ` Vlastimil Babka
  0 siblings, 0 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 10:18 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> For all page allocations to be tagged, page_ext has to be initialized
> before the first page allocation. Early tasks allocate their stacks
> using page allocator before alloc_node_page_ext() initializes page_ext
> area, unless early_page_ext is enabled. Therefore these allocations will
> generate a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
> Enable early_page_ext whenever CONFIG_MEM_ALLOC_PROFILING_DEBUG=y to
> ensure page_ext initialization prior to any page allocation. This will
> have all the negative effects associated with early_page_ext, such as
> possible longer boot time, therefore we enable it only when debugging
> with CONFIG_MEM_ALLOC_PROFILING_DEBUG enabled and not universally for
> CONFIG_MEM_ALLOC_PROFILING.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_ext.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 3c58fe8a24df..e7d8f1a5589e 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -95,7 +95,16 @@ unsigned long page_ext_size;
>  
>  static unsigned long total_usage;
>  
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +/*
> + * To ensure correct allocation tagging for pages, page_ext should be available
> + * before the first page allocation. Otherwise early task stacks will be
> + * allocated before page_ext initialization and missing tags will be flagged.
> + */
> +bool early_page_ext __meminitdata = true;
> +#else
>  bool early_page_ext __meminitdata;
> +#endif
>  static int __init setup_early_page_ext(char *str)
>  {
>  	early_page_ext = true;

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 21/36] lib: add codetag reference into slabobj_ext
  2024-02-21 19:40 ` [PATCH v4 21/36] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
@ 2024-02-27 10:19   ` Vlastimil Babka
  0 siblings, 0 replies; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 10:19 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> To store code tag for every slab object, a codetag reference is embedded
> into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/memcontrol.h | 5 +++++
>  lib/Kconfig.debug          | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f3584e98b640..2b010316016c 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1653,7 +1653,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>   * if MEMCG_DATA_OBJEXTS is set.
>   */
>  struct slabobj_ext {
> +#ifdef CONFIG_MEMCG_KMEM
>  	struct obj_cgroup *objcg;
> +#endif
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +	union codetag_ref ref;
> +#endif
>  } __aligned(8);
>  
>  static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 7bbdb0ddb011..9ecfcdb54417 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -979,6 +979,7 @@ config MEM_ALLOC_PROFILING
>  	depends on !DEBUG_FORCE_WEAK_PER_CPU
>  	select CODE_TAGGING
>  	select PAGE_EXTENSION
> +	select SLAB_OBJ_EXT
>  	help
>  	  Track allocation source code and record total allocation size
>  	  initiated at that code location. The mechanism can be used to track

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths
  2024-02-21 19:40 ` [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
@ 2024-02-27 13:07   ` Vlastimil Babka
  2024-02-27 16:15     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 13:07 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups



On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Account slab allocations using codetag reference embedded into slabobj_ext.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
>  mm/slab.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/slub.c |  9 ++++++++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 13b6ba2abd74..c4bd0d5348cb 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -567,6 +567,46 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
>  int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>  			gfp_t gfp, bool new_slab);
>  
> +static inline bool need_slab_obj_ext(void)
> +{
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +	if (mem_alloc_profiling_enabled())
> +		return true;
> +#endif
> +	/*
> +	 * CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
> +	 * inside memcg_slab_post_alloc_hook. No other users for now.
> +	 */
> +	return false;
> +}
> +
> +static inline struct slabobj_ext *
> +prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> +{
> +	struct slab *slab;
> +
> +	if (!p)
> +		return NULL;
> +
> +	if (!need_slab_obj_ext())
> +		return NULL;
> +
> +	if (s->flags & SLAB_NO_OBJ_EXT)
> +		return NULL;
> +
> +	if (flags & __GFP_NO_OBJ_EXT)
> +		return NULL;
> +
> +	slab = virt_to_slab(p);
> +	if (!slab_obj_exts(slab) &&
> +	    WARN(alloc_slab_obj_exts(slab, s, flags, false),
> +		 "%s, %s: Failed to create slab extension vector!\n",
> +		 __func__, s->name))
> +		return NULL;
> +
> +	return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> +}
> +
>  #else /* CONFIG_SLAB_OBJ_EXT */
>  
>  static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> @@ -589,6 +629,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>  
>  #endif /* CONFIG_SLAB_OBJ_EXT */
>  
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +
> +static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> +					void **p, int objects)

Only used from mm/slub.c so could move?

> +{
> +	struct slabobj_ext *obj_exts;
> +	int i;
> +
> +	obj_exts = slab_obj_exts(slab);
> +	if (!obj_exts)
> +		return;
> +
> +	for (i = 0; i < objects; i++) {
> +		unsigned int off = obj_to_index(s, slab, p[i]);
> +
> +		alloc_tag_sub(&obj_exts[off].ref, s->size);
> +	}
> +}
> +
> +#else
> +
> +static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> +					void **p, int objects) {}
> +
> +#endif /* CONFIG_MEM_ALLOC_PROFILING */
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>  		     enum node_stat_item idx, int nr);
> diff --git a/mm/slub.c b/mm/slub.c
> index 5dc7beda6c0d..a69b6b4c8df6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3826,6 +3826,7 @@ void slab_post_alloc_hook(struct kmem_cache *s,	struct obj_cgroup *objcg,
>  			  unsigned int orig_size)
>  {
>  	unsigned int zero_size = s->object_size;
> +	struct slabobj_ext *obj_exts;
>  	bool kasan_init = init;
>  	size_t i;
>  	gfp_t init_flags = flags & gfp_allowed_mask;
> @@ -3868,6 +3869,12 @@ void slab_post_alloc_hook(struct kmem_cache *s,	struct obj_cgroup *objcg,
>  		kmemleak_alloc_recursive(p[i], s->object_size, 1,
>  					 s->flags, init_flags);
>  		kmsan_slab_alloc(s, p[i], init_flags);
> +		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +		/* obj_exts can be allocated for other reasons */
> +		if (likely(obj_exts) && mem_alloc_profiling_enabled())
> +			alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> +#endif

I think that, like in the page allocator, this could be better guarded
by mem_alloc_profiling_enabled() as the outermost check.
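
E.g., a sketch of what I mean, reusing the names from the patch:

	if (mem_alloc_profiling_enabled()) {
		obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
		/* obj_exts can be allocated for other reasons */
		if (likely(obj_exts))
			alloc_tag_add(&obj_exts->ref, current->alloc_tag,
				      s->size);
	}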

>  	}
>  
>  	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> @@ -4346,6 +4353,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
>  	       unsigned long addr)
>  {
>  	memcg_slab_free_hook(s, slab, &object, 1);
> +	alloc_tagging_slab_free_hook(s, slab, &object, 1);

Same here; the static key check is not even inside this one?

>  
>  	if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
>  		do_slab_free(s, slab, object, object, 1, addr);
> @@ -4356,6 +4364,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
>  		    void *tail, void **p, int cnt, unsigned long addr)
>  {
>  	memcg_slab_free_hook(s, slab, p, cnt);
> +	alloc_tagging_slab_free_hook(s, slab, p, cnt);

Ditto.

>  	/*
>  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
>  	 * to remove objects, whose reuse must be delayed.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 31/36] lib: add memory allocations report in show_mem()
  2024-02-21 19:40 ` [PATCH v4 31/36] lib: add memory allocations report in show_mem() Suren Baghdasaryan
@ 2024-02-27 13:19   ` Vlastimil Babka
  2024-02-27 16:12     ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 13:19 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Include allocations in show_mem reports.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit: there's pr_notice(), which is shorter than printk(KERN_NOTICE ...).

> ---
>  include/linux/alloc_tag.h |  7 +++++++
>  include/linux/codetag.h   |  1 +
>  lib/alloc_tag.c           | 38 ++++++++++++++++++++++++++++++++++++++
>  lib/codetag.c             |  5 +++++
>  mm/show_mem.c             | 26 ++++++++++++++++++++++++++
>  5 files changed, 77 insertions(+)
> 
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 29636719b276..85a24a027403 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -30,6 +30,13 @@ struct alloc_tag {
>  
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
>  
> +struct codetag_bytes {
> +	struct codetag *ct;
> +	s64 bytes;
> +};
> +
> +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
> +
>  static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
>  {
>  	return container_of(ct, struct alloc_tag, ct);
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index bfd0ba5c4185..c2a579ccd455 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -61,6 +61,7 @@ struct codetag_iterator {
>  }
>  
>  void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
> +bool codetag_trylock_module_list(struct codetag_type *cttype);
>  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
>  struct codetag *codetag_next_ct(struct codetag_iterator *iter);
>  
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index cb5adec4b2e2..ec54f29482dc 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
>  	.show	= allocinfo_show,
>  };
>  
> +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
> +{
> +	struct codetag_iterator iter;
> +	struct codetag *ct;
> +	struct codetag_bytes n;
> +	unsigned int i, nr = 0;
> +
> +	if (can_sleep)
> +		codetag_lock_module_list(alloc_tag_cttype, true);
> +	else if (!codetag_trylock_module_list(alloc_tag_cttype))
> +		return 0;
> +
> +	iter = codetag_get_ct_iter(alloc_tag_cttype);
> +	while ((ct = codetag_next_ct(&iter))) {
> +		struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
> +
> +		n.ct	= ct;
> +		n.bytes = counter.bytes;
> +
> +		for (i = 0; i < nr; i++)
> +			if (n.bytes > tags[i].bytes)
> +				break;
> +
> +		if (i < count) {
> +			nr -= nr == count;
> +			memmove(&tags[i + 1],
> +				&tags[i],
> +				sizeof(tags[0]) * (nr - i));
> +			nr++;
> +			tags[i] = n;
> +		}
> +	}
> +
> +	codetag_lock_module_list(alloc_tag_cttype, false);
> +
> +	return nr;
> +}
> +
>  static void __init procfs_init(void)
>  {
>  	proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
> diff --git a/lib/codetag.c b/lib/codetag.c
> index b13412ca57cc..7b39cec9648a 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
>  		up_read(&cttype->mod_lock);
>  }
>  
> +bool codetag_trylock_module_list(struct codetag_type *cttype)
> +{
> +	return down_read_trylock(&cttype->mod_lock) != 0;
> +}
> +
>  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
>  {
>  	struct codetag_iterator iter = {
> diff --git a/mm/show_mem.c b/mm/show_mem.c
> index 8dcfafbd283c..1e41f8d6e297 100644
> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
> @@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  #ifdef CONFIG_MEMORY_FAILURE
>  	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
>  #endif
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +	{
> +		struct codetag_bytes tags[10];
> +		size_t i, nr;
> +
> +		nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
> +		if (nr) {
> +			printk(KERN_NOTICE "Memory allocations:\n");
> +			for (i = 0; i < nr; i++) {
> +				struct codetag *ct = tags[i].ct;
> +				struct alloc_tag *tag = ct_to_alloc_tag(ct);
> +				struct alloc_tag_counters counter = alloc_tag_read(tag);
> +
> +				/* Same as alloc_tag_to_text() but w/o intermediate buffer */
> +				if (ct->modname)
> +					printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
> +					       counter.bytes, counter.calls, ct->filename,
> +					       ct->lineno, ct->modname, ct->function);
> +				else
> +					printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
> +					       counter.bytes, counter.calls, ct->filename,
> +					       ct->lineno, ct->function);
> +			}
> +		}
> +	}
> +#endif
>  }

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 00/36] Memory allocation profiling
  2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
                   ` (35 preceding siblings ...)
  2024-02-21 19:40 ` [PATCH v4 36/36] memprofiling: Documentation Suren Baghdasaryan
@ 2024-02-27 13:36 ` Vlastimil Babka
  2024-02-27 16:10   ` Suren Baghdasaryan
  36 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-27 13:36 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> Overview:
> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.
> 
> Example output:
>   root@moria-kvm:~# sort -rn /proc/allocinfo
>    127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
>     56373248     4737 mm/slub.c:2259 func:alloc_slab_page
>     14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
>     14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
>     13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
>     11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
>      9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
>      4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
>      4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
>      3940352      962 mm/memory.c:4214 func:alloc_anon_folio
>      2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
>      ...
> 
> Since v3:
>  - Dropped patch changing string_get_size() [2] as not needed
>  - Dropped patch modifying xfs allocators [3] as not needed,
>    per Dave Chinner
>  - Added Reviewed-by, per Kees Cook
>  - Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() where they
>    are used, per Vlastimil Babka
>  - Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
>  - Refactored patch [4] into other patches, per Vlastimil Babka
>  - Replaced snprintf() with seq_buf_printf(), per Kees Cook
>  - Changed output to report bytes, per Andrew Morton and Pasha Tatashin
>  - Changed output to report [module] only for loadable modules,
>    per Vlastimil Babka
>  - Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
>  - Changed the code to handle page splitting to be more understandable,
>    per Vlastimil Babka
>  - Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
>    mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
>    per Vlastimil Babka
>  - Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
>    per Vlastimil Babka
>  - Refactored the code in show_mem() to avoid memory allocations,
>    per Michal Hocko
>  - Changed to trylock in show_mem() to avoid blocking in atomic context,
>    per Tetsuo Handa
>  - Added mm mailing list into MAINTAINERS, per Kees Cook
>  - Added base commit SHA, per Andy Shevchenko
>  - Added a patch with documentation, per Jani Nikula
>  - Fixed 0day bugs
>  - Added benchmark results [5], per Steven Rostedt
>  - Rebased over Linux 6.8-rc5
> 
> Items not yet addressed:
>  - An early_boot option to prevent pageext overhead. We are looking into
>    ways to use the same sysctl instead of adding an additional early
>    boot parameter.

I have reviewed the parts that integrate the tracking with the page and
slab allocators, and besides some details to improve, it seems OK to me.
The early boot option seems to be coming, so that might eventually make
it suitable for build-time enablement in a distro kernel.

The macros (and their potential spread to upper layers to keep the
information useful enough) are of course ugly, but I guess that can't
currently be helped, and I'm unable to decide whether it's worth it or
not. That's up to those providing their success stories, I guess. If
there's at least a path ahead to replace that part with compiler support
in the future, great. So I'm not against merging this. BTW, do we know
Linus's opinion on the macros approach?

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v4 00/36] Memory allocation profiling
  2024-02-27 13:36 ` [PATCH v4 00/36] Memory allocation profiling Vlastimil Babka
@ 2024-02-27 16:10   ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-27 16:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 5:35 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Overview:
> > Low overhead [1] per-callsite memory allocation profiling. Not just for
> > debug kernels, overhead low enough to be deployed in production.
> >
> > Example output:
> >   root@moria-kvm:~# sort -rn /proc/allocinfo
> >    127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
> >     56373248     4737 mm/slub.c:2259 func:alloc_slab_page
> >     14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
> >     14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
> >     13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
> >     11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
> >      9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
> >      4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
> >      4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
> >      3940352      962 mm/memory.c:4214 func:alloc_anon_folio
> >      2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
> >      ...
> >
> > Since v3:
> >  - Dropped patch changing string_get_size() [2] as not needed
> >  - Dropped patch modifying xfs allocators [3] as not needed,
> >    per Dave Chinner
> >  - Added Reviewed-by, per Kees Cook
> >  - Moved prepare_slab_obj_exts_hook() and alloc_slab_obj_exts() where they
> >    are used, per Vlastimil Babka
> >  - Fixed SLAB_NO_OBJ_EXT definition to use unused bit, per Vlastimil Babka
> >  - Refactored patch [4] into other patches, per Vlastimil Babka
> >  - Replaced snprintf() with seq_buf_printf(), per Kees Cook
> >  - Changed output to report bytes, per Andrew Morton and Pasha Tatashin
> >  - Changed output to report [module] only for loadable modules,
> >    per Vlastimil Babka
> >  - Moved mem_alloc_profiling_enabled() check earlier, per Vlastimil Babka
> >  - Changed the code to handle page splitting to be more understandable,
> >    per Vlastimil Babka
> >  - Moved alloc_tagging_slab_free_hook(), mark_objexts_empty(),
> >    mark_failed_objexts_alloc() and handle_failed_objexts_alloc(),
> >    per Vlastimil Babka
> >  - Fixed loss of __alloc_size(1, 2) in kvmalloc functions,
> >    per Vlastimil Babka
> >  - Refactored the code in show_mem() to avoid memory allocations,
> >    per Michal Hocko
> >  - Changed to trylock in show_mem() to avoid blocking in atomic context,
> >    per Tetsuo Handa
> >  - Added mm mailing list into MAINTAINERS, per Kees Cook
> >  - Added base commit SHA, per Andy Shevchenko
> >  - Added a patch with documentation, per Jani Nikula
> >  - Fixed 0day bugs
> >  - Added benchmark results [5], per Steven Rostedt
> >  - Rebased over Linux 6.8-rc5
> >
> > Items not yet addressed:
> >  - An early boot option to prevent page_ext overhead. We are looking into
> >    ways to use the same sysctl instead of adding an additional early boot
> >    parameter.
>
> I have reviewed the parts that integrate the tracking with page and slab
> allocators, and besides some details to improve, it seems OK to me. The
> early boot option seems to be coming, so this might eventually be suitable for
> build-time enablement in a distro kernel.

Thanks for reviewing, Vlastimil!

>
> The macros (and their potential spread to upper layers to keep the
> information useful enough) are of course ugly, but I guess it can't be
> currently helped, and I'm unable to decide whether it's worth it or not.
> That's up to those providing their success stories, I guess. If there's
> at least a path ahead to replace that part with compiler support in the
> future, great. So I'm not against merging this. BTW, do we know Linus's
> opinion on the macros approach?

We haven't run it by Linus specifically but hopefully we will see a
comment from him on the mailing list at some point.



* Re: [PATCH v4 31/36] lib: add memory allocations report in show_mem()
  2024-02-27 13:19   ` Vlastimil Babka
@ 2024-02-27 16:12     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-27 16:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 5:18 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Include allocations in show_mem reports.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Nit: there's pr_notice() that's shorter than printk(KERN_NOTICE

I used printk() since other parts of show_mem() use it, but I can
change it if that's preferable.
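
(For reference, the change would be mechanical, e.g. the header line
above would become

	pr_notice("Memory allocations:\n");

though note that pr_notice() also prepends the file's pr_fmt() prefix,
which the bare printk(KERN_NOTICE ...) calls do not.)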

>
> > ---
> >  include/linux/alloc_tag.h |  7 +++++++
> >  include/linux/codetag.h   |  1 +
> >  lib/alloc_tag.c           | 38 ++++++++++++++++++++++++++++++++++++++
> >  lib/codetag.c             |  5 +++++
> >  mm/show_mem.c             | 26 ++++++++++++++++++++++++++
> >  5 files changed, 77 insertions(+)
> >
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index 29636719b276..85a24a027403 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -30,6 +30,13 @@ struct alloc_tag {
> >
> >  #ifdef CONFIG_MEM_ALLOC_PROFILING
> >
> > +struct codetag_bytes {
> > +     struct codetag *ct;
> > +     s64 bytes;
> > +};
> > +
> > +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep);
> > +
> >  static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
> >  {
> >       return container_of(ct, struct alloc_tag, ct);
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index bfd0ba5c4185..c2a579ccd455 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -61,6 +61,7 @@ struct codetag_iterator {
> >  }
> >
> >  void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
> > +bool codetag_trylock_module_list(struct codetag_type *cttype);
> >  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
> >  struct codetag *codetag_next_ct(struct codetag_iterator *iter);
> >
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index cb5adec4b2e2..ec54f29482dc 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -86,6 +86,44 @@ static const struct seq_operations allocinfo_seq_op = {
> >       .show   = allocinfo_show,
> >  };
> >
> > +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep)
> > +{
> > +     struct codetag_iterator iter;
> > +     struct codetag *ct;
> > +     struct codetag_bytes n;
> > +     unsigned int i, nr = 0;
> > +
> > +     if (can_sleep)
> > +             codetag_lock_module_list(alloc_tag_cttype, true);
> > +     else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > +             return 0;
> > +
> > +     iter = codetag_get_ct_iter(alloc_tag_cttype);
> > +     while ((ct = codetag_next_ct(&iter))) {
> > +             struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
> > +
> > +             n.ct    = ct;
> > +             n.bytes = counter.bytes;
> > +
> > +             for (i = 0; i < nr; i++)
> > +                     if (n.bytes > tags[i].bytes)
> > +                             break;
> > +
> > +             if (i < count) {
> > +                     nr -= nr == count;
> > +                     memmove(&tags[i + 1],
> > +                             &tags[i],
> > +                             sizeof(tags[0]) * (nr - i));
> > +                     nr++;
> > +                     tags[i] = n;
> > +             }
> > +     }
> > +
> > +     codetag_lock_module_list(alloc_tag_cttype, false);
> > +
> > +     return nr;
> > +}
> > +
> >  static void __init procfs_init(void)
> >  {
> >       proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
> > diff --git a/lib/codetag.c b/lib/codetag.c
> > index b13412ca57cc..7b39cec9648a 100644
> > --- a/lib/codetag.c
> > +++ b/lib/codetag.c
> > @@ -36,6 +36,11 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
> >               up_read(&cttype->mod_lock);
> >  }
> >
> > +bool codetag_trylock_module_list(struct codetag_type *cttype)
> > +{
> > +     return down_read_trylock(&cttype->mod_lock) != 0;
> > +}
> > +
> >  struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
> >  {
> >       struct codetag_iterator iter = {
> > diff --git a/mm/show_mem.c b/mm/show_mem.c
> > index 8dcfafbd283c..1e41f8d6e297 100644
> > --- a/mm/show_mem.c
> > +++ b/mm/show_mem.c
> > @@ -423,4 +423,30 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
> >  #ifdef CONFIG_MEMORY_FAILURE
> >       printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
> >  #endif
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +     {
> > +             struct codetag_bytes tags[10];
> > +             size_t i, nr;
> > +
> > +             nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
> > +             if (nr) {
> > +                     printk(KERN_NOTICE "Memory allocations:\n");
> > +                     for (i = 0; i < nr; i++) {
> > +                             struct codetag *ct = tags[i].ct;
> > +                             struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > +                             struct alloc_tag_counters counter = alloc_tag_read(tag);
> > +
> > +                             /* Same as alloc_tag_to_text() but w/o intermediate buffer */
> > +                             if (ct->modname)
> > +                                     printk(KERN_NOTICE "%12lli %8llu %s:%u [%s] func:%s\n",
> > +                                            counter.bytes, counter.calls, ct->filename,
> > +                                            ct->lineno, ct->modname, ct->function);
> > +                             else
> > +                                     printk(KERN_NOTICE "%12lli %8llu %s:%u func:%s\n",
> > +                                            counter.bytes, counter.calls, ct->filename,
> > +                                            ct->lineno, ct->function);
> > +                     }
> > +             }
> > +     }
> > +#endif
> >  }


* Re: [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths
  2024-02-27 13:07   ` Vlastimil Babka
@ 2024-02-27 16:15     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-27 16:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 5:07 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
>
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Account slab allocations using codetag reference embedded into slabobj_ext.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> >  mm/slab.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/slub.c |  9 ++++++++
> >  2 files changed, 75 insertions(+)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 13b6ba2abd74..c4bd0d5348cb 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -567,6 +567,46 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> >  int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >                       gfp_t gfp, bool new_slab);
> >
> > +static inline bool need_slab_obj_ext(void)
> > +{
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +     if (mem_alloc_profiling_enabled())
> > +             return true;
> > +#endif
> > +     /*
> > +      * CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
> > +      * inside memcg_slab_post_alloc_hook. No other users for now.
> > +      */
> > +     return false;
> > +}
> > +
> > +static inline struct slabobj_ext *
> > +prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> > +{
> > +     struct slab *slab;
> > +
> > +     if (!p)
> > +             return NULL;
> > +
> > +     if (!need_slab_obj_ext())
> > +             return NULL;
> > +
> > +     if (s->flags & SLAB_NO_OBJ_EXT)
> > +             return NULL;
> > +
> > +     if (flags & __GFP_NO_OBJ_EXT)
> > +             return NULL;
> > +
> > +     slab = virt_to_slab(p);
> > +     if (!slab_obj_exts(slab) &&
> > +         WARN(alloc_slab_obj_exts(slab, s, flags, false),
> > +              "%s, %s: Failed to create slab extension vector!\n",
> > +              __func__, s->name))
> > +             return NULL;
> > +
> > +     return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> > +}
> > +
> >  #else /* CONFIG_SLAB_OBJ_EXT */
> >
> >  static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> > @@ -589,6 +629,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >
> >  #endif /* CONFIG_SLAB_OBJ_EXT */
> >
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +
> > +static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> > +                                     void **p, int objects)
>
> Only used from mm/slub.c so could move?

Ack.

>
> > +{
> > +     struct slabobj_ext *obj_exts;
> > +     int i;
> > +
> > +     obj_exts = slab_obj_exts(slab);
> > +     if (!obj_exts)
> > +             return;
> > +
> > +     for (i = 0; i < objects; i++) {
> > +             unsigned int off = obj_to_index(s, slab, p[i]);
> > +
> > +             alloc_tag_sub(&obj_exts[off].ref, s->size);
> > +     }
> > +}
> > +
> > +#else
> > +
> > +static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> > +                                     void **p, int objects) {}
> > +
> > +#endif /* CONFIG_MEM_ALLOC_PROFILING */
> > +
> >  #ifdef CONFIG_MEMCG_KMEM
> >  void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> >                    enum node_stat_item idx, int nr);
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 5dc7beda6c0d..a69b6b4c8df6 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3826,6 +3826,7 @@ void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
> >                         unsigned int orig_size)
> >  {
> >       unsigned int zero_size = s->object_size;
> > +     struct slabobj_ext *obj_exts;
> >       bool kasan_init = init;
> >       size_t i;
> >       gfp_t init_flags = flags & gfp_allowed_mask;
> > @@ -3868,6 +3869,12 @@ void slab_post_alloc_hook(struct kmem_cache *s,        struct obj_cgroup *objcg,
> >               kmemleak_alloc_recursive(p[i], s->object_size, 1,
> >                                        s->flags, init_flags);
> >               kmsan_slab_alloc(s, p[i], init_flags);
> > +             obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +             /* obj_exts can be allocated for other reasons */
> > +             if (likely(obj_exts) && mem_alloc_profiling_enabled())
> > +                     alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> > +#endif
>
> I think that like in the page allocator, this could be better guarded by
> mem_alloc_profiling_enabled() as the outermost thing.

Oops, missed it. Will fix.
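
I.e. something like this in slab_post_alloc_hook() (untested sketch,
reusing the names from the diff above):

		if (mem_alloc_profiling_enabled()) {
			obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
			/* obj_exts may exist for other reasons (e.g. memcg) */
			if (likely(obj_exts))
				alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
		}

With the static key check outermost, the #ifdef block and the inner
enabled() check go away.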

>
> >       }
> >
> >       memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> > @@ -4346,6 +4353,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> >              unsigned long addr)
> >  {
> >       memcg_slab_free_hook(s, slab, &object, 1);
> > +     alloc_tagging_slab_free_hook(s, slab, &object, 1);
>
> Same here, the static key is not even inside of this?

Ack.

>
> >
> >       if (likely(slab_free_hook(s, object, slab_want_init_on_free(s))))
> >               do_slab_free(s, slab, object, object, 1, addr);
> > @@ -4356,6 +4364,7 @@ void slab_free_bulk(struct kmem_cache *s, struct slab *slab, void *head,
> >                   void *tail, void **p, int cnt, unsigned long addr)
> >  {
> >       memcg_slab_free_hook(s, slab, p, cnt);
> > +     alloc_tagging_slab_free_hook(s, slab, p, cnt);
>
> Ditto.

Ack.

>
> >       /*
> >        * With KASAN enabled slab_free_freelist_hook modifies the freelist
> >        * to remove objects, whose reuse must be delayed.


* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-27 10:11   ` Vlastimil Babka
@ 2024-02-27 16:38     ` Suren Baghdasaryan
  2024-02-28  8:47       ` Vlastimil Babka
  0 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-27 16:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > When a high-order page is split into smaller ones, each newly split
> > page should get its codetag. The original codetag is reused for these
> > pages but it's recorded as 0-byte allocation because original codetag
> > already accounts for the original high-order allocated page.
>
> This was v3 but then you refactored (for the better) so the commit log
> could reflect it?

Yes, technically the mechanism didn't change but I should word it better.
Something like this:

When a high-order page is split into smaller ones, each newly split
page should get its codetag. After the split, each page will be
referencing the original codetag. The codetag's "bytes" counter
remains the same because the amount of allocated memory has not
changed; however, the "calls" counter gets increased to keep the
counter correct when these individual pages get freed.

>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I was going to R-b, but now I recalled the trickiness of
> __free_pages() for non-compound pages if it loses the race to a
> speculative reference. Will the codetag handling work fine there?

I think so. Each non-compound page has its individual reference to its
codetag and will decrement it whenever the page is freed. IIUC the
logic in __free_pages(), when it loses the race to a speculative
reference it will free all pages except for the first one and the
first one will be freed when the last put_page() happens. If prior to
this all these pages were split from one page then all of them will
have their own reference which points to the same codetag. Every time
one of these pages is freed that codetag's "bytes" and "calls"
counters will be decremented. I think accounting will work correctly
irrespective of where these pages are freed, in __free_pages() or by
put_page().
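
For reference, the logic in question looks roughly like this in
mm/page_alloc.c as of 6.8:

void __free_pages(struct page *page, unsigned int order)
{
	/* get PageHead before we drop reference */
	int head = PageHead(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head)
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
}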

>
> > ---
> >  include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
> >  mm/huge_memory.c            |  2 ++
> >  mm/page_alloc.c             |  2 ++
> >  3 files changed, 34 insertions(+)
> >
> > diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> > index b49ab955300f..9e6ad8e0e4aa 100644
> > --- a/include/linux/pgalloc_tag.h
> > +++ b/include/linux/pgalloc_tag.h
> > @@ -67,11 +67,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
> >       }
> >  }
> >
> > +static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
> > +{
> > +     int i;
> > +     struct page_ext *page_ext;
> > +     union codetag_ref *ref;
> > +     struct alloc_tag *tag;
> > +
> > +     if (!mem_alloc_profiling_enabled())
> > +             return;
> > +
> > +     page_ext = page_ext_get(page);
> > +     if (unlikely(!page_ext))
> > +             return;
> > +
> > +     ref = codetag_ref_from_page_ext(page_ext);
> > +     if (!ref->ct)
> > +             goto out;
> > +
> > +     tag = ct_to_alloc_tag(ref->ct);
> > +     page_ext = page_ext_next(page_ext);
> > +     for (i = 1; i < nr; i++) {
> > +             /* Set new reference to point to the original tag */
> > +             alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
> > +             page_ext = page_ext_next(page_ext);
> > +     }
> > +out:
> > +     page_ext_put(page_ext);
> > +}
> > +
> >  #else /* CONFIG_MEM_ALLOC_PROFILING */
> >
> >  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> >                                  unsigned int order) {}
> >  static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
> > +static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
> >
> >  #endif /* CONFIG_MEM_ALLOC_PROFILING */
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 94c958f7ebb5..86daae671319 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -38,6 +38,7 @@
> >  #include <linux/sched/sysctl.h>
> >  #include <linux/memory-tiers.h>
> >  #include <linux/compat.h>
> > +#include <linux/pgalloc_tag.h>
> >
> >  #include <asm/tlb.h>
> >  #include <asm/pgalloc.h>
> > @@ -2899,6 +2900,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >       /* Caller disabled irqs, so they are still disabled here */
> >
> >       split_page_owner(head, nr);
> > +     pgalloc_tag_split(head, nr);
> >
> >       /* See comment in __split_huge_page_tail() */
> >       if (PageAnon(head)) {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 58c0e8b948a4..4bc5b4720fee 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2621,6 +2621,7 @@ void split_page(struct page *page, unsigned int order)
> >       for (i = 1; i < (1 << order); i++)
> >               set_page_refcounted(page + i);
> >       split_page_owner(page, 1 << order);
> > +     pgalloc_tag_split(page, 1 << order);
> >       split_page_memcg(page, 1 << order);
> >  }
> >  EXPORT_SYMBOL_GPL(split_page);
> > @@ -4806,6 +4807,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
> >               struct page *last = page + nr;
> >
> >               split_page_owner(page, 1 << order);
> > +             pgalloc_tag_split(page, 1 << order);
> >               split_page_memcg(page, 1 << order);
> >               while (page < --last)
> >                       set_page_refcounted(last);


* Re: [PATCH v4 15/36] lib: introduce support for page allocation tagging
  2024-02-27  9:30       ` Vlastimil Babka
  2024-02-27  9:45         ` Kent Overstreet
@ 2024-02-27 16:55         ` Suren Baghdasaryan
  1 sibling, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-27 16:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Tue, Feb 27, 2024 at 1:30 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
>
>
> On 2/26/24 18:11, Suren Baghdasaryan wrote:
> > On Mon, Feb 26, 2024 at 9:07 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> >>> Introduce helper functions to easily instrument page allocators by
> >>> storing a pointer to the allocation tag associated with the code that
> >>> allocated the page in a page_ext field.
> >>>
> >>> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
> >>> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> >>
> >> The static key usage seems fine now. Even if the page_ext overhead is still
> >> always paid when compiled in, you mention in the cover letter there's a plan
> >> for boot-time toggle later, so
> >
> > Yes, I already have a simple patch for that to be included in the next
> > revision: https://github.com/torvalds/linux/commit/7ca367e80232345f471b77b3ea71cf82faf50954
>
> This opt-out logic would require a distro kernel with allocation
> profiling compiled in to ship together with something that modifies
> the kernel command line to disable it by default, so it's not very
> practical. Could the CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT be
> turned into having 3 possible choices, where one of them would
> initialize mem_profiling_enabled to false?

I was thinking about a similar approach: making the early boot
parameter a tri-state with "0 | 1 | Never". The default option
would be "Never" if CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n
and "1" if CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y. Would that
solve the problem for distributions?
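
A minimal sketch of what I mean (untested; the parameter name, helper
names and exact semantics are placeholders, not a final interface):

enum mem_profiling_mode {
	MEM_PROFILING_NEVER,	/* no page_ext vectors, cannot be enabled later */
	MEM_PROFILING_OFF,	/* overhead paid, profiling starts disabled */
	MEM_PROFILING_ON,
};

static enum mem_profiling_mode mem_profiling_mode __initdata =
	IS_ENABLED(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT) ?
	MEM_PROFILING_ON : MEM_PROFILING_NEVER;

static int __init setup_mem_profiling(char *str)
{
	if (!strcmp(str, "never"))
		mem_profiling_mode = MEM_PROFILING_NEVER;
	else if (!strcmp(str, "0"))
		mem_profiling_mode = MEM_PROFILING_OFF;
	else if (!strcmp(str, "1"))
		mem_profiling_mode = MEM_PROFILING_ON;
	else
		return -EINVAL;
	return 0;
}
early_param("mem_profiling", setup_mem_profiling);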

>
> Or, taking a step back, is it going to be a common use case to pay the
> memory overhead unconditionally, but only enable the profiling later
> during runtime?

I think that would be the option one would use in early
deployments, to be able to enable the feature on specific devices
without a reboot. Pasha also brought up an option where we disable the
feature initially (via the early boot option) but can enable it and
reboot the system, which will then come up with the option enabled.

As Kent mentioned, he has been working on a pointer compression
mechanism to cut the overhead of each codetag reference from one
pointer (8 bytes) to a 2-byte index. I have yet to check the
performance, but if that works and we can fit this index into the page
flags, that would completely eliminate the dependency on page_ext and
this memory overhead would be gone. The mechanism is not mature enough
yet, which is why these optimizations are not included in the initial
patchset.

> Also, what happens if someone enables and disables it
> multiple times during one boot? Would the statistics get all skewed
> because some frees would not be accounted while it's disabled?

Yes, and this was discussed during the last LSFMM when the runtime control
was brought up for the first time. That loss of accounting while the
feature is disabled seems to be expected and acceptable. One could
snapshot the state before re-enabling the feature and then compare
later results with the initial snapshot to figure out the allocation
growth.
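
(E.g. save a copy of /proc/allocinfo right after re-enabling, and later
subtract the saved per-callsite bytes/calls from a fresh read to get
the growth over the interval.)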

>
> >>
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >
> > Thanks!
> >
> >>
> >>


* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
  2024-02-21 23:05   ` Kees Cook
@ 2024-02-28  8:29   ` Vlastimil Babka
  2024-02-28 18:05     ` Suren Baghdasaryan
  2024-02-28  8:41   ` Vlastimil Babka
  2 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-28  8:29 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> 
> +static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> +{
> + __alloc_tag_sub(ref, bytes);
> +}
> +
> +static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
> +{
> + __alloc_tag_sub(ref, bytes);
> +}
> +

Nit: just notice these are now the same and maybe you could just drop both
wrappers and rename __alloc_tag_sub to alloc_tag_sub?


* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
  2024-02-21 23:05   ` Kees Cook
  2024-02-28  8:29   ` Vlastimil Babka
@ 2024-02-28  8:41   ` Vlastimil Babka
  2024-02-28 18:07     ` Suren Baghdasaryan
  2 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-28  8:41 UTC (permalink / raw)
  To: Suren Baghdasaryan, akpm
  Cc: kent.overstreet, mhocko, hannes, roman.gushchin, mgorman, dave,
	willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

Another thing I noticed, though I don't know how critical it is:

On 2/21/24 20:40, Suren Baghdasaryan wrote:
> +static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> +{
> +	struct alloc_tag *tag;
> +
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +	WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
> +#endif
> +	if (!ref || !ref->ct)
> +		return;

This is quite careful.

> +
> +	tag = ct_to_alloc_tag(ref->ct);
> +
> +	this_cpu_sub(tag->counters->bytes, bytes);
> +	this_cpu_dec(tag->counters->calls);
> +
> +	ref->ct = NULL;
> +}
> +
> +static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> +{
> +	__alloc_tag_sub(ref, bytes);
> +}
> +
> +static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
> +{
> +	__alloc_tag_sub(ref, bytes);
> +}
> +
> +static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
> +{
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> +	WARN_ONCE(ref && ref->ct,
> +		  "alloc_tag was not cleared (got tag for %s:%u)\n",\
> +		  ref->ct->filename, ref->ct->lineno);
> +
> +	WARN_ONCE(!tag, "current->alloc_tag not set");
> +#endif
> +	if (!ref || !tag)
> +		return;

This too.

> +
> +	ref->ct = &tag->ct;
> +	/*
> +	 * We need in increment the call counter every time we have a new
> +	 * allocation or when we split a large allocation into smaller ones.
> +	 * Each new reference for every sub-allocation needs to increment call
> +	 * counter because when we free each part the counter will be decremented.
> +	 */
> +	this_cpu_inc(tag->counters->calls);
> +}
> +
> +static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
> +{
> +	alloc_tag_ref_set(ref, tag);

We might have returned from alloc_tag_ref_set() due to !tag

> +	this_cpu_add(tag->counters->bytes, bytes);

But here we still assume it's valid.

> +}
> +



* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-27 16:38     ` Suren Baghdasaryan
@ 2024-02-28  8:47       ` Vlastimil Babka
  2024-02-28 17:50         ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-28  8:47 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/27/24 17:38, Suren Baghdasaryan wrote:
> On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 2/21/24 20:40, Suren Baghdasaryan wrote:
>> > When a high-order page is split into smaller ones, each newly split
>> > page should get its codetag. The original codetag is reused for these
>> > pages but it's recorded as 0-byte allocation because original codetag
>> > already accounts for the original high-order allocated page.
>>
>> This was v3 but then you refactored (for the better) so the commit log
>> could reflect it?
> 
> > Yes, technically the mechanism didn't change but I should word it better.
> > Something like this:
> 
> When a high-order page is split into smaller ones, each newly split
> > page should get its codetag. After the split, each page will be
> referencing the original codetag. The codetag's "bytes" counter
> remains the same because the amount of allocated memory has not
> > changed; however, the "calls" counter gets increased to keep the
> counter correct when these individual pages get freed.

Great, thanks.
The concern with __free_pages() is not really related to splitting, so for
this patch:

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> 
>>
>> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>>
>> I was going to R-b, but now I recalled the trickiness of
>> __free_pages() for non-compound pages if it loses the race to a
>> speculative reference. Will the codetag handling work fine there?
> 
> > I think so. Each non-compound page has its individual reference to its
> codetag and will decrement it whenever the page is freed. IIUC the
> > logic in __free_pages(), when it loses the race to a speculative
> reference it will free all pages except for the first one and the

The "tail" pages of this non-compound high-order page will AFAICS not have
code tags assigned, so alloc_tag_sub() will be a no-op (or a warning with
_DEBUG).

> first one will be freed when the last put_page() happens. If prior to
> this all these pages were split from one page then all of them will
> have their own reference which points to the same codetag.

Yeah I'm assuming there's no split before the freeing. This patch about
splitting just reminded me of that tricky freeing scenario.

So IIUC the "else if (!head)" path of __free_pages() will do nothing about
the "tail" pages wrt code tags as there are no code tags.
Then whoever took the speculative "head" page reference will put_page() and
free it, which will end up in alloc_tag_sub(). This will decrement calls
properly, but bytes will become imbalanced, because that put_page() will
pass order-0 worth of bytes - the original order is lost.

Now this might be rare enough that it's not worth fixing if that would be
too complicated, just FYI.
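
Concretely, assuming 4K pages: a non-compound order-2 allocation adds
16384 bytes and one call to its tag. If a speculative reference wins
the race, the three tail pages are freed with no tag to subtract from,
and the eventual put_page() on the head subtracts only 4096 bytes, so
12288 bytes stay accounted to the tag forever.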


> Every time
> one of these pages is freed that codetag's "bytes" and "calls"
> counters will be decremented. I think accounting will work correctly
> irrespective of where these pages are freed, in __free_pages() or by
> put_page().
> 



* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-28  8:47       ` Vlastimil Babka
@ 2024-02-28 17:50         ` Suren Baghdasaryan
  2024-02-28 18:28           ` Vlastimil Babka
  0 siblings, 1 reply; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-28 17:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/27/24 17:38, Suren Baghdasaryan wrote:
> > On Tue, Feb 27, 2024 at 2:10 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> >> > When a high-order page is split into smaller ones, each newly split
> >> > page should get its codetag. The original codetag is reused for these
> >> > pages but it's recorded as 0-byte allocation because original codetag
> >> > already accounts for the original high-order allocated page.
> >>
> >> This was v3 but then you refactored (for the better) so the commit log
> >> could reflect it?
> >
> > Yes, technically the mechanism didn't change but I should word it better.
> > Something like this:
> >
> > When a high-order page is split into smaller ones, each newly split
> > page should get its codetag. After the split, each page will be
> > referencing the original codetag. The codetag's "bytes" counter
> > remains the same because the amount of allocated memory has not
> > changed; however, the "calls" counter gets increased to keep the
> > counter correct when these individual pages get freed.
>
> Great, thanks.
> The concern with __free_pages() is not really related to splitting, so for
> this patch:
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> >
> >>
> >> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >>
> >> I was going to R-b, but now I recalled the trickiness of
> >> __free_pages() for non-compound pages if it loses the race to a
> >> speculative reference. Will the codetag handling work fine there?
> >
> > I think so. Each non-compound page has its individual reference to its
> > codetag and will decrement it whenever the page is freed. IIUC the
> > logic in __free_pages(), when it loses the race to a speculative
> > reference it will free all pages except for the first one and the
>
> The "tail" pages of this non-compound high-order page will AFAICS not have
> code tags assigned, so alloc_tag_sub() will be a no-op (or a warning with
> _DEBUG).

Yes, that is correct.

>
> > first one will be freed when the last put_page() happens. If prior to
> > this all these pages were split from one page then all of them will
> > have their own reference which points to the same codetag.
>
> Yeah I'm assuming there's no split before the freeing. This patch about
> splitting just reminded me of that tricky freeing scenario.

Ah, I see. I thought you were talking about a page that was previously split.

>
> So IIUC the "else if (!head)" path of __free_pages() will do nothing about
> the "tail" pages wrt code tags as there are no code tags.
> Then whoever took the speculative "head" page reference will put_page() and
> free it, which will end up in alloc_tag_sub(). This will decrement calls
> properly, but bytes will become imbalanced, because that put_page() will
> pass order-0 worth of bytes - the original order is lost.

Yeah, that's true. put_page() will end up calling
free_unref_page(&folio->page, 0) even if the original order was more
than 0.

>
> Now this might be rare enough that it's not worth fixing if that would be
> too complicated, just FYI.

Yeah. We can fix this by subtracting the "bytes" counter of the "head"
page for all free_the_page(page + (1 << order), order) calls we do
inside __free_pages(). But we can't simply use pgalloc_tag_sub()
because the "calls" counter will get over-decremented (we allocated
all of these pages with one call). I'll need to introduce a new
pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
of a solution but OTOH this is a special situation, so maybe it's
acceptable. WDYT?
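
Something like this, just to illustrate the shape (untested, and both
the name and signature are made up):

static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int order)
{
	/* only "bytes"; the single "call" is dropped when the head page is freed */
	if (mem_alloc_profiling_enabled() && tag)
		this_cpu_sub(tag->counters->bytes, PAGE_SIZE << order);
}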

>
>
> > Every time
> > one of these pages is freed that codetag's "bytes" and "calls"
> > counters will be decremented. I think accounting will work correctly
> > irrespective of where these pages are freed, in __free_pages() or by
> > put_page().
> >
>


* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-28  8:29   ` Vlastimil Babka
@ 2024-02-28 18:05     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-28 18:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 28, 2024 at 8:29 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> >
> > +static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> > +{
> > + __alloc_tag_sub(ref, bytes);
> > +}
> > +
> > +static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
> > +{
> > + __alloc_tag_sub(ref, bytes);
> > +}
> > +
>
> Nit: just notice these are now the same and maybe you could just drop both
> wrappers and rename __alloc_tag_sub to alloc_tag_sub?

Ack.



* Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling
  2024-02-28  8:41   ` Vlastimil Babka
@ 2024-02-28 18:07     ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-28 18:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 28, 2024 at 8:41 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Another thing I noticed, dunno how critical
>
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > +static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> > +{
> > +     struct alloc_tag *tag;
> > +
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> > +     WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
> > +#endif
> > +     if (!ref || !ref->ct)
> > +             return;
>
> This is quite careful.
>
> > +
> > +     tag = ct_to_alloc_tag(ref->ct);
> > +
> > +     this_cpu_sub(tag->counters->bytes, bytes);
> > +     this_cpu_dec(tag->counters->calls);
> > +
> > +     ref->ct = NULL;
> > +}
> > +
> > +static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> > +{
> > +     __alloc_tag_sub(ref, bytes);
> > +}
> > +
> > +static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
> > +{
> > +     __alloc_tag_sub(ref, bytes);
> > +}
> > +
> > +static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
> > +{
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> > +     WARN_ONCE(ref && ref->ct,
> > +               "alloc_tag was not cleared (got tag for %s:%u)\n",\
> > +               ref->ct->filename, ref->ct->lineno);
> > +
> > +     WARN_ONCE(!tag, "current->alloc_tag not set");
> > +#endif
> > +     if (!ref || !tag)
> > +             return;
>
> This too.
>
> > +
> > +     ref->ct = &tag->ct;
> > +     /*
> > +	 * We need to increment the call counter every time we have a new
> > +      * allocation or when we split a large allocation into smaller ones.
> > +      * Each new reference for every sub-allocation needs to increment call
> > +      * counter because when we free each part the counter will be decremented.
> > +      */
> > +     this_cpu_inc(tag->counters->calls);
> > +}
> > +
> > +static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
> > +{
> > +     alloc_tag_ref_set(ref, tag);
>
> We might have returned from alloc_tag_ref_set() due to !tag
>
> > +     this_cpu_add(tag->counters->bytes, bytes);
>
> But here we still assume it's valid.

Yes, this is a blunder on my side after splitting alloc_tag_ref_set()
into a separate function. I'll fix this in the next version. Thanks!
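
Probably as simple as (untested sketch):

static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
{
	alloc_tag_ref_set(ref, tag);
	/* mirror the guard in alloc_tag_ref_set(): no ref or tag, no accounting */
	if (ref && tag)
		this_cpu_add(tag->counters->bytes, bytes);
}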

>
> > +}
> > +


* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-28 17:50         ` Suren Baghdasaryan
@ 2024-02-28 18:28           ` Vlastimil Babka
  2024-02-28 18:38             ` Suren Baghdasaryan
  0 siblings, 1 reply; 98+ messages in thread
From: Vlastimil Babka @ 2024-02-28 18:28 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On 2/28/24 18:50, Suren Baghdasaryan wrote:
> On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>>
>> Now this might be rare enough that it's not worth fixing if that would be
>> too complicated, just FYI.
> 
> Yeah. We can fix this by subtracting the "bytes" counter of the "head"
> page for all free_the_page(page + (1 << order), order) calls we do
> inside __free_pages(). But we can't simply use pgalloc_tag_sub()
> because the "calls" counter will get over-decremented (we allocated
> all of these pages with one call). I'll need to introduce a new
> pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
> of a solution but OTOH this is a special situation, so maybe it's
> acceptable. WDYT?

Hmm, I think there's a problem: once you fail put_page_testzero() and
detect you need to do this, the page might already be gone or reallocated,
so you can't get to the tag for decrementing bytes. You'd have to get it
upfront (I guess for "head && order > 0" cases) just in case it happens.
Maybe it's not worth the trouble for such a rare case.

>>
>>
>> > Every time
>> > one of these pages is freed that codetag's "bytes" and "calls"
>> > counters will be decremented. I think accounting will work correctly
>> > irrespective of where these pages are freed, in __free_pages() or by
>> > put_page().
>> >
>>



* Re: [PATCH v4 19/36] mm: create new codetag references during page splitting
  2024-02-28 18:28           ` Vlastimil Babka
@ 2024-02-28 18:38             ` Suren Baghdasaryan
  0 siblings, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-02-28 18:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, kent.overstreet, mhocko, hannes, roman.gushchin, mgorman,
	dave, willy, liam.howlett, penguin-kernel, corbet, void, peterz,
	juri.lelli, catalin.marinas, will, arnd, tglx, mingo,
	dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
	nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
	yosryahmed, yuzhao, dhowells, hughd, andreyknvl, keescook,
	ndesaulniers, vvvvvv, gregkh, ebiggers, ytcoode, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, vschneid, cl,
	penberg, iamjoonsoo.kim, 42.hyeyoo, glider, elver, dvyukov,
	shakeelb, songmuchun, jbaron, rientjes, minchan, kaleshsingh,
	kernel-team, linux-doc, linux-kernel, iommu, linux-arch,
	linux-fsdevel, linux-mm, linux-modules, kasan-dev, cgroups

On Wed, Feb 28, 2024 at 10:28 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/28/24 18:50, Suren Baghdasaryan wrote:
> > On Wed, Feb 28, 2024 at 12:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> >>
> >> Now this might be rare enough that it's not worth fixing if that would be
> >> too complicated, just FYI.
> >
> > Yeah. We can fix this by subtracting the "bytes" counter of the "head"
> > page for all free_the_page(page + (1 << order), order) calls we do
> > inside __free_pages(). But we can't simply use pgalloc_tag_sub()
> > because the "calls" counter will get over-decremented (we allocated
> > all of these pages with one call). I'll need to introduce a new
> > pgalloc_tag_sub_bytes() API and use it here. I feel it's too targeted
> > of a solution but OTOH this is a special situation, so maybe it's
> > acceptable. WDYT?
>
> Hmm, I think there's a problem: once you fail put_page_testzero() and
> detect you need to do this, the page might already be gone or reallocated,
> so you can't get to the tag for decrementing bytes. You'd have to get it
> upfront (I guess for "head && order > 0" cases) just in case it happens.
> Maybe it's not worth the trouble for such a rare case.

Yes, that hit me when I tried to implement it, but there is a simple
way around that. I can obtain the alloc_tag before doing
put_page_testzero() and then decrement the bytes counter directly as
needed.
Not sure if it is a rare enough case that we can ignore it, but if the
fix is simple enough then we might as well do it?
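
Roughly, using pgalloc_tag_sub_bytes() from my previous email and a
made-up pgalloc_tag_get() helper (untested sketch):

void __free_pages(struct page *page, unsigned int order)
{
	/* grab PageHead and the tag before the reference can be dropped */
	int head = PageHead(page);
	struct alloc_tag *tag = pgalloc_tag_get(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head)
		while (order-- > 0) {
			/* tail pages carry no tag; balance the bytes here */
			pgalloc_tag_sub_bytes(tag, order);
			free_the_page(page + (1 << order), order);
		}
}

The bytes subtracted in the loop add up to everything except the head
page's PAGE_SIZE, which the final put_page() drops, while "calls" is
decremented only once, when the head page is freed.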

>
> >>
> >>
> >> > Every time
> >> > one of these pages is freed that codetag's "bytes" and "calls"
> >> > counters will be decremented. I think accounting will work correctly
> >> > irrespective of where these pages are freed, in __free_pages() or by
> >> > put_page().
> >> >
> >>


* Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-02-26 16:58   ` Vlastimil Babka
  2024-02-26 17:13     ` Suren Baghdasaryan
@ 2024-03-12 18:23     ` Luis Chamberlain
  2024-03-12 19:56       ` Suren Baghdasaryan
  2024-03-12 20:07       ` Kent Overstreet
  1 sibling, 2 replies; 98+ messages in thread
From: Luis Chamberlain @ 2024-03-12 18:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, akpm, kent.overstreet, mhocko, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, masahiroy, nathan, dennis, tj, muchun.song,
	rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao, dhowells,
	hughd, andreyknvl, keescook, ndesaulniers, vvvvvv, gregkh,
	ebiggers, ytcoode, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > Skip freeing module's data section if there are non-zero allocation tags
> > because otherwise, once these allocations are freed, the access to their
> > code tag would cause UAF.
> > 
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> 
> I know that module unloading was never considered really supported etc.

If it's not supported, then we should not have it on modules. Module
loading and unloading should just work; otherwise this should not
work with modules and leave them in a zombie state.

  Luis


* Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-03-12 18:23     ` Luis Chamberlain
@ 2024-03-12 19:56       ` Suren Baghdasaryan
  2024-03-12 20:07       ` Kent Overstreet
  1 sibling, 0 replies; 98+ messages in thread
From: Suren Baghdasaryan @ 2024-03-12 19:56 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Vlastimil Babka, akpm, kent.overstreet, mhocko, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, masahiroy, nathan, dennis, tj, muchun.song,
	rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao, dhowells,
	hughd, andreyknvl, keescook, ndesaulniers, vvvvvv, gregkh,
	ebiggers, ytcoode, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Tue, Mar 12, 2024 at 11:23 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> > On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > > Skip freeing module's data section if there are non-zero allocation tags
> > > because otherwise, once these allocations are freed, the access to their
> > > code tag would cause UAF.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> >
> > I know that module unloading was never considered really supported etc.
>
> If it's not supported, then we should not have it on modules. Module
> loading and unloading should just work; otherwise this should not
> work with modules and leave them in a zombie state.

I replied on the v5 thread here:
https://lore.kernel.org/all/20240306182440.2003814-13-surenb@google.com/
. Let's continue the discussion in that thread. Thanks!
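
For clarity, the check this patch adds amounts to something like the
sketch below; the helper names here are illustrative, not the actual
API from the series:

/*
 * Refuse to free a module's codetag section while any of its tags
 * still accounts live allocations: a later kfree() of such an
 * allocation would dereference a freed tag (UAF).
 */
static bool module_tags_in_use(struct module *mod)
{
        struct alloc_tag *tag;

        for_each_module_alloc_tag(tag, mod)     /* illustrative iterator */
                if (alloc_tag_read(tag).bytes)
                        return true;
        return false;
}

static void release_module_tag_section(struct module *mod)
{
        if (module_tags_in_use(mod)) {
                pr_warn("%s: memory still allocated, section not freed\n",
                        mod->name);
                return;         /* leak the section rather than risk UAF */
        }
        /* ... safe to free the module's tag section here ... */
}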

>
>   Luis


* Re: [PATCH v4 13/36] lib: prevent module unloading if memory is not freed
  2024-03-12 18:23     ` Luis Chamberlain
  2024-03-12 19:56       ` Suren Baghdasaryan
@ 2024-03-12 20:07       ` Kent Overstreet
  1 sibling, 0 replies; 98+ messages in thread
From: Kent Overstreet @ 2024-03-12 20:07 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Vlastimil Babka, Suren Baghdasaryan, akpm, mhocko, hannes,
	roman.gushchin, mgorman, dave, willy, liam.howlett,
	penguin-kernel, corbet, void, peterz, juri.lelli,
	catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
	peterx, david, axboe, masahiroy, nathan, dennis, tj, muchun.song,
	rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao, dhowells,
	hughd, andreyknvl, keescook, ndesaulniers, vvvvvv, gregkh,
	ebiggers, ytcoode, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, bristot, vschneid, cl, penberg, iamjoonsoo.kim,
	42.hyeyoo, glider, elver, dvyukov, shakeelb, songmuchun, jbaron,
	rientjes, minchan, kaleshsingh, kernel-team, linux-doc,
	linux-kernel, iommu, linux-arch, linux-fsdevel, linux-mm,
	linux-modules, kasan-dev, cgroups

On Tue, Mar 12, 2024 at 11:23:45AM -0700, Luis Chamberlain wrote:
> On Mon, Feb 26, 2024 at 05:58:40PM +0100, Vlastimil Babka wrote:
> > On 2/21/24 20:40, Suren Baghdasaryan wrote:
> > > Skip freeing module's data section if there are non-zero allocation tags
> > > because otherwise, once these allocations are freed, the access to their
> > > code tag would cause UAF.
> > > 
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > 
> > I know that module unloading was never considered really supported etc.
> 
> If it's not supported, then we should not have it on modules. Module
> loading and unloading should just work; otherwise this should not
> work with modules and leave them in a zombie state.

Not have memory allocation profiling on modules?


end of thread

Thread overview: 98+ messages
2024-02-21 19:40 [PATCH v4 00/36] Memory allocation profiling Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 01/36] fix missing vmalloc.h includes Suren Baghdasaryan
2024-02-21 21:09   ` Pasha Tatashin
2024-02-21 19:40 ` [PATCH v4 02/36] asm-generic/io.h: Kill vmalloc.h dependency Suren Baghdasaryan
2024-02-21 21:11   ` Pasha Tatashin
2024-02-21 19:40 ` [PATCH v4 03/36] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
2024-02-21 21:15   ` Pasha Tatashin
2024-02-24  2:02     ` Suren Baghdasaryan
2024-02-26 14:31       ` Vlastimil Babka
2024-02-26 15:21         ` Pasha Tatashin
2024-02-26 16:09           ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 04/36] scripts/kallysms: Always include __start and __stop symbols Suren Baghdasaryan
2024-02-21 21:20   ` Pasha Tatashin
2024-02-21 19:40 ` [PATCH v4 05/36] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
2024-02-21 21:23   ` Pasha Tatashin
2024-02-26 15:44   ` Vlastimil Babka
2024-02-26 17:48     ` Suren Baghdasaryan
2024-02-26 20:50     ` Kent Overstreet
2024-02-21 19:40 ` [PATCH v4 06/36] mm: enumerate all gfp flags Suren Baghdasaryan
2024-02-21 21:25   ` Pasha Tatashin
2024-02-22 12:12   ` Michal Hocko
2024-02-22 12:24     ` Petr Tesařík
2024-02-23 19:26       ` Suren Baghdasaryan
2024-02-24  1:59         ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 07/36] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
2024-02-21 21:30   ` Pasha Tatashin
2024-02-26 16:26   ` Vlastimil Babka
2024-02-26 17:22     ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 08/36] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
2024-02-22  0:08   ` Pasha Tatashin
2024-02-26 16:51   ` Vlastimil Babka
2024-02-21 19:40 ` [PATCH v4 09/36] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
2024-02-22  0:09   ` Pasha Tatashin
2024-02-26 16:52   ` Vlastimil Babka
2024-02-21 19:40 ` [PATCH v4 10/36] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
2024-02-22  0:12   ` Pasha Tatashin
2024-02-26 16:53   ` Vlastimil Babka
2024-02-21 19:40 ` [PATCH v4 11/36] lib: code tagging framework Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 12/36] lib: code tagging module support Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 13/36] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
2024-02-26 16:58   ` Vlastimil Babka
2024-02-26 17:13     ` Suren Baghdasaryan
2024-03-12 18:23     ` Luis Chamberlain
2024-03-12 19:56       ` Suren Baghdasaryan
2024-03-12 20:07       ` Kent Overstreet
2024-02-21 19:40 ` [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
2024-02-21 23:05   ` Kees Cook
2024-02-21 23:29     ` Kent Overstreet
2024-02-22  0:25       ` Kees Cook
2024-02-22  0:34         ` Kent Overstreet
2024-02-22  0:57           ` Kees Cook
2024-02-28  8:29   ` Vlastimil Babka
2024-02-28 18:05     ` Suren Baghdasaryan
2024-02-28  8:41   ` Vlastimil Babka
2024-02-28 18:07     ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 15/36] lib: introduce support for page allocation tagging Suren Baghdasaryan
2024-02-26 17:07   ` Vlastimil Babka
2024-02-26 17:11     ` Suren Baghdasaryan
2024-02-27  9:30       ` Vlastimil Babka
2024-02-27  9:45         ` Kent Overstreet
2024-02-27 16:55         ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 16/36] mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation tags Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 17/36] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 18/36] mm: enable page allocation tagging Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 19/36] mm: create new codetag references during page splitting Suren Baghdasaryan
2024-02-27 10:11   ` Vlastimil Babka
2024-02-27 16:38     ` Suren Baghdasaryan
2024-02-28  8:47       ` Vlastimil Babka
2024-02-28 17:50         ` Suren Baghdasaryan
2024-02-28 18:28           ` Vlastimil Babka
2024-02-28 18:38             ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 20/36] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
2024-02-27 10:18   ` Vlastimil Babka
2024-02-21 19:40 ` [PATCH v4 21/36] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
2024-02-27 10:19   ` Vlastimil Babka
2024-02-21 19:40 ` [PATCH v4 22/36] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
2024-02-27 13:07   ` Vlastimil Babka
2024-02-27 16:15     ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 23/36] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 24/36] rust: Add a rust helper for krealloc() Suren Baghdasaryan
2024-02-22  9:59   ` Alice Ryhl
2024-02-23 22:17     ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 25/36] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 26/36] mm: percpu: Introduce pcpuobj_ext Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 27/36] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 28/36] mm: percpu: enable per-cpu allocation tagging Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 29/36] mm: vmalloc: Enable memory allocation profiling Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 30/36] rhashtable: Plumb through alloc tag Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 31/36] lib: add memory allocations report in show_mem() Suren Baghdasaryan
2024-02-27 13:19   ` Vlastimil Babka
2024-02-27 16:12     ` Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 32/36] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 33/36] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 34/36] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 35/36] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
2024-02-21 19:40 ` [PATCH v4 36/36] memprofiling: Documentation Suren Baghdasaryan
2024-02-27 13:36 ` [PATCH v4 00/36] Memory allocation profiling Vlastimil Babka
2024-02-27 16:10   ` Suren Baghdasaryan
